Impossible to activate HugePage on AKS nodes - azure

Hi dear Stack Overflow community,
I'm struggling to activate HugePages on an AKS cluster.
I noticed that I first have to configure a node pool with HugePage support.
The only official Azure HugePages doc is about transparentHugePage (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), but I don't know if it's sufficient...
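(For reference, a node pool with such custom settings is created roughly like this per that doc; a sketch with placeholder names:)
az aks nodepool add \
  --cluster-name <CLUSTER> \
  --resource-group <RESOURCE_GROUP> \
  --name deadpoolhp \
  --kubelet-config ./kubeletconfig.json \
  --linux-os-config ./linuxosconfig.json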
Then I know I also have to configure the pod.
I wanted to rely on this (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/), but since step 2) isn't working...
But despite everything I've done, I couldn't make it work.
If I follow the Microsoft documentation, my node pool spawns like this:
"kubeletConfig": {
"allowedUnsafeSysctls": null,
"cpuCfsQuota": null,
"cpuCfsQuotaPeriod": null,
"cpuManagerPolicy": null,
"failSwapOn": false,
"imageGcHighThreshold": null,
"imageGcLowThreshold": null,
"topologyManagerPolicy": null
},
"linuxOsConfig": {
"swapFileSizeMb": null,
"sysctls": {
"fsAioMaxNr": null,
"fsFileMax": null,
"fsInotifyMaxUserWatches": null,
"fsNrOpen": null,
"kernelThreadsMax": null,
"netCoreNetdevMaxBacklog": null,
"netCoreOptmemMax": null,
"netCoreRmemMax": null,
"netCoreSomaxconn": null,
"netCoreWmemMax": null,
"netIpv4IpLocalPortRange": "32000 60000",
"netIpv4NeighDefaultGcThresh1": null,
"netIpv4NeighDefaultGcThresh2": null,
"netIpv4NeighDefaultGcThresh3": null,
"netIpv4TcpFinTimeout": null,
"netIpv4TcpKeepaliveProbes": null,
"netIpv4TcpKeepaliveTime": null,
"netIpv4TcpMaxSynBacklog": null,
"netIpv4TcpMaxTwBuckets": null,
"netIpv4TcpRmem": null,
"netIpv4TcpTwReuse": null,
"netIpv4TcpWmem": null,
"netIpv4TcpkeepaliveIntvl": null,
"netNetfilterNfConntrackBuckets": null,
"netNetfilterNfConntrackMax": null,
"vmMaxMapCount": null,
"vmSwappiness": null,
"vmVfsCachePressure": null
},
"transparentHugePageDefrag": "defer+madvise",
"transparentHugePageEnabled": "madvise"
But my node still looks like this:
# kubectl describe nodes aks-deadpoolhp-31863567-vmss000000|grep hugepage
Capacity:
attachable-volumes-azure-disk: 16
cpu: 8
ephemeral-storage: 129901008Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32940620Ki
pods: 110
Allocatable:
attachable-volumes-azure-disk: 16
cpu: 7820m
ephemeral-storage: 119716768775
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 28440140Ki
pods: 110
My Kubernetes version is 1.16.15.
I also saw that I should enable a feature gate like this, --feature-gates=HugePages=true (https://dev.to/dannypsnl/hugepages-on-kubernetes-5e7p), but I don't know how to do that in AKS... Anyway, as my node is not showing any HugePages availability, I'm not sure it's useful for now.
I even tried to recreate the AKS cluster with --kubeconfig, but everything stays the same: I cannot use HugePages...
Please, I need your help; I'm completely lost with this AKS service...

Install kubectl-node-shell on your laptop:
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell
Get the node you want to get inside:
kubectl get pod <YOUR_POD> -o custom-columns=CONTAINER:.spec.nodeName -n <YOUR_NAMESPACE>
If the node is <none>, that means your pod is in Pending state. Pick one node at random instead:
kubectl get nodes
Get inside your node:
kubectl node-shell <NODE>
Configure Hugepage:
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Restart kubelet (still in the node, yes):
systemctl restart kubelet
Exit from node-shell by C-d (Ctrl + d)
Check that HugePages are ON (i.e. the values must not be 0):
kubectl describe node <NODE>|grep -i -e "capacity" -e "allocatable" -e "huge"
Either check that your pod is no longer in Pending state, or launch your helm install / kubectl apply now!
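For completeness, here is a minimal pod sketch that consumes the 2Mi pages once the node advertises them (adapted from the Kubernetes scheduling-hugepages doc linked in the question; image and sizes are just examples):
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-test
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi
        memory: 200Mi
      requests:
        memory: 200Mi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
100Mi of 2Mi pages is 50 pages, well within the 1024 pages allocated above.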

Related

How to use the DBus system in a container with docker root-less

I would like to use DBus in a container with Docker in rootless mode.
I am using Ubuntu 22.10:
host$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.10
Release: 22.10
Codename: kinetic
and Docker rootless:
host$ docker info
Client:
Context: rootless
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
compose: Docker Compose (Docker Inc., v2.12.2)
scan: Docker Scan (Docker Inc., v0.21.0)
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 3
Server Version: 20.10.21
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: true
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d986545181c905378b0f90faa9c5eae3cbfa3755
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
seccomp
Profile: default
rootless
cgroupns
Kernel Version: 5.19.0-26-generic
Operating System: Ubuntu 22.10
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.23GiB
Name: ****************
ID: LAEG:NBQE:RME5:OPHR:TT4C:PHA3:25FE:7DPW:46PD:E2VI:6FB6:HQ2P
Docker Root Dir: /home/*******/.local/share/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
I tried to create a container with the DBus socket mounted in it:
docker run -it --rm -v /var/run/dbus:/var/run/dbus ubuntu:latest bash
In my case I need to launch the container with a user other than root, so I created a test user with uid 1000:
root@163974703e4c:/# adduser test
Adding user `test' ...
Adding new group `test' (1000) ...
Adding new user `test' (1000) with group `test' ...
Creating home directory `/home/test' ...
Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for test
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
I switch to this new user:
root@163974703e4c:/# su test
test@163974703e4c:/$ id
uid=1000(test) gid=1000(test) groups=1000(test)
As I am using a user other than root inside the container, it maps to a subordinate uid on my host. My /etc/subuid:
user:100000:65536
Therefore I put an ACL on my DBus socket to allow that subordinate user to use the socket (container uid 1000 maps to host uid 100000 + 1000 - 1 = 100999):
host$ sudo setfacl -R -m u:100999:rwx /run/dbus/system_bus_socket
So I have the DBus socket with an access to this socket in the container:
test@163974703e4c:/$ ls -lan /run/dbus/system_bus_socket
srw-rwxrw-+ 1 65534 65534 0 Dec 9 17:46 /run/dbus/system_bus_socket
test@163974703e4c:/$ getfacl /run/dbus/system_bus_socket
getfacl: Removing leading '/' from absolute path names
# file: run/dbus/system_bus_socket
# owner: nobody
# group: nogroup
user::rw-
user:test:rwx
group::rw-
mask::rwx
other::rw-
I test the command dbus-monitor --system but I get this output:
$ dbus-monitor --system
Failed to open connection to system bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Can you help me please?
I tried to launch my container in privileged mode and with --cap-add ALL, but I still get this error message.
I tried to use strace to show all the system calls, but it gives no more information:
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
getresuid([1000], [1000], [1000]) = 0
getresgid([1000], [1000], [1000]) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 29) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
geteuid() = 1000
getsockname(3, {sa_family=AF_UNIX}, [128 => 2]) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "\0", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(3, "AUTH EXTERNAL 31303030\r\n", 24, MSG_NOSIGNAL, NULL, 0) = 24
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "REJECTED EXTERNAL\r\n", 2048) = 19
close(3) = 0
write(2, "Failed to open connection to sys"..., 252Failed to open connection to system bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
) = 252
exit_group(1) = ?
+++ exited with 1 +++
In my container, I want to get the same output as on my host:
dbus-monitor --system
dbus-monitor: unable to enable new-style monitoring: org.freedesktop.DBus.Error.AccessDenied: "Rejected send message, 1 matched rules; type="method_call", sender=":1.544" (uid=1000 pid=32723 comm="dbus-monitor --system" label="unconfined") interface="org.freedesktop.DBus.Monitoring" member="BecomeMonitor" error name="(unset)" requested_reply="0" destination="org.freedesktop.DBus" (bus)". Falling back to eavesdropping.
signal time=1670624207.443897 sender=org.freedesktop.DBus -> destination=:1.544 serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
string ":1.544"
signal time=1670624214.344658 sender=:1.12 -> destination=(null destination) serial=47 path=/org/freedesktop/UDisks2/drives/ST2000DM008_2FR102_ZFL3HVF7; interface=org.freedesktop.DBus.Properties; member=PropertiesChanged
string "org.freedesktop.UDisks2.Drive.Ata"
array [
dict entry(
string "SmartUpdated"
variant uint64 1670624214
)
]
array [
]
The issue is the EXTERNAL authentication used by libdbus, which leads to a discrepancy when crossing user-namespace boundaries. It is described here: https://bugreports.qt.io/browse/QTBUG-108408.
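You can actually see the mismatch in the strace output above: the "AUTH EXTERNAL 31303030" line carries the in-container uid, hex-encoded as ASCII:
echo 31303030 | xxd -r -p    # prints 1000
dbus-daemon compares that claimed uid with the credentials it reads from the unix socket itself, which on the host side is your subordinate uid, so the EXTERNAL handshake is rejected.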
If you can afford to patch libdbus in your project, or at least in your containers, then you should be good to go with this patch.
From 0d18f455194924ffb100bc980239082187b48301 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=F0=9F=98=8
Date: Sun, 13 Nov 2022 20:08:02 +0100
Subject: [PATCH] fix: Do not send UID by External Auth
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
sending the UID per EXTERNAL authentication crossing user-namespace would cause
mismatch with out-of-band credentials acquired over UDS
An empty "AUTH EXTERNAL" is still a valid implementation of EXTERNAL authentication
Upstream-ticket: https://gitlab.freedesktop.org/dbus/dbus/-/issues/195
---
dbus/dbus-auth.c | 37 ++++++++++++++-----------------------
1 file changed, 14 insertions(+), 23 deletions(-)
diff --git a/dbus/dbus-auth.c b/dbus/dbus-auth.c
index d4faa737..1d8f3b53 100644
--- a/dbus/dbus-auth.c
+++ b/dbus/dbus-auth.c
@@ -1231,31 +1231,22 @@ static dbus_bool_t
handle_client_initial_response_external_mech (DBusAuth *auth,
DBusString *response)
{
- /* We always append our UID as an initial response, so the server
- * doesn't have to send back an empty challenge to check whether we
- * want to specify an identity. i.e. this avoids a round trip that
- * the spec for the EXTERNAL mechanism otherwise requires.
- */
- DBusString plaintext;
-
- if (!_dbus_string_init (&plaintext))
+ /* We don't send the UID as crossing user-namespace would cause
+ mismatch with out-of-band credentials acquired over UDS
+ it is still a valid implementation of EXTERNAL authentication
+ check related tickets in sd-bus
+ https://github.com/systemd/systemd/commit/1ed4723d38cd0d1423c8fe650f90fa86007ddf55
+ and gdbus
+ https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2832
+
+ Upstream ticket for proper fix: https://gitlab.freedesktop.org/dbus/dbus/-/issues/195
+ */
+ if (!_dbus_string_append (response,
+ "\r\nDATA"))
+ {
return FALSE;
-
- if (!_dbus_append_user_from_current_process (&plaintext))
- goto failed;
-
- if (!_dbus_string_hex_encode (&plaintext, 0,
- response,
- _dbus_string_get_length (response)))
- goto failed;
-
- _dbus_string_free (&plaintext);
-
+ }
return TRUE;
-
- failed:
- _dbus_string_free (&plaintext);
- return FALSE;
}
static dbus_bool_t
--
2.38.1

Linux User NameSpaces

I am experimenting with user namespaces using Go on Linux. The thing I cannot figure out is that, although I am setting the uid and gid mappings when creating the namespace, it still identifies as the nobody user when I launch the binary using sudo, but when I launch it as my normal user everything works fine. For reference, please see my code below.
...
cmd := exec.Command("/bin/sh")
cmd.Stdout = os.Stdout
cmd.Stdin = os.Stdin
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
    Cloneflags: syscall.CLONE_NEWUSER,
    UidMappings: []syscall.SysProcIDMap{
        {
            ContainerID: 0,
            HostID:      1000,
            Size:        1,
        },
    },
    GidMappings: []syscall.SysProcIDMap{
        {
            ContainerID: 0,
            HostID:      1000,
            Size:        1,
        },
    },
}
cmd.Run()
....
...
From the host I can confirm that indeed the user and group mappings were successful. The current pid is 87751
sudo cat /proc/87751/uid_map
0 1000 1
sudo cat /proc/87751/gid_map
0 1000 1
But when I run the binary after building
go build -o user_n
sudo ./user_n
sh-5.0$ whoami
nobody
sh-5.0$ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
But when I run the binary using the normal user it works as expected
./user_n
sh-5.0# whoami
root
sh-5.0# id
uid=0(root) gid=0(root) groups=0(root),65534(nobody) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
While running the binary as the normal user is an option, I would like to know why running it with sudo does not give the expected results. Any pointers will be greatly appreciated.
More info
Fedora 31
Kernel 5.3.11-100.fc29.x86_64
go version go1.14.3 linux/amd64
In the first case, you are running as the root user (through sudo), for which there is no mapping specified in the child user namespace. Hence the resulting "nobody" id.
In the second case, you run the program as user id 1000, for which the mapping says: 1000 becomes root in the child user namespace. Hence the resulting "root" id.
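"nobody" is simply the kernel's overflow uid, which is how any uid with no entry in the namespace's uid_map is reported. You can check it from the host (a quick sketch, reusing the pid from the question):
cat /proc/sys/kernel/overflowuid
65534
sudo cat /proc/87751/uid_map
0 1000 1
The map only covers host uid 1000, so a process started by root (host uid 0) has no identity inside the namespace and shows up as 65534/nobody.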

shell command with Ansible playbook doesn't work

I have added to my playbook a small task that should change the umask on my Linux machine:
- name: set umask to 0022
  shell: umask 0022
When running the playbook, I can see this task passes successfully:
changed: [myHostName] => {
"changed": true,
"cmd": "umask 0022",
"delta": "0:00:00.004660",
"end": "2020-08-04 16:28:44.153261",
"invocation": {
"module_args": {
"_raw_params": "umask 0022",
"_uses_shell": true,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"rc": 0,
"start": "2020-08-04 16:28:44.148601",
"stderr": "",
"stderr_lines": [],
"stdout": "",
"stdout_lines": []
}
but after the playbook finishes, I check the umask and see that it was not changed at all:
-bash-4.2$ umask
0044
I also put a debug in my playbook right after the task shown above, and the debug also shows that the umask was not changed.
I also tried with
become: yes
but got the same result.
When I run the command manually on my Linux machine, it works:
-bash-4.2$ umask 0022
-bash-4.2$ umask
0022
Q: After the playbook finishes, I check the umask and see that it was not changed at all.
A: This is correct. Ansible isn't really going through your login shell; each task runs in its own session, so the umask change lives in that one session only.
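If the point of the umask is to affect files created by later commands, a sketch is to set it in the same shell invocation as the command that needs it (the command path here is a placeholder):
- name: run my command with the desired umask
  shell: umask 0022 && /path/to/my_command
To change the default umask permanently, you would instead have to edit the file that sets it (for example /etc/profile or /etc/login.defs) with something like a lineinfile task.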

Connection refused when connecting to the exposed port on docker container

Dockerfile looks like this:
FROM ubuntu:latest
LABEL Spongebob Dockerpants "s.dockerpants@comcast.net"
RUN apt-get update -y
RUN apt-get install -y python3-pip python3-dev build-essential
#Add source files
COPY . /app
ENV HOME=/app
WORKDIR /app
# Install Python web server and dependencies
RUN pip3 install -r requirements.txt
ENV FLASK_APP=app.py
# Expose port
EXPOSE 8090
#ENTRYPOINT ["python3"]
CMD ["python3", "app.py"]
CMD tail -f /dev/null
I started the container like this:
docker run --name taskman -p 8090:8090 -d task-manager-app:latest
I see the container running, and my localhost listening on 8090:
CORP\n0118236 # a-33jxiw0rv8is5 in ~/docker_pete/flask-task-manager on master*
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c1ac5cb27698 task-manager-app:latest "/bin/sh -c 'tail -f…" About a minute ago Up About a minute 0.0.0.0:8090->8090/tcp taskman
CORP\n0118236 # a-33jxiw0rv8is5 in ~/docker_pete/flask-task-manager on master*
$ sudo netstat -nlp | grep 8090
tcp6 0 0 :::8090 :::* LISTEN 1154/docker-proxy
I tried to reach port 8090 on the container via localhost, per the docker run command I issued, but I get 'connection refused':
CORP\n0118236 # a-33jxiw0rv8is5 in ~/docker_pete/flask-task-manager on master*
$ curl http://localhost:8090
curl: (56) Recv failure: Connection reset by peer
I then inspected the port-binding, and it looks ok:
CORP\n0118236 # a-33jxiw0rv8is5 in ~/docker_pete/flask-task-manager on master*
$ sudo docker port c1ac5cb27698 8090
0.0.0.0:8090
When I do a docker inspect, I see this:
$ docker inspect c1ac5cb27698 | grep -A 55 "NetworkSettings"
"NetworkSettings": {
"Bridge": "",
"SandboxID": "7c2249761e4f48eef373c6744161b0709f312863c94fdc17138913952be698a0",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"8090/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "8090"
}
]
},
"SandboxKey": "/var/run/docker/netns/7c2249761e4f",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "ea7552d0ba9e8f0c865fa4a0f24781811c7332a1e7473c48e88fa4dbe6e5e05d",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:42:ac:11:00:02",
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "cfb5be57fdeed8a08b1650b5706a00542c5249903ce33052ff3f0d3dab619675",
"EndpointID": "ea7552d0ba9e8f0c865fa4a0f24781811c7332a1e7473c48e88fa4dbe6e5e05d",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02",
"DriverOpts": null
}
}
}
}
I am able to ping the container from my localhost:
CORP\n0118236 # a-33jxiw0rv8is5 in ~/docker_pete/flask-task-manager on master*
$ ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=255 time=0.045 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=255 time=0.042 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=255 time=0.047 ms
^C
--- 172.17.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.042/0.044/0.047/0.008 ms
Is there anything in the configuration that would cause these refused connections? Is something wrong with the binding?
Your Dockerfile contains two CMD lines, but Docker will only honor the last one.
CMD ["python3", "app.py"]
CMD tail -f /dev/null
The actual command executed inside your container is the tail command, which doesn't bind to or listen on the port. You can ping the container because the container is alive with the tail command.
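A sketch of the fix: keep only the CMD that starts the app, then rebuild and rerun (same names as in your commands above):
# In the Dockerfile, keep:
#   CMD ["python3", "app.py"]
# and delete the line: CMD tail -f /dev/null
docker build -t task-manager-app:latest .
docker rm -f taskman
docker run --name taskman -p 8090:8090 -d task-manager-app:latest
curl http://localhost:8090
If it still doesn't answer, also check that the Flask app listens on 0.0.0.0 (not 127.0.0.1), otherwise it is only reachable from inside the container.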

How to start kubernetes service on NodePort outside service-node-port-range default range?

I've been trying to start kubernetes-dashboard (and eventually other services) on a NodePort outside the default port range, with little success.
Here is my setup:
Cloud provider: Azure (Not azure container service)
OS: CentOS 7
here is what I have tried:
Update the host
$ yum update
Install kubeadm
$ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
$ setenforce 0
$ yum install -y docker kubelet kubeadm kubectl kubernetes-cni
$ systemctl enable docker && systemctl start docker
$ systemctl enable kubelet && systemctl start kubelet
Start the cluster with kubeadm
$ kubeadm init
Allow running containers on the master node, because we have a single-node cluster
$ kubectl taint nodes --all dedicated-
Install a pod network
$ kubectl apply -f https://git.io/weave-kube
Our kubernetes-dashboard Deployment:
# ~/kubernetes-dashboard.yaml
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Configuration to deploy release version of the Dashboard UI.
#
# Example usage: kubectl create -f <this_file>
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetes-dashboard
  template:
    metadata:
      labels:
        app: kubernetes-dashboard
      # Comment the following annotation if Dashboard must not be deployed on master
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.5.1
        imagePullPolicy: Always
        ports:
        - containerPort: 9090
          protocol: TCP
        args:
          # Uncomment the following line to manually specify Kubernetes API server Host
          # If not specified, Dashboard will attempt to auto discover the API server and connect
          # to it. Uncomment only if the default does not work.
          # - --apiserver-host=http://my-address:port
        livenessProbe:
          httpGet:
            path: /
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 8880
    targetPort: 9090
    nodePort: 8880
  selector:
    app: kubernetes-dashboard
Create our Deployment
$ kubectl create -f ~/kubernetes-dashboard.yaml
deployment "kubernetes-dashboard" created
The Service "kubernetes-dashboard" is invalid: spec.ports[0].nodePort: Invalid value: 8880: provided port is not in the valid range. The range of valid ports is 30000-32767
I found out that to change the range of valid ports I could set service-node-port-range option on kube-apiserver to allow a different port range,
so I tried this:
$ kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
dummy-2088944543-lr2zb 1/1 Running 0 31m
etcd-test2-highr 1/1 Running 0 31m
kube-apiserver-test2-highr 1/1 Running 0 31m
kube-controller-manager-test2-highr 1/1 Running 2 31m
kube-discovery-1769846148-wmbhb 1/1 Running 0 31m
kube-dns-2924299975-8vwjm 4/4 Running 0 31m
kube-proxy-0ls9c 1/1 Running 0 31m
kube-scheduler-test2-highr 1/1 Running 2 31m
kubernetes-dashboard-3203831700-qrvdn 1/1 Running 0 22s
weave-net-m9rxh 2/2 Running 0 31m
Add "--service-node-port-range=8880-8880" to kube-apiserver-test2-highr
$ kubectl edit po kube-apiserver-test2-highr --namespace=kube-system
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "kube-apiserver",
"namespace": "kube-system",
"creationTimestamp": null,
"labels": {
"component": "kube-apiserver",
"tier": "control-plane"
}
},
"spec": {
"volumes": [
{
"name": "k8s",
"hostPath": {
"path": "/etc/kubernetes"
}
},
{
"name": "certs",
"hostPath": {
"path": "/etc/ssl/certs"
}
},
{
"name": "pki",
"hostPath": {
"path": "/etc/pki"
}
}
],
"containers": [
{
"name": "kube-apiserver",
"image": "gcr.io/google_containers/kube-apiserver-amd64:v1.5.3",
"command": [
"kube-apiserver",
"--insecure-bind-address=127.0.0.1",
"--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota",
"--service-cluster-ip-range=10.96.0.0/12",
"--service-node-port-range=8880-8880",
"--service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem",
"--client-ca-file=/etc/kubernetes/pki/ca.pem",
"--tls-cert-file=/etc/kubernetes/pki/apiserver.pem",
"--tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem",
"--token-auth-file=/etc/kubernetes/pki/tokens.csv",
"--secure-port=6443",
"--allow-privileged",
"--advertise-address=100.112.226.5",
"--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
"--anonymous-auth=false",
"--etcd-servers=http://127.0.0.1:2379"
],
"resources": {
"requests": {
"cpu": "250m"
}
},
"volumeMounts": [
{
"name": "k8s",
"readOnly": true,
"mountPath": "/etc/kubernetes/"
},
{
"name": "certs",
"mountPath": "/etc/ssl/certs"
},
{
"name": "pki",
"mountPath": "/etc/pki"
}
],
"livenessProbe": {
"httpGet": {
"path": "/healthz",
"port": 8080,
"host": "127.0.0.1"
},
"initialDelaySeconds": 15,
"timeoutSeconds": 15,
"failureThreshold": 8
}
}
],
"hostNetwork": true
},
"status": {}
$ :wq
The following is the truncated response
# pods "kube-apiserver-test2-highr" was not valid:
# * spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
So I tried a different approach: I edited the manifest file for kube-apiserver with the same change described above
and ran the following:
$ kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.json --namespace=kube-system
And got this response:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
So now i'm stuck, how can I change the range of valid ports?
You are specifying --service-node-port-range=8880-8880 incorrectly: you set it to a single port only. Set it to a range.
Second problem: you are setting the service to use 9090, and it's not in the range.
ports:
- port: 80
  targetPort: 9090
  nodePort: 9090
The API server should have a deployment (or manifest) too. Try editing the port range there, then delete the API server pod so it gets recreated with the new config.
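With kubeadm the API server actually runs as a static pod rather than a Deployment, so a sketch of that approach is to edit its manifest directly on the master (the same file referenced in the question); the kubelet watches that directory and recreates the pod:
# on the master node
vi /etc/kubernetes/manifests/kube-apiserver.json
#   set "--service-node-port-range" to a real range, e.g. "--service-node-port-range=8000-9000"
# if the pod is not recreated automatically, restart the kubelet:
systemctl restart kubelet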
The Service node ports range is set to infrequently-used ports for a reason. Why do you want to publish this on every node? Do you really want that?
An alternative is to expose it on a semi-random nodeport, then use a proxy pod on a known node or set of nodes to access it via hostport.
This issue:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
was caused by my port range excluding 8080, which kube-apiserver was serving on, so I could not send any updates with kubectl.
I fixed it by changing the port range to 8080-8881 and restarting the kubelet service like so:
$ service kubelet restart
Everything works as expected now.
