Kubernetes worker node's CPU and memory requests always remain zero - Linux

Hi, I am new to Kubernetes.
1) I am not able to scale containers/pods onto the worker node, and its memory usage always remains zero. Any reason why?
2) Whenever I scale pods/containers, they are always created on the master node.
3) Is there any way to limit pods to specific nodes?
4) How are pods distributed when I scale?
Any help is appreciated.
kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
kubectl describe nodes
Name: worker-node
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=worker-node
node-role.kubernetes.io/worker=worker
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 19 Feb 2019 15:03:33 +0530
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 19 Feb 2019 18:57:22 +0530 Tue, 19 Feb 2019 15:26:13 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Tue, 19 Feb 2019 18:57:22 +0530 Tue, 19 Feb 2019 15:26:23 +0530 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Tue, 19 Feb 2019 18:57:22 +0530 Tue, 19 Feb 2019 15:26:13 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 19 Feb 2019 18:57:22 +0530 Tue, 19 Feb 2019 15:26:13 +0530 KubeletReady kubelet is posting ready status. AppArmor enabled
OutOfDisk Unknown Tue, 19 Feb 2019 15:03:33 +0530 Tue, 19 Feb 2019 15:25:47 +0530 NodeStatusNeverUpdated Kubelet never posted node status.
Addresses:
InternalIP: 192.168.1.10
Hostname: worker-node
Capacity:
cpu: 4
ephemeral-storage: 229335396Ki
hugepages-2Mi: 0
memory: 16101704Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 211355500604
hugepages-2Mi: 0
memory: 15999304Ki
pods: 110
System Info:
Machine ID: 1082300ebda9485cae458a9761313649
System UUID: E4DAAC81-5262-11CB-96ED-94898013122F
Boot ID: ffd5ce4b-437f-4497-9337-e72c06f88429
Kernel Version: 4.15.0-45-generic
OS Image: Ubuntu 18.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.13.3
Kube-Proxy Version: v1.13.3
PodCIDR: 192.168.1.0/24
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 55m kube-proxy, worker-node Starting kube-proxy.
Normal Starting 55m kube-proxy, worker-node Starting kube-proxy.
Normal Starting 33m kube-proxy, worker-node Starting kube-proxy.
Normal Starting 11m kube-proxy, worker-node Starting kube-proxy.
Warning EvictionThresholdMet 65s (x1139 over 3h31m) kubelet, worker-node Attempting to reclaim ephemeral-storage

This is very strange; by default Kubernetes taints the master to exclude it from pod scheduling.
kubectl describe nodes | grep Taints
Now check for the taint
node-role.kubernetes.io/master=true:NoSchedule
If your master doesn't have this taint, you can taint the master with:
kubectl taint nodes $HOSTNAME node-role.kubernetes.io/master=true:NoSchedule
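Note also that in the describe output above, the worker itself carries the taint node.kubernetes.io/disk-pressure:NoSchedule and the kubelet keeps logging EvictionThresholdMet. The scheduler keeps pods off that node until disk space is reclaimed, which would explain both why replicas land on the master and why the worker's CPU/memory requests stay at zero; when you scale a Deployment, the scheduler simply places each replica on whichever schedulable node can satisfy its resource requests.
To restrict pods to specific nodes, a nodeSelector is the simplest mechanism. A minimal sketch, assuming a hypothetical label disk=ssd that you add yourself:
kubectl label nodes worker-node disk=ssd
and in the pod template:
spec:
  nodeSelector:
    disk: ssd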

Related

The server froze unexpectedly [closed]

The server crashed; when I started analyzing the logs, I found these memory-related messages.
Could they be relevant to the server's freeze?
[JovaTricolo@xxx log]$ grep -R "Feb 11" messages | egrep -v "audit" | egrep -r "Warning|warning|error|erro|panic|boot|memory"
Feb 11 11:19:56 xxx rsyslogd-2036: error starting up disk queue, using pure in-memory mode [try http://www.rsyslog.com/e/2036 ]
Feb 11 11:19:56 xxx kernel: init_memory_mapping: 0000000000000000-00000000bd2f0000
Feb 11 11:19:56 xxx kernel: init_memory_mapping: 0000000100000000-0000002040000000
Feb 11 11:19:56 xxx kernel: bootmap [0000000000100000 - 0000000000307fff] pages 208
Feb 11 11:19:56 xxx kernel: (8 early reservations) ==> bootmem [0000000000 - 1040000000]
Feb 11 11:19:56 xxx kernel: bootmap [0000001040035000 - 0000001040234fff] pages 200
Feb 11 11:19:56 xxx kernel: (8 early reservations) ==> bootmem [1040000000 - 2040000000]
Feb 11 11:19:56 xxx kernel: Reserving 137MB of memory at 48MB for crashkernel (System RAM: 132096MB)
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 000000000009c000 - 0000000000100000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000bd2f0000 - 00000000bd31c000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000bd31c000 - 00000000bd35b000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000bd35b000 - 00000000c0000000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000c0000000 - 00000000e0000000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000f0000000 - 00000000fe000000
Feb 11 11:19:56 xxx kernel: PM: Registered nosave memory: 00000000fe000000 - 0000000100000000
Feb 11 11:19:56 xxx kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups
Feb 11 11:19:56 xxx kernel: Initializing cgroup subsys memory
Feb 11 11:19:56 xxx kernel: Freeing initrd memory: 15695k freed
Feb 11 11:19:56 xxx kernel: Non-volatile memory driver v1.3
Feb 11 11:19:56 xxx kernel: crash memory driver: version 1.1
Feb 11 11:19:56 xxx kernel: Freeing unused kernel memory: 1252k freed
Feb 11 11:19:56 xxx kernel: Freeing unused kernel memory: 1051k freed
Feb 11 11:19:56 xxx kernel: Freeing unused kernel memory: 1734k freed
Feb 11 11:19:56 xxx kernel: EXT4-fs (dm-13): warning: checktime reached, running e2fsck is recommended
Feb 11 11:19:56 xxx kernel: EXT4-fs (dm-14): warning: checktime reached, running e2fsck is recommended
Feb 11 11:19:56 xxx kernel: EXT4-fs (dm-15): warning: checktime reached, running e2fsck is recommended
Feb 11 11:19:56 xxx kernel: EXT4-fs (dm-16): warning: maximal mount count reached, running e2fsck is recommended
Feb 11 11:50:05 xxx snmpd[24795]: Warning: no access control information configured.#012 It's unlikely this agent can serve any useful purpose in this state.#012 Run "snmpconf -g basic_setup" to help you configure the snmpd.conf file for this agent.
[JovaTricolo@xxx log]$
Is there any solution for this?

The node had condition: [DiskPressure] causing pod eviction in k8s in azure/aks

I am running k8s 1.14 in azure.
I keep getting pod evictions on some of my pods in the cluster.
As an example:
$ kubectl describe pod kube-prometheus-stack-prometheus-node-exporter-j8nkd
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m22s default-scheduler Successfully assigned monitoring/kube-prometheus-stack-prometheus-node-exporter-j8nkd to aks-default-2678****
Warning Evicted 3m22s kubelet, aks-default-2678**** The node had condition: [DiskPressure].
Which I can also confirm by:
$ kubectl describe node aks-default-2678****
...
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 27 Nov 2019 22:06:08 +0100 Wed, 27 Nov 2019 22:06:08 +0100 RouteCreated RouteController created a route
MemoryPressure False Fri, 23 Oct 2020 15:35:52 +0200 Mon, 25 May 2020 18:51:40 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Fri, 23 Oct 2020 15:35:52 +0200 Sat, 05 Sep 2020 14:36:59 +0200 KubeletHasDiskPressure kubelet has disk pressure
Since this is a managed Azure k8s cluster, I don't have access to the kubelet on the nodes or to the master nodes. Is there anything I can do to investigate/debug this problem without SSH access to the nodes?
Also, I assume this comes from storage on the nodes and not from PVs/PVCs mounted into the pods. So how do I get an overview of storage consumption on the worker nodes without SSH access?
So how do I get an overview of storage consumption on the worker nodes without SSH access?
You can create a privileged pod like the following:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: privileged-pod
  name: privileged-pod
spec:
  hostIPC: true
  hostNetwork: true
  hostPID: true
  containers:
  - args:
    - sleep
    - "9999"
    image: centos:7
    name: privileged-pod
    volumeMounts:
    - name: host-root-volume
      mountPath: /host
      readOnly: true
  volumes:
  - name: host-root-volume
    hostPath:
      path: /
and then exec into it:
kubectl exec -it privileged-pod -- chroot /host
and then you have access to the whole node, just like you would have using ssh.
Note: In case your k8s user has a pod-security-policy attached, you may not be able to do this if changing hostIPC, hostNetwork, and hostPID is disallowed.
You also need to make sure that the pod gets scheduled on the specific node you want access to. Use .spec.nodeName: <name> to achieve that.
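Once inside the chroot, the usual disk tools run against the node's filesystem. A minimal sketch of where to look (the paths are common defaults, not verified against your nodes):
df -h
du -sh /var/lib/docker
du -sh /var/log
Old images and exited containers are common culprits for DiskPressure, so checking the container runtime's data directory first usually pays off.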

Etcd failing to set up cluster due to failure to find local etcd member

I'm attempting to set up a cluster on Ubuntu 18.04 host machines. I'm getting the following error when using DNS for server discovery.
error setting up initial cluster: cannot find local etcd member "etcd-1" in SRV records
I've followed the docs and feel like I've done it all correctly, but I'm new to setting up a local DNS (using bind9) and can't tell what I'm doing wrong here. I've tried altering my DNS configuration but have not been able to solve the problem. The issue only happens when trying to discover the local node...
Here are the SRV records for _etcd-server:
user@etcd-1:~$ dig +noall +answer SRV _etcd-server._tcp.etcd.abc-bird.com
_etcd-server._tcp.etcd.abc-bird.com. 9 IN SRV 0 0 2380 etcd-2.etcd.abc-bird.com.
_etcd-server._tcp.etcd.abc-bird.com. 9 IN SRV 0 0 2380 etcd-3.etcd.abc-bird.com.
_etcd-server._tcp.etcd.abc-bird.com. 9 IN SRV 0 0 2380 etcd-1.etcd.abc-bird.com.
My SRV records for _etcd-client-ssl
user@etcd-1:~$ dig +noall +answer SRV _etcd-client-ssl._tcp.etcd.abc-bird.com
_etcd-client-ssl._tcp.etcd.abc-bird.com. 60 IN SRV 0 0 2379 etcd-2.etcd.abc-bird.com.
_etcd-client-ssl._tcp.etcd.abc-bird.com. 60 IN SRV 0 0 2379 etcd-1.etcd.abc-bird.com.
_etcd-client-ssl._tcp.etcd.abc-bird.com. 60 IN SRV 0 0 2379 etcd-3.etcd.abc-bird.com.
My A records
user@etcd-1:~$ dig +noall +answer etcd-1.etcd.abc-bird.com. etcd-2.etcd.abc-bird.com. etcd-3.etcd.abc-bird.com.
etcd-1.etcd.abc-bird.com. 35 IN A 192.168.0.28
etcd-2.etcd.abc-bird.com. 35 IN A 192.168.0.20
etcd-3.etcd.abc-bird.com. 35 IN A 192.168.0.29
Here are my etcd.service file contents:
[Unit]
Description=ETCD Service
After=network.target
[Service]
User=etcd
ExecStart=/usr/local/bin/etcd --data-dir=/opt/etcd/data --name=${hostname} \
--discovery-srv etcd.${domain} \
--initial-advertise-peer-urls=https://${hostname}.etcd.${domain}:2380 \
--listen-peer-urls=https://0.0.0.0:2380 \
--listen-client-urls=https://0.0.0.0:2379 \
--peer-cert-allowed-cn=etcd.${domain} \
--advertise-client-urls=https://${hostname}.etcd.${domain}:2379 \
--initial-cluster-token=etcd-cluster-1 \
--initial-cluster-state=new \
--client-cert-auth --trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/cert.pem --key-file=/opt/etcd/key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/cert.pem --peer-key-file=/opt/etcd/key.pem
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
The journalctl logs containing the error
Apr 14 15:23:37 etcd-1 systemd[1]: Started ETCD Service.
Apr 14 15:23:37 etcd-1 etcd[6918]: etcd Version: 3.3.12
Apr 14 15:23:37 etcd-1 etcd[6918]: Git SHA: d57e8b8
Apr 14 15:23:37 etcd-1 etcd[6918]: Go Version: go1.10.8
Apr 14 15:23:37 etcd-1 etcd[6918]: Go OS/Arch: linux/amd64
Apr 14 15:23:37 etcd-1 etcd[6918]: setting maximum number of CPUs to 2, total number of available CPUs is 2
Apr 14 15:23:37 etcd-1 etcd[6918]: peerTLS: cert = /opt/etcd/cert.pem, key = /opt/etcd/key.pem, ca = , trusted-ca = /opt/etcd/ca.pem, client-cert-auth = true, crl-file =
Apr 14 15:23:37 etcd-1 etcd[6918]: listening for peers on https://0.0.0.0:2380
Apr 14 15:23:37 etcd-1 etcd[6918]: listening for client requests on 0.0.0.0:2379
Apr 14 15:23:37 etcd-1 etcd[6918]: got bootstrap from DNS for etcd-server at 0=http://etcd-2.etcd.abc-bird.com:2380
Apr 14 15:23:37 etcd-1 etcd[6918]: got bootstrap from DNS for etcd-server at 1=http://etcd-3.etcd.abc-bird.com:2380
Apr 14 15:23:37 etcd-1 etcd[6918]: error setting up initial cluster: cannot find local etcd member "etcd-1" in SRV records
Apr 14 15:23:37 etcd-1 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Apr 14 15:23:37 etcd-1 systemd[1]: etcd.service: Failed with result 'exit-code'.
Apr 14 15:23:42 etcd-1 systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
For what it's worth, here is my DNS config on my name server:
https://gist.github.com/spstratis/1e89f867d86c6b37dc15387ccd310fcc
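One thing that stands out in the journalctl output: the bootstrap entries etcd resolved are http:// URLs taken from the plaintext _etcd-server._tcp records, while the unit advertises https:// peer URLs, so the advertised URL for etcd-1 can never match an SRV target. For TLS bootstrapping, etcd looks up _etcd-server-ssl._tcp before falling back to _etcd-server._tcp, so it may be worth checking whether those records exist, e.g.:
dig +noall +answer SRV _etcd-server-ssl._tcp.etcd.abc-bird.com
If that returns nothing, adding _etcd-server-ssl SRV records on port 2380 (mirroring the existing _etcd-server ones) should let the https advertise URLs match.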

Test image from Azure Container Registry

I created a simple Docker image from a "Hello World" java application.
This is my Dockerfile
FROM java:8
COPY . /var/www/java
WORKDIR /var/www/java
RUN javac HelloWorld.java
CMD ["java", "HelloWorld"]
I pushed the image (java-app) to Azure Container Registry.
$ az acr repository list --name AContainerRegistry --output table
Result
----------------
java-app
I want to deploy it
amhg$ kubectl run dockerproject --image=acontainerregistry.azurecr.io/java-app:v1
deployment.apps "dockerproject" created
amhg$ kubectl expose deployments dockerproject --port=80 --type=LoadBalancer
service "dockerproject" exposed
and check the pods; the pod has crashed:
amhg$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dockerproject-b6799d879-pt5rx 0/1 CrashLoopBackOff 8 19m
Is there a way to "test"/run the image from the central registry? Why does it crash?
Here is the describe output for the pod:
amhg$ kubectl describe pod dockerproject-64fbf7649-spc7h
Name: dockerproject-64fbf7649-spc7h
Namespace: default
Node: aks-nodepool1-39744669-0/10.240.0.4
Start Time: Thu, 19 Apr 2018 11:53:58 +0200
Labels: pod-template-hash=209693205
run=dockerproject
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"dockerproject-64fbf7649","uid":"946610e4-43b7-11e8-9537-0a58ac1...
Status: Running
IP: 10.244.0.38
Controlled By: ReplicaSet/dockerproject-64fbf7649
Containers:
dockerproject:
Container ID: docker://1f2a7a6870a37e4d6b53fc834b0d4d3b681e9faaacc3772177a918e66357404e
Image: acontainerregistry.azurecr.io/java-app:v1
Image ID: docker-pullable://acontainerregistry.azurecr.io/java-app#sha256:eaf6fe53a59de287ad76a18de2c7f05580b1f25153624161aadcc7b8ef47b0c4
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 19 Apr 2018 12:35:22 +0200
Finished: Thu, 19 Apr 2018 12:35:23 +0200
Ready: False
Restart Count: 13
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vkpjm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-vkpjm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vkpjm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned dockerproject2-64fbf7649-spc7h to aks-nodepool1-39744669-0
Normal SuccessfulMountVolume 43m kubelet, aks-nodepool1-39744669-0 MountVolume.SetUp succeeded for volume "default-token-vkpjm"
Normal Pulled 43m (x4 over 43m) kubelet, aks-nodepool1-39744669-0 Container image "acontainerregistry.azurecr.io/java-app:v1" already present on machine
Normal Created 43m (x4 over 43m) kubelet, aks-nodepool1-39744669-0 Created container
Normal Started 43m (x4 over 43m) kubelet, aks-nodepool1-39744669-0 Started container
Warning FailedSync 8m (x161 over 43m) kubelet, aks-nodepool1-39744669-0 Error syncing pod
Warning BackOff 3m (x184 over 43m) kubelet, aks-nodepool1-39744669-0 Back-off restarting failed container
When you run an application in a Pod, Kubernetes expects it to run continuously, like a daemon, until you stop it somehow.
In your details about the pod I see this:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 19 Apr 2018 12:35:22 +0200
Finished: Thu, 19 Apr 2018 12:35:23 +0200
This means that your application exited with code 0 (which means "all is OK") right after starting. So the image was successfully downloaded (the registry is fine) and run, but the application exited.
That's why Kubernetes keeps restarting the pod.
The only thing I can suggest is to find the reason why the application stops and fix it.
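That behavior is consistent with the Dockerfile above: java HelloWorld prints its message and returns, so the container completes with exit code 0 and Kubernetes restarts it. To test the image outside the cluster, you can pull and run it locally; a quick sketch, assuming Docker and the Azure CLI are installed:
az acr login --name AContainerRegistry
docker run --rm acontainerregistry.azurecr.io/java-app:v1
For a Deployment the container's process has to keep running, so either turn the application into a long-running process (e.g. a server loop) or run one-shot images as a Job instead of a Deployment.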

Performance metrics in Kubernetes Dashboard missing in Azure Kubernetes deployment

Update 2: I was able to get the statistics by using Grafana and InfluxDB. However, I find this overkill: I want to see the current status of my cluster, not the historical trends per se. Based on the linked image, it should be possible using the pre-deployed Heapster and the Kubernetes Dashboard.
Update 1:
With the command below, I do see resource information. I guess the remaining part of the question is why it is not showing up (or how I should configure it to show up) in the kubernetes dashboard, as shown in this image: https://docs.giantswarm.io/img/dashboard-ui.png
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-agentpool0-41204139-0 36m 1% 682Mi 9%
k8s-agentpool0-41204139-1 33m 1% 732Mi 10%
k8s-agentpool0-41204139-10 36m 1% 690Mi 10%
[truncated]
I am trying to monitor performance in my Azure Kubernetes deployment. I noticed it has Heapster running by default. I did not launch it myself, but I do want to leverage it if it is there. My question is: how can I access it, or is there something wrong with it? Here are the details I can think of; let me know if you need more.
$ kubectl cluster-info
Kubernetes master is running at https://[hidden].uksouth.cloudapp.azure.com
Heapster is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
tiller-deploy is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/tiller-deploy:tiller/proxy
I set up a proxy:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
Point my browser to
localhost:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/#!/workload?namespace=default
I see the Kubernetes dashboard, but notice that I do not see the performance graphs displayed at https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/. I also do not see the admin section.
I then point my browser to localhost:8001/api/v1/namespaces/kube-system/services/heapster/proxy and get
404 page not found
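The 404 on its own doesn't prove Heapster is down; its root path simply isn't served. Before digging further, it may be worth confirming the service and its endpoints are in place:
kubectl get svc heapster --namespace=kube-system
kubectl get endpoints heapster --namespace=kube-system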
Inspecting the pods:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
heapster-2205950583-43w4b 2/2 Running 0 1d
kube-addon-manager-k8s-master-41204139-0 1/1 Running 0 1d
kube-apiserver-k8s-master-41204139-0 1/1 Running 0 1d
kube-controller-manager-k8s-master-41204139-0 1/1 Running 0 1d
kube-dns-v20-2000462293-1j20h 3/3 Running 0 16h
kube-dns-v20-2000462293-hqwfn 3/3 Running 0 16h
kube-proxy-0kwkf 1/1 Running 0 1d
kube-proxy-13bh5 1/1 Running 0 1d
[truncated]
kube-proxy-zfbb1 1/1 Running 0 1d
kube-scheduler-k8s-master-41204139-0 1/1 Running 0 1d
kubernetes-dashboard-732940207-w7pt2 1/1 Running 0 1d
tiller-deploy-3007245560-4tk78 1/1 Running 0 1d
Checking the log:
$kubectl logs heapster-2205950583-43w4b heapster --namespace=kube-system
I0309 06:11:21.241752 19 heapster.go:72] /heapster --source=kubernetes.summary_api:""
I0309 06:11:21.241813 19 heapster.go:73] Heapster version v1.4.2
I0309 06:11:21.242310 19 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
I0309 06:11:21.242331 19 configs.go:62] Using kubelet port 10255
I0309 06:11:21.243557 19 heapster.go:196] Starting with Metric Sink
I0309 06:11:21.344547 19 heapster.go:106] Starting heapster on port 8082
E0309 14:14:05.000293 19 summary.go:389] Node k8s-agentpool0-41204139-32 is not ready
E0309 14:14:05.000331 19 summary.go:389] Node k8s-agentpool0-41204139-56 is not ready
[truncated the other agent pool messages saying not ready]
E0309 14:24:05.000645 19 summary.go:389] Node k8s-master-41204139-0 is not ready
$kubectl describe pod heapster-2205950583-43w4b --namespace=kube-system
Name: heapster-2205950583-43w4b
Namespace: kube-system
Node: k8s-agentpool0-41204139-54/10.240.0.11
Start Time: Fri, 09 Mar 2018 07:11:15 +0100
Labels: k8s-app=heapster
pod-template-hash=2205950583
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"heapster-2205950583","uid":"ac75e772-2360-11e8-9e1c-00224807...
scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.244.58.2
Controlled By: ReplicaSet/heapster-2205950583
Containers:
heapster:
Container ID: docker://a9205e7ab9070a1d1bdee4a1b93eb47339972ad979c4d35e7d6b59ac15a91817
Image: k8s-gcrio.azureedge.net/heapster-amd64:v1.4.2
Image ID: docker-pullable://k8s-gcrio.azureedge.net/heapster-amd64#sha256:f58ded16b56884eeb73b1ba256bcc489714570bacdeca43d4ba3b91ef9897b20
Port: <none>
Command:
/heapster
--source=kubernetes.summary_api:""
State: Running
Started: Fri, 09 Mar 2018 07:11:20 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 121m
memory: 464Mi
Requests:
cpu: 121m
memory: 464Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from heapster-token-txk8b (ro)
heapster-nanny:
Container ID: docker://68e021532a482f32abec844d6f9ea00a4a8232b8d1004b7df4199d2c7d3a3b4c
Image: k8s-gcrio.azureedge.net/addon-resizer:1.7
Image ID: docker-pullable://k8s-gcrio.azureedge.net/addon-resizer#sha256:dcec9a5c2e20b8df19f3e9eeb87d9054a9e94e71479b935d5cfdbede9ce15895
Port: <none>
Command:
/pod_nanny
--cpu=80m
--extra-cpu=0.5m
--memory=140Mi
--extra-memory=4Mi
--threshold=5
--deployment=heapster
--container=heapster
--poll-period=300000
--estimator=exponential
State: Running
Started: Fri, 09 Mar 2018 07:11:18 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 90Mi
Requests:
cpu: 50m
memory: 90Mi
Environment:
MY_POD_NAME: heapster-2205950583-43w4b (v1:metadata.name)
MY_POD_NAMESPACE: kube-system (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from heapster-token-txk8b (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
heapster-token-txk8b:
Type: Secret (a volume populated by a Secret)
SecretName: heapster-token-txk8b
Optional: false
QoS Class: Guaranteed
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: <none>
Events: <none>
I have seen in the past that if you restart the dashboard pod it starts working. Can you try that real fast and let me know?
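Assuming the dashboard pod is managed by a Deployment/ReplicaSet (as the generated name suggests), deleting it is enough to get a fresh replacement, e.g. with the pod name from your listing:
kubectl delete pod kubernetes-dashboard-732940207-w7pt2 --namespace=kube-system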
