Update 2: I was able to get the statistics by using Grafana and InfluxDB. However, I find this overkill. I want to see the current status of my cluster, not necessarily the historical trends. Based on the linked image, it should be possible by using the pre-deployed Heapster and the Kubernetes Dashboard.
Update 1:
With the command below, I do see resource information. I guess the remaining part of the question is why it is not showing up (or how I should configure it to show up) in the kubernetes dashboard, as shown in this image: https://docs.giantswarm.io/img/dashboard-ui.png
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-agentpool0-41204139-0 36m 1% 682Mi 9%
k8s-agentpool0-41204139-1 33m 1% 732Mi 10%
k8s-agentpool0-41204139-10 36m 1% 690Mi 10%
[truncated]
I am trying to monitor performance in my Azure Kubernetes deployment. I noticed it has Heapster running by default. I did not launch this one, but do want to leverage it if it is there. My question is: how can I access it, or is there something wrong with it? Here are the details I can think of, let me know if you need more.
$ kubectl cluster-info
Kubernetes master is running at https://[hidden].uksouth.cloudapp.azure.com
Heapster is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
tiller-deploy is running at https://[hidden].uksouth.cloudapp.azure.com/api/v1/namespaces/kube-system/services/tiller-deploy:tiller/proxy
I set up a proxy:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
I point my browser to
localhost:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/#!/workload?namespace=default
I see the kubernetes dashboard, but do notice that I do not see the performance graphs that are displayed at https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/. I also do not see the admin section.
I then point my browser to localhost:8001/api/v1/namespaces/kube-system/services/heapster/proxy and get
404 page not found
Inspecting the pods:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
heapster-2205950583-43w4b 2/2 Running 0 1d
kube-addon-manager-k8s-master-41204139-0 1/1 Running 0 1d
kube-apiserver-k8s-master-41204139-0 1/1 Running 0 1d
kube-controller-manager-k8s-master-41204139-0 1/1 Running 0 1d
kube-dns-v20-2000462293-1j20h 3/3 Running 0 16h
kube-dns-v20-2000462293-hqwfn 3/3 Running 0 16h
kube-proxy-0kwkf 1/1 Running 0 1d
kube-proxy-13bh5 1/1 Running 0 1d
[truncated]
kube-proxy-zfbb1 1/1 Running 0 1d
kube-scheduler-k8s-master-41204139-0 1/1 Running 0 1d
kubernetes-dashboard-732940207-w7pt2 1/1 Running 0 1d
tiller-deploy-3007245560-4tk78 1/1 Running 0 1d
Checking the log:
$ kubectl logs heapster-2205950583-43w4b heapster --namespace=kube-system
I0309 06:11:21.241752 19 heapster.go:72] /heapster --source=kubernetes.summary_api:""
I0309 06:11:21.241813 19 heapster.go:73] Heapster version v1.4.2
I0309 06:11:21.242310 19 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
I0309 06:11:21.242331 19 configs.go:62] Using kubelet port 10255
I0309 06:11:21.243557 19 heapster.go:196] Starting with Metric Sink
I0309 06:11:21.344547 19 heapster.go:106] Starting heapster on port 8082
E0309 14:14:05.000293 19 summary.go:389] Node k8s-agentpool0-41204139-32 is not ready
E0309 14:14:05.000331 19 summary.go:389] Node k8s-agentpool0-41204139-56 is not ready
[truncated the other agent pool messages saying not ready]
E0309 14:24:05.000645 19 summary.go:389] Node k8s-master-41204139-0 is not ready
$ kubectl describe pod heapster-2205950583-43w4b --namespace=kube-system
Name: heapster-2205950583-43w4b
Namespace: kube-system
Node: k8s-agentpool0-41204139-54/10.240.0.11
Start Time: Fri, 09 Mar 2018 07:11:15 +0100
Labels: k8s-app=heapster
pod-template-hash=2205950583
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"heapster-2205950583","uid":"ac75e772-2360-11e8-9e1c-00224807...
scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.244.58.2
Controlled By: ReplicaSet/heapster-2205950583
Containers:
heapster:
Container ID: docker://a9205e7ab9070a1d1bdee4a1b93eb47339972ad979c4d35e7d6b59ac15a91817
Image: k8s-gcrio.azureedge.net/heapster-amd64:v1.4.2
Image ID: docker-pullable://k8s-gcrio.azureedge.net/heapster-amd64@sha256:f58ded16b56884eeb73b1ba256bcc489714570bacdeca43d4ba3b91ef9897b20
Port: <none>
Command:
/heapster
--source=kubernetes.summary_api:""
State: Running
Started: Fri, 09 Mar 2018 07:11:20 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 121m
memory: 464Mi
Requests:
cpu: 121m
memory: 464Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from heapster-token-txk8b (ro)
heapster-nanny:
Container ID: docker://68e021532a482f32abec844d6f9ea00a4a8232b8d1004b7df4199d2c7d3a3b4c
Image: k8s-gcrio.azureedge.net/addon-resizer:1.7
Image ID: docker-pullable://k8s-gcrio.azureedge.net/addon-resizer@sha256:dcec9a5c2e20b8df19f3e9eeb87d9054a9e94e71479b935d5cfdbede9ce15895
Port: <none>
Command:
/pod_nanny
--cpu=80m
--extra-cpu=0.5m
--memory=140Mi
--extra-memory=4Mi
--threshold=5
--deployment=heapster
--container=heapster
--poll-period=300000
--estimator=exponential
State: Running
Started: Fri, 09 Mar 2018 07:11:18 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 90Mi
Requests:
cpu: 50m
memory: 90Mi
Environment:
MY_POD_NAME: heapster-2205950583-43w4b (v1:metadata.name)
MY_POD_NAMESPACE: kube-system (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from heapster-token-txk8b (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
heapster-token-txk8b:
Type: Secret (a volume populated by a Secret)
SecretName: heapster-token-txk8b
Optional: false
QoS Class: Guaranteed
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: <none>
Events: <none>
I have seen in the past that if you restart the dashboard pod it starts working. Can you try that real fast and let me know?
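A minimal sketch of that restart, assuming the dashboard pod is managed by a ReplicaSet (as its generated name in the pod listing suggests), so that deleting the pod causes a fresh one to be created. The `restart_pod` helper and the `KUBECTL` override are hypothetical names introduced for illustration; setting `KUBECTL=echo` only prints the command instead of running it.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override: set KUBECTL=echo to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# restart_pod POD NAMESPACE
# Deletes the pod; its controller (assumed to be a ReplicaSet) should recreate it.
restart_pod() {
  "$KUBECTL" delete pod "$1" --namespace="$2"
}
```

For the cluster in the question this would be called as `restart_pod kubernetes-dashboard-732940207-w7pt2 kube-system`, followed by `kubectl get pods --namespace=kube-system` to confirm that a replacement pod is running.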
When deploying an image to an AKS instance, the image pull from the ACR (Premium SKU) is very slow, even for "small" images of about 150 MB.
Both the AKS resource and the ACR resource are in the Canada East region.
Here is an example:
root@076fff2831b2:/tmp# kubectl describe pod application-service-59bcf96874-pvrmb
Name: application-service-59bcf96874-pvrmb
Namespace: default
Priority: 0
Node: aks-41067869-1/10.255.13.163
Start Time: Tue, 11 Feb 2020 18:15:53 -0500
Labels: app.kubernetes.io/instance=application-service
app.kubernetes.io/name=application-service
pod-template-hash=59bcf96874
Annotations: <none>
Status: Running
IP: 10.255.13.175
IPs: <none>
Controlled By: ReplicaSet/application-service-59bcf96874
Containers:
application-service:
Container ID: docker://0e86526a293d9055d482a09f043f0be68c594244fe4216f8fb190bc2caf6b65b
Image: myacr01.azurecr.io/microservices/application-service:0.0.6
Image ID: docker-pullable://myacr01.azurecr.io/microservices/application-service@sha256:cfbb3ffa7adc52da9cc0b8d7f78376076ea712025b59df8e406c559d369f4085
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 11 Feb 2020 18:35:00 -0500
Finished: Tue, 11 Feb 2020 18:35:00 -0500
Ready: False
Restart Count: 5
Liveness: http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
PORT: 3000
undefined: undefined
Mounts:
/kvmnt from application-service-kv-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from application-service-token-9jk8j (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
application-service-kv-volume:
Type: FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
Driver: azure/kv
FSType:
SecretRef: &LocalObjectReference{Name:kvcreds,}
ReadOnly: false
Options: map[keyvaultname:testIt2 keyvaultobjectnames:APPLICATION-SVC-SQLDB-CS;INGESTION-CONSUMER-EHB-CS;INGESTION-PRODUCER-EHB-CS keyvaultobjecttypes:secret;secret;secret tenantid:REMOVED usepodidentity:false usevmmanagedidentity:false]
application-service-token-9jk8j:
Type: Secret (a volume populated by a Secret)
SecretName: application-service-token-9jk8j
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20m default-scheduler Successfully assigned default/application-service-59bcf96874-pvrmb to aks-41067869-1
Normal Pulling 20m kubelet, aks-41067869-1 Pulling image "myacr01.azurecr.io/microservices/application-service:0.0.6"
Normal Pulled 4m39s kubelet, aks-41067869-1 Successfully pulled image "myacr01.azurecr.io/microservices/application-service:0.0.6"
Normal Started 3m36s (x4 over 4m33s) kubelet, aks-41067869-1 Started container application-service
Warning BackOff 3m4s (x11 over 4m30s) kubelet, aks-41067869-1 Back-off restarting failed container
Normal Pulled 2m52s (x4 over 4m32s) kubelet, aks-41067869-1 Container image "myacr01.azurecr.io/microservices/application-service:0.0.6" already present on machine
Normal Created 2m51s (x5 over 4m33s) kubelet, aks-41067869-1 Created container application-service
Some details were modified/removed for privacy reasons.
However, the thing to note is the ~15m needed to go from a state of "Pulling" to "Pulled" for an image from an ACR.
This issue is occurring daily. The Azure Insights blade of the AKS instance shows a maximum of 26% node CPU and 14.32% node memory utilization over the last 7 days.
How can we troubleshoot this further to determine the possible causes of the delays?
Any help is greatly appreciated.
Thanks!
I have a small Node.js application for testing with Kubernetes, but the application does not seem to keep running.
I put all the code I developed for this test on GitHub.
I run kubectl create -f deploy.yaml
It works, but...
[webapp@srvapih ex-node]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
api-7b89bd4755-4lc6k 1/1 Running 0 5s
api-7b89bd4755-7x964 0/1 ContainerCreating 0 5s
api-7b89bd4755-dv299 1/1 Running 0 5s
api-7b89bd4755-w6tzj 0/1 ContainerCreating 0 5s
api-7b89bd4755-xnm8l 0/1 ContainerCreating 0 5s
[webapp@srvapih ex-node]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
api-7b89bd4755-4lc6k 0/1 CrashLoopBackOff 1 11s
api-7b89bd4755-7x964 0/1 CrashLoopBackOff 1 11s
api-7b89bd4755-dv299 0/1 CrashLoopBackOff 1 11s
api-7b89bd4755-w6tzj 0/1 CrashLoopBackOff 1 11s
api-7b89bd4755-xnm8l 0/1 CrashLoopBackOff 1 11s
Events from kubectl describe pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 6m48s (x5 over 8m14s) kubelet, srvweb05.beirario.intranet Container image "node:8-alpine" already present on machine
Normal Created 6m48s (x5 over 8m14s) kubelet, srvweb05.beirario.intranet Created container
Normal Started 6m48s (x5 over 8m12s) kubelet, srvweb05.beirario.intranet Started container
Normal Scheduled 6m9s default-scheduler Successfully assigned default/api-7b89bd4755-4lc6k to srvweb05.beirario.intranet
Warning BackOff 3m2s (x28 over 8m8s) kubelet, srvweb05.beirario.intranet Back-off restarting failed container
All I can say here is that you are providing a task that finishes: command: ["/bin/sh","-c", "node", "servidor.js"].
Instead, you should provide a command that never completes.
Describing your pods shows that the container in the pod completed successfully with exit code 0:
Containers:
ex-node:
Container ID: docker://836ffd771b3514fd13ae3e6b8818a7f35807db55cf8f756e962131823a476675
Image: node:8-alpine
Image ID: docker-pullable://node@sha256:8e9987a6d91d783c56980f1bd4b23b4c05f9f6076d513d6350fef8fe09ed01fd
Port: 3000/TCP
Host Port: 0/TCP
Command:
/bin/sh
-c
node
servidor.js
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 08 Mar 2019 14:29:54 +0000
Finished: Fri, 08 Mar 2019 14:29:54 +0000
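One likely reason for the exit code 0 can be sketched as follows. With `/bin/sh -c`, only the first argument after `-c` is the command string; any further arguments become `$0`, `$1`, and so on for that shell. In the Command shown above, `node` would therefore start without `servidor.js` and, having no script to run, finish immediately. The sketch uses `echo` as a stand-in for `node` (an assumption, so that it runs without Node.js installed).

```shell
#!/bin/sh
# echo stands in for node; this only demonstrates how sh -c treats its arguments.

# Split form, mirroring the manifest: "servidor.js" becomes $0, not an argument of echo.
split_out=$(/bin/sh -c "echo" "servidor.js")

# Joined form: the whole command line is one string, so the file name is passed along.
joined_out=$(/bin/sh -c "echo servidor.js")

printf 'split form passed:  [%s]\n' "$split_out"
printf 'joined form passed: [%s]\n' "$joined_out"
```

If this is the cause, passing the command as a single string (`["/bin/sh", "-c", "node servidor.js"]`) or skipping the shell (`["node", "servidor.js"]`) in deploy.yaml should let the server start and keep running.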
You may be using the "process.stdout.write" method in your code. This will cause the k8s session to be lost. Do not print anything to stdout!
Try using pm2: https://pm2.io/docs/runtime/integration/docker/. It starts your Node.js app as a background process.
I'd like to configure cluster autoscaler on AKS. When scaling down it fails due to PDB:
I1207 14:24:09.523313 1 cluster.go:95] Fast evaluation: node aks-nodepool1-32797235-0 cannot be removed: no enough pod disruption budget to move kube-system/metrics-server-5cbc77f79f-44f9w
I1207 14:24:09.523413 1 cluster.go:95] Fast evaluation: node aks-nodepool1-32797235-3 cannot be removed: non-daemonset, non-mirrored, non-pdb-assignedkube-system pod present: cluster-autoscaler-84984799fd-22j42
I1207 14:24:09.523438 1 scale_down.go:490] 2 nodes found to be unremovable in simulation, will re-check them at 2018-12-07 14:29:09.231201368 +0000 UTC m=+8976.856144807
All system pods have a PDB with minAvailable: 1 assigned manually. I can imagine that this does not work for pods with only a single replica, like the metrics-server:
❯ k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-nodepool1-32797235-0 Ready agent 4h v1.11.4 10.240.0.4 <none> Ubuntu 16.04.5 LTS 4.15.0-1030-azure docker://3.0.1
aks-nodepool1-32797235-3 Ready agent 4h v1.11.4 10.240.0.6 <none> Ubuntu 16.04.5 LTS 4.15.0-1030-azure docker://3.0.1
❯ ks get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cluster-autoscaler-84984799fd-22j42 1/1 Running 0 2h 10.244.1.5 aks-nodepool1-32797235-3 <none>
heapster-5d6f9b846c-g7qb8 2/2 Running 0 1h 10.244.0.16 aks-nodepool1-32797235-0 <none>
kube-dns-v20-598f8b78ff-8pshc 4/4 Running 0 3h 10.244.1.4 aks-nodepool1-32797235-3 <none>
kube-dns-v20-598f8b78ff-plfv8 4/4 Running 0 1h 10.244.0.15 aks-nodepool1-32797235-0 <none>
kube-proxy-fjvjv 1/1 Running 0 1h 10.240.0.6 aks-nodepool1-32797235-3 <none>
kube-proxy-szr8z 1/1 Running 0 1h 10.240.0.4 aks-nodepool1-32797235-0 <none>
kube-svc-redirect-2rhvg 2/2 Running 0 4h 10.240.0.4 aks-nodepool1-32797235-0 <none>
kube-svc-redirect-r2m4r 2/2 Running 0 4h 10.240.0.6 aks-nodepool1-32797235-3 <none>
kubernetes-dashboard-68f468887f-c8p78 1/1 Running 0 4h 10.244.0.7 aks-nodepool1-32797235-0 <none>
metrics-server-5cbc77f79f-44f9w 1/1 Running 0 4h 10.244.0.3 aks-nodepool1-32797235-0 <none>
tiller-deploy-57f988f854-z9qln 1/1 Running 0 4h 10.244.0.8 aks-nodepool1-32797235-0 <none>
tunnelfront-7cf9d447f9-56g7k 1/1 Running 0 4h 10.244.0.2 aks-nodepool1-32797235-0 <none>
What needs to be changed (number of replicas? PDB configuration?) for down-scaling to work?
Basically, this is an administration issue that comes up when draining nodes whose pods are covered by a PDB (Pod Disruption Budget).
This is because evictions are forced to respect the PDB you specify.
You have two options.
Either force the hand:
kubectl drain foo --force --grace-period=0
You can check the other options in the docs: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain
Or use the Eviction API:
{
"apiVersion": "policy/v1beta1",
"kind": "Eviction",
"metadata": {
"name": "quux",
"namespace": "default"
}
}
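To submit that object, it is POSTed to the `eviction` subresource of the pod. A sketch, assuming `kubectl proxy` is listening on 127.0.0.1:8001 and the JSON above is saved as `eviction.json` (a hypothetical file name); `eviction_url` and `API_BASE` are names introduced here for illustration.

```shell
#!/bin/sh
# API_BASE assumes a local `kubectl proxy`; adjust it for your own setup.
API_BASE="${API_BASE:-http://127.0.0.1:8001}"

# eviction_url NAMESPACE POD
# Prints the URL of the pod's eviction subresource.
eviction_url() {
  echo "$API_BASE/api/v1/namespaces/$1/pods/$2/eviction"
}

# Example call (not executed here); it prints only the HTTP status code:
#   curl -s -o /dev/null -w '%{http_code}\n' \
#        -H 'Content-Type: application/json' \
#        -X POST -d @eviction.json "$(eviction_url default quux)"
```

The pod name in the URL has to match `metadata.name` in the Eviction object, which is `quux` in this example.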
Either way, drain and the Eviction API attempt to delete the pods so that they can be scheduled elsewhere before the node is completely drained.
As mentioned in the docs:
The API can respond in one of three ways:
If the eviction is granted, then the pod is deleted just as if you had sent a DELETE request to the pod’s URL and you get back 200 OK.
If the current state of affairs wouldn’t allow an eviction by the rules set forth in the budget, you get back 429 Too Many Requests. This is typically used for generic rate limiting of any requests.
If there is some kind of misconfiguration, like multiple budgets pointing at the same pod, you will get 500 Internal Server Error.
For a given eviction request, there are two cases:
There is no budget that matches this pod. In this case, the server always returns 200 OK.
There is at least one budget. In this case, any of the three above responses may apply.
If it gets stuck, you might need to do it manually.
You can read more here or here.
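On the question of replicas versus PDB configuration, the arithmetic can be sketched as follows, under the assumption that a `minAvailable` budget allows "healthy pods minus minAvailable" voluntary disruptions, and never fewer than zero. The `allowed_disruptions` helper is hypothetical and only illustrates the calculation; the replica counts are taken from the pod listing in the question.

```shell
#!/bin/sh
# allowed_disruptions HEALTHY MIN_AVAILABLE
# Prints how many pods may be evicted without violating the budget.
allowed_disruptions() {
  n=$(( $1 - $2 ))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

single=$(allowed_disruptions 1 1)   # e.g. metrics-server: 1 replica, minAvailable 1
double=$(allowed_disruptions 2 1)   # e.g. kube-dns: 2 replicas, minAvailable 1
echo "1 replica,  minAvailable 1: $single eviction(s) allowed"
echo "2 replicas, minAvailable 1: $double eviction(s) allowed"
```

With a single replica and minAvailable: 1 the result is 0, which matches the autoscaler message for metrics-server. Adding a replica, or relaxing the budget for single-replica pods (for example with `maxUnavailable: 1`), are the two directions to try; `kubectl get pdb` shows the allowed disruptions per budget.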
I created a simple Docker image from a "Hello World" java application.
This is my Dockerfile
FROM java:8
COPY . /var/www/java
WORKDIR /var/www/java
RUN javac HelloWorld.java
CMD ["java", "HelloWorld"]
I pushed the image (java-app) to Azure Container Registry.
$ az acr repository list --name AContainerRegistry --output table
Result
----------------
java-app
I want to deploy it
amhg$ kubectl run dockerproject --image=acontainerregistry.azurecr.io/java-app:v1
deployment.apps "dockerproject" created
amhg$ kubectl expose deployments dockerproject --port=80 --type=LoadBalancer
service "dockerproject" exposed
When I look at the pods, the pod has crashed:
amhg$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dockerproject-b6799d879-pt5rx 0/1 CrashLoopBackOff 8 19m
Is there a way to "test"/run the image from the central registry? Why does it crash?
Here is the describe pod output:
amhg$ kubectl describe pod dockerproject-64fbf7649-spc7h
Name: dockerproject-64fbf7649-spc7h
Namespace: default
Node: aks-nodepool1-39744669-0/10.240.0.4
Start Time: Thu, 19 Apr 2018 11:53:58 +0200
Labels: pod-template-hash=209693205
run=dockerproject
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"dockerproject-64fbf7649","uid":"946610e4-43b7-11e8-9537-0a58ac1...
Status: Running
IP: 10.244.0.38
Controlled By: ReplicaSet/dockerproject-64fbf7649
Containers:
dockerproject:
Container ID: docker://1f2a7a6870a37e4d6b53fc834b0d4d3b681e9faaacc3772177a918e66357404e
Image: acontainerregistry.azurecr.io/java-app:v1
Image ID: docker-pullable://acontainerregistry.azurecr.io/java-app@sha256:eaf6fe53a59de287ad76a18de2c7f05580b1f25153624161aadcc7b8ef47b0c4
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 19 Apr 2018 12:35:22 +0200
Finished: Thu, 19 Apr 2018 12:35:23 +0200
Ready: False
Restart Count: 13
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vkpjm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-vkpjm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vkpjm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned dockerproject2-64fbf7649-spc7h to aks-nodepool1-39744669-0
Normal SuccessfulMountVolume 43m kubelet, aks-nodepool1-39744669-0 MountVolume.SetUp succeeded for volume "default-token-vkpjm"
Normal Pulled 43m (x4 over 43m) kubelet, aks-nodepool1-39744669-0 Container image "acontainerregistry.azurecr.io/java-app:v1" already present on machine
Normal Created 43m (x4 over 43m) kubelet, aks-nodepool1-39744669-0 Created container
Normal Started 43m (x4 over 43m) kubelet, aks-nodepool1-39744669-0 Started container
Warning FailedSync 8m (x161 over 43m) kubelet, aks-nodepool1-39744669-0 Error syncing pod
Warning BackOff 3m (x184 over 43m) kubelet, aks-nodepool1-39744669-0 Back-off restarting failed container
When you run an application in a Pod, Kubernetes expects it to keep running like a daemon until you stop it.
In your details about the pod I see this:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 19 Apr 2018 12:35:22 +0200
Finished: Thu, 19 Apr 2018 12:35:23 +0200
It means that your application exited with code 0 (which means "all is ok") right after start. So, the image was successfully downloaded (registry is OK) and run, but the application exited.
That's why Kubernetes tries to restart the pod all the time.
The only thing I can suggest is to find the reason why the application stops and fix it.
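Since a Hello World program is expected to finish, one way to test the image in the cluster is to run it as a one-off pod that is not restarted. A sketch, assuming the `--restart=Never` flag of `kubectl run`; the `run_once` helper, the `KUBECTL` override, and the pod name `dockerproject-once` are hypothetical.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override: set KUBECTL=echo to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# run_once NAME IMAGE
# Starts a single pod that is left alone after its container exits.
run_once() {
  "$KUBECTL" run "$1" --image="$2" --restart=Never
}
```

After `run_once dockerproject-once acontainerregistry.azurecr.io/java-app:v1`, `kubectl logs dockerproject-once` should show the program's output, and `kubectl get pods` should report the pod as Completed rather than CrashLoopBackOff.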
I am trying to get Heapster working on my Kubernetes cluster. I am using Kube-DNS for DNS resolution.
My Kube-DNS seems to be set up correctly:
kubectl describe pod kube-dns-v20-z2dd2 -n kube-system
Name: kube-dns-v20-z2dd2
Namespace: kube-system
Node: 172.31.48.201/172.31.48.201
Start Time: Mon, 22 Jan 2018 09:21:49 +0000
Labels: k8s-app=kube-dns
version=v20
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 172.17.29.4
Controlled By: ReplicationController/kube-dns-v20
Containers:
kubedns:
Container ID: docker://13f95bdf8dee273ca18a2eee1b99fe00e5fff41279776cdef5d7e567472a39dc
Image: gcr.io/google_containers/kubedns-amd64:1.8
Image ID: docker-pullable://gcr.io/google_containers/kubedns-amd64@sha256:39264fd3c998798acdf4fe91c556a6b44f281b6c5797f464f92c3b561c8c808c
Ports: 10053/UDP, 10053/TCP
Args:
--domain=cluster.local.
--dns-port=10053
State: Running
Started: Mon, 22 Jan 2018 09:22:05 +0000
Ready: True
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
dnsmasq:
Container ID: docker://576ebc30e8f7aae13000a2d06541c165a3302376ad04c604b12803463380d9b5
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
Image ID: docker-pullable://gcr.io/google_containers/kube-dnsmasq-amd64@sha256:a722df15c0cf87779aad8ba2468cf072dd208cb5d7cfcaedd90e66b3da9ea9d2
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
State: Running
Started: Mon, 22 Jan 2018 09:22:20 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
healthz:
Container ID: docker://3367d05fb0e13c892243a4c86c74a170b0a9a2042387a70f6690ed946afda4d2
Image: gcr.io/google_containers/exechealthz-amd64:1.2
Image ID: docker-pullable://gcr.io/google_containers/exechealthz-amd64@sha256:503e158c3f65ed7399f54010571c7c977ade7fe59010695f48d9650d83488c0a
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
State: Running
Started: Mon, 22 Jan 2018 09:22:32 +0000
Ready: True
Restart Count: 0
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-9zxzd:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9zxzd
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned kube-dns-v20-z2dd2 to 172.31.48.201
Normal SuccessfulMountVolume 43m kubelet, 172.31.48.201 MountVolume.SetUp succeeded for volume "default-token-9zxzd"
Normal Pulling 43m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/kubedns-amd64:1.8"
Normal Pulled 43m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/kubedns-amd64:1.8"
Normal Created 43m kubelet, 172.31.48.201 Created container
Normal Started 43m kubelet, 172.31.48.201 Started container
Normal Pulling 43m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
Normal Pulled 42m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
Normal Created 42m kubelet, 172.31.48.201 Created container
Normal Started 42m kubelet, 172.31.48.201 Started container
Normal Pulling 42m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/exechealthz-amd64:1.2"
Normal Pulled 42m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/exechealthz-amd64:1.2"
Normal Created 42m kubelet, 172.31.48.201 Created container
Normal Started 42m kubelet, 172.31.48.201 Started container
kubectl describe svc kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: <none>
Selector: k8s-app=kube-dns
Type: ClusterIP
IP: 10.254.0.2
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 172.17.29.4:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 172.17.29.4:53
Session Affinity: None
Events: <none>
kubectl describe ep kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: <none>
Subsets:
Addresses: 172.17.29.4
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
dns 53 UDP
dns-tcp 53 TCP
Events: <none>
kubectl exec -it busybox1 -- nslookup kubernetes.default
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local
However, if I try to resolve http://monitoring-influxdb from either the heapster container or the busybox container (outside the kube-system namespace), it cannot be resolved:
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup http://monitoring-influxdb
Server: (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost
nslookup: can't resolve 'http://monitoring-influxdb': Try again
command terminated with exit code 1
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- cat /etc/resolv.conf
nameserver 10.254.0.2
search kube-system.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'http://monitoring-influxdb'
command terminated with exit code 1
kubectl exec -it busybox1 -- cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
Finally here are the logs from the heapster pod. I could not find any error in the dns pod logs:
kubectl logs heapster-v1.2.0-7657f45c77-65w7w heapster -n kube-system
E0122 09:22:46.966896 1 influxdb.go:217] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp: lookup monitoring-influxdb on 10.254.0.2:53: server misbehaving, will retry on use
Any pointers are highly appreciated.
EDIT:
The monitoring-influxdb is located in the same namespace as the heapster (kube-system).
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup monitoring-influxdb.kube-system
Server: (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost
nslookup: can't resolve 'monitoring-influxdb.kube-system': Name does not resolve
command terminated with exit code 1
But for whatever reason busybox is able to resolve the server.
kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb.kube-system
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: monitoring-influxdb.kube-system
Address 1: 10.254.48.109 monitoring-influxdb.kube-system.svc.cluster.local
kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
heapster ClusterIP 10.254.193.208 <none> 80/TCP 1h
kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP 1h
kubernetes-dashboard NodePort 10.254.89.241 <none> 80:32431/TCP 1h
monitoring-grafana ClusterIP 10.254.176.96 <none> 80/TCP 1h
monitoring-influxdb ClusterIP 10.254.48.109 <none> 8083/TCP,8086/TCP 1h
kubectl -n kube-system get ep
NAME ENDPOINTS AGE
heapster 172.17.29.7:8082 1h
kube-controller-manager <none> 1h
kube-dns 172.17.29.6:53,172.17.29.6:53 1h
kubernetes-dashboard 172.17.29.5:9090 1h
monitoring-grafana 172.17.29.3:3000 1h
monitoring-influxdb 172.17.29.3:8086,172.17.29.3:8083 1h
In Kubernetes, you can resolve services by their name alone, but only if you are inside the same namespace.
Services are also reachable through a DNS name in the form:
<service name>.<namespace>
From your question it is not clear in which namespace you deployed InfluxDB, but give the above suggestion a try.
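A small sketch of the name forms involved, assuming the cluster domain `cluster.local` that appears in the resolv.conf output in the question. It also strips a leading `http://`, in case that prefix was actually typed, because `nslookup` expects a host name rather than a URL. `strip_scheme` and `service_dns_names` are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# strip_scheme URL_OR_HOST
# Drops a leading "scheme://" if present and leaves plain host names untouched.
strip_scheme() {
  echo "${1#*://}"
}

# service_dns_names SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Prints the short, namespaced, and fully qualified names of a service.
service_dns_names() {
  domain="${3:-cluster.local}"
  echo "$1"
  echo "$1.$2"
  echo "$1.$2.svc.$domain"
}

host=$(strip_scheme "http://monitoring-influxdb")
service_dns_names "$host" kube-system
```

Each printed name can then be tried in turn, for example with `kubectl exec -it busybox1 -- nslookup <name>`; the short form is only expected to resolve from pods inside kube-system.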