Every spark pods scheduled on a single minikube node

Every spark pods scheduled on a single minikube node - apache-spark

Following this tutorial but instead setting minikube replicas to 3
minikube start --nodes 3 --memory 8192 --cpus 4 # enough resources for spark
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane,master 68m v1.22.3
minikube-m02 Ready <none> 68m v1.22.3
minikube-m03 Ready <none> 67m v1.22.3
When I apply the following deployments, everything gets scheduled on a single node, even though other nodes have enough resources.
$ kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
spark-master-9d67dd4b7-tps82 1/1 Running 0 48m 10.244.2.2 minikube-m03 <none> <none>
spark-worker-766ccb5887-64bzk 1/1 Running 0 13s 10.244.2.17 minikube-m03 <none> <none>
spark-worker-766ccb5887-6gvfv 1/1 Running 0 13s 10.244.2.18 minikube-m03 <none> <none>
This is my deployment for workers:
kind: Deployment
apiVersion: apps/v1
metadata:
name: spark-worker
spec:
replicas: 15
selector:
matchLabels:
component: spark-worker
template:
metadata:
labels:
component: spark-worker
spec:
containers:
- name: spark-worker
image: mjhea0/spark-hadoop:3.2.0
command: ["/spark-worker"]
ports:
- containerPort: 8081
resources:
requests:
cpu: 100m
and master:
kind: Deployment
apiVersion: apps/v1
metadata:
name: spark-master
spec:
replicas: 1
selector:
matchLabels:
component: spark-master
template:
metadata:
labels:
component: spark-master
spec:
containers:
- name: spark-master
image: mjhea0/spark-hadoop:3.2.0
command: ["/spark-master"]
ports:
- containerPort: 7077
- containerPort: 8080
resources:
requests:
cpu: 100m
Any reason why everything sits on a single node?

Solved. Had to increase the replicas to 150 for the scheduler to finally consider other nodes.

Related

Running containers issue

I have a little windows .exe deployed in azure kubernetes cluster. When I run kubectl get podsI get the following result,
NAME READY STATUS RESTARTS AGE
sample-deploy-548d6b9c6b-8v2nb 0/1 CrashLoopBackOff 5 6m12s
sample-deploy-548d6b9c6b-fpmz9 0/1 CrashLoopBackOff 5 6m12s
sample-deploy-548d6b9c6b-hgsj7 0/1 CrashLoopBackOff 5 6m12s
When I run kubectl describe pod sample-deploy-548d6b9c6b-8v2nb I get the following details
Name: sample-deploy-548d6b9c6b-8v2nb
Namespace: default
Priority: 0
Node: akswin000000/10.240.0.35
Start Time: Thu, 02 Jul 2020 16:59:02 +0100
Labels: app=sampleservice
pod-template-hash=548d6b9c6b
Annotations: <none>
Status: Running
IP: 10.240.0.45
IPs: <none>
Controlled By: ReplicaSet/sample-deploy-548d6b9c6b
Containers:
sampleservice:
Container ID: docker://3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca
Image: samplekube.azurecr.io/sample:v1
Image ID: docker-pullable://samplekube.azurecr.io/sample#sha256:a814e92d5af97b8cfbd6cd0789e164858848f82f0316a771670382ce0bbcba92
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Message: hcsshim::CreateComputeSystem 3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca: The container operating system does not match the host operating system.
(extra info: {"SystemType":"Container","Name":"3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca","Owner":"docker","VolumePath":"\\\\?\\Volume{58649455-b9a5-4d00-b151-485ec8ab6006}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\
\windowsfilter\\3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca","Layers":[{"ID":"7d7579eb-d8f7-5314-b6a0-399937aee9ca","Path":"C:\\ProgramData\\docker\\windowsfilter\\e0357f9d6b48e4b580a09cefedec8aac329894b57a49a30f9dc27795a1626aca"},{"ID":"f9bd195c-3ff
c-5c98-9713-1a7658666667","Path":"C:\\ProgramData\\docker\\windowsfilter\\019404385f250e8807ea3b693e35813b3328b3a14e83da51e8119401f0d20f9f"},{"ID":"0d763990-3499-5a19-b5e9-5e0788397f83","Path":"C:\\ProgramData\\docker\\windowsfilter\\3be0598c3fa3671a1436c670b6964c0a30ddc
2bd2e4011f347e6ef503888826a"},{"ID":"88fb7b4f-d24a-5ddf-9b67-861041ffef72","Path":"C:\\ProgramData\\docker\\windowsfilter\\978600b419ddd768b0b03c09e198d7b8d411cc6ca63b5ba15b6cc5343bb8b2a7"}],"ProcessorWeight":5000,"HostName":"sample-deploy-548d6b9c6b-8v2nb","MappedDirect
ories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\8257607b-9506-42af-9068-a3965bb46648\\volumes\\kubernetes.io~secret\\default-token-9wzn2","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"Create
InUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"fbd7d679302c57485ca7d4842528077fbb09e43ad691f47dd4cc84cbd8d3e3db"})
Exit Code: 128
Started: Thu, 02 Jul 2020 16:59:37 +0100
Finished: Thu, 02 Jul 2020 16:59:37 +0100
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9wzn2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-9wzn2:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9wzn2
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=windows
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned default/sample-deploy-548d6b9c6b-8v2nb to akswin000000
Warning BackOff 18s (x2 over 45s) kubelet, akswin000000 Back-off restarting failed container
Normal Pulling 7s (x4 over 60s) kubelet, akswin000000 Pulling image "samplekube.azurecr.io/sample:v1"
Normal Pulled 6s (x4 over 57s) kubelet, akswin000000 Successfully pulled image "samplekube.azurecr.io/sample:v1"
Normal Created 5s (x4 over 57s) kubelet, akswin000000 Created container sampleservice
Warning Failed 5s (x4 over 56s) kubelet, akswin000000 Error: failed to start container "sampleservice": Error response from daemon: hcsshim::CreateComputeSystem sampleservice: The container operating system does not match the host operating system.
(extra info: {"SystemType":"Container","Name":"sampleservice","Owner":"docker","VolumePath":"\\\\?\\Volume{58649455-b9a5-4d00-b151-485ec8ab6006}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\sampleservice","Layers":[{"ID":"7d7
579eb-d8f7-5314-b6a0-399937aee9ca","Path":"C:\\ProgramData\\docker\\windowsfilter\\e0357f9d6b48e4b580a09cefedec8aac329894b57a49a30f9dc27795a1626aca"},{"ID":"f9bd195c-3ffc-5c98-9713-1a7658666667","Path":"C:\\ProgramData\\docker\\windowsfilter\\019404385f250e8807ea3b693e35
813b3328b3a14e83da51e8119401f0d20f9f"},{"ID":"0d763990-3499-5a19-b5e9-5e0788397f83","Path":"C:\\ProgramData\\docker\\windowsfilter\\3be0598c3fa3671a1436c670b6964c0a30ddc2bd2e4011f347e6ef503888826a"},{"ID":"88fb7b4f-d24a-5ddf-9b67-861041ffef72","Path":"C:\\ProgramData\\do
cker\\windowsfilter\\978600b419ddd768b0b03c09e198d7b8d411cc6ca63b5ba15b6cc5343bb8b2a7"}],"ProcessorWeight":5000,"HostName":"sample-deploy-548d6b9c6b-8v2nb","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\8257607b-9506-42af-9068-a3965bb46648\\volumes\\kuber
netes.io~secret\\default-token-9wzn2","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"fbd7d679302c57485ca7d4842528077
fbb09e43ad691f47dd4cc84cbd8d3e3db"})
It seems to me is try to run the containers under linux instead of windows, I have both a windows and linux nodes in my cluster. How can I resolve this. Thanks
Below the kubectl get nodes -o wide --show-labels
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME LABELS
aks-agentpool-38156504-vmss000000 Ready agent 5h17m v1.15.11 10.240.0.4 <none> Ubuntu 16.04.6 LTS 4.15.0-1089-azure docker://3.0.10+azure agentpool=agentpool,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=westeurope,failure-domain.beta.kubernetes.io/zone=0,kubernetes.azure.com/cluster=MC_testmass_anthonycluster_westeurope,kubernetes.azure.com/mode=system,kubernetes.azure.com/node-image-version=AKSUbuntu-1604-2020.06.18,kubernetes.azure.com/role=agent,kubernetes.io/arch=amd64,kubernetes.io/hostname=aks-agentpool-38156504-vmss000000,kubernetes.io/os=linux,kubernetes.io/role=agent,node-role.kubernetes.io/agent=,storageprofile=managed,storagetier=Premium_LRS
akswin000000 Ready agent 5h14m v1.15.11 10.240.0.35 <none> Windows Server 2019 Datacenter 10.0.17763.1282 docker://19.3.5 agentpool=win,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=windows,failure-domain.beta.kubernetes.io/region=westeurope,failure-domain.beta.kubernetes.io/zone=0,kubernetes.azure.com/cluster=MC_testmass_anthonycluster_westeurope,kubernetes.azure.com/node-image-version=AKSWindows-2019-17763.1282.200610,kubernetes.azure.com/role=agent,kubernetes.io/arch=amd64,kubernetes.io/hostname=akswin000000,kubernetes.io/os=windows,kubernetes.io/role=agent,node-role.kubernetes.io/agent=,storageprofile=managed,storagetier=Premium_LRS
this is the yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-deploy
labels:
app: sampleservice
spec:
replicas: 3
template:
metadata:
name: sampleservice
labels:
app: sampleservice
spec:
nodeSelector:
"kubernetes.io/os": windows
containers:
- name: sampleservice
image: samplekube.azurecr.io/sample:v1
imagePullPolicy: Always
restartPolicy: Always
selector:
matchLabels:
app: sampleservice
---
apiVersion: v1
kind: Service
metadata:
name: sample-service
spec:
selector:
app: sampleservice
ports:
- port: 80
type: LoadBalancer
This is what I am getting now when describe a pod,
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 94s default-scheduler Successfully assigned default/sample-deploy-6d4b86bf46-djtvf to aksnpwin000000
Normal Pulled 31s (x4 over 82s) kubelet, aksnpwin000000 Container image "masskube.azurecr.io/sample2:v1" already present on machine
Normal Created 31s (x4 over 82s) kubelet, aksnpwin000000 Created container sampleservice
Normal Started 29s (x4 over 79s) kubelet, aksnpwin000000 Started container sampleservice
Warning BackOff 3s (x5 over 59s) kubelet, aksnpwin000000 Back-off restarting failed container
And this is what I get on the after running kubectl logs podname
'Sample.exe' is not recognized as an internal or external command,
operable program or batch file.

I have managed to resolve the issue by amending the docker file as follows,
FROM mcr.microsoft.com/dotnet/framework/runtime:4.8-windowsservercore-ltsc2019
WORKDIR /app
EXPOSE 80
COPY /bin/Release .
ENTRYPOINT ["Sample.exe"]
Thanks everyone for the help

You need to set a node selector for your deployment’s template, like this:
nodeSelector:
kubernetes.io/os: windows

You have got nodeSelector kubernetes.io/os: windows in the deployment but the windows node has got label beta.kubernetes.io/os=windows. The nodeSelector and label need to exactly match.
You need to have nodeSelector as below in the pod spec to schedule the pod on windows node.
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-deploy
labels:
app: sampleservice
spec:
replicas: 3
template:
metadata:
name: sampleservice
labels:
app: sampleservice
spec:
nodeSelector:
"beta.kubernetes.io/os": windows
containers:
- name: sampleservice
image: samplekube.azurecr.io/sample:v1
imagePullPolicy: Always
restartPolicy: Always
selector:
matchLabels:
app: sampleservice
https://learn.microsoft.com/en-us/azure/aks/windows-container-cli

Services with azure kubernetes not reachable

I am trying to configure azure kubernetes cluster and created one on the portal.dockerized .net core webapi project and also published the image to azure container register. After applying manifest file , i get the message of service created and also the external IP. however when I do get pods I get status "Pending" all the time
NAME READY STATUS RESTARTS AGE
kubdemo1api-6c67bf759f-6slh2 0/1 Pending 0 6h
here is my yaml manifest file, can someone suggest what is wrong here?
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kubdemo1api
labels:
name: kubdemo1api
spec:
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
minReadySeconds: 30
selector:
matchLabels:
app: kubdemo1api
template:
metadata:
labels:
app: kubdemo1api
version: "1.0"
tier: backend
spec:
containers:
- name: kubdemo1api
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
timeoutSeconds: 10
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
timeoutSeconds: 10
image: my container registry image address
resources:
requests:
cpu: 100m
memory: 100Mi
ports:
- containerPort: 80
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
timeoutSeconds: 10
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: azkubdemoapi1
spec:
ports:
-
port: 80
selector:
app: kubdemo1api
type: LoadBalancer
EDIT:
Output kubectl describe pods is this
here is it
Normal Scheduled 2m default-scheduler Successfully assigned default/kubdemo1api-697d5655c-64fnj to aks-agentpool-87689508-0
Normal Pulling 37s (x4 over 2m) kubelet, aks-agentpool-87689508-0 pulling image "myacrurl/azkubdemo:v2"
Warning Failed 37s (x4 over 2m) kubelet, aks-agentpool-87689508-0 Failed to pull image "my acr url": [rpc error: code = Unknown desc = Error response from daemon: Get https://myacrurl/v2/azkubdemo/manifests/v2: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://myacrurl/v2/azkubdemo/manifests/v2: unauthorized: authentication required]
Warning Failed 37s (x4 over 2m) kubelet, aks-agentpool-87689508-0 Error: ErrImagePull
Normal BackOff 23s (x6 over 2m) kubelet, aks-agentpool-87689508-0 Back-off pulling image "myacrlurl/azkubdemo:v2"
Warning Failed 11s (x7 over 2m) kubelet, aks-agentpool-87689508-0 Error: ImagePullBackOff

For the error that you provide, it shows you have to authenticate to pull the image from the Azure Container Registry.
Actually, you just need permission to pull the image and the acrpull role is totally enough. There are two ways to achieve it.
One is that just grant the AKS access to the Azure Container Registry. It's simplest on my side. Just need to create the role assignment for the service principal which the AKS used. See Grant AKS access to ACR for the whole steps.
The other one is that use the Kubernetes secret. It's a little more complex than the first one. You need to create a new service principal differ from the one AKS used and grant access to it, then create the kubernetes secret with the service principal. See Access with Kubernetes secret for the whole steps.

This Yaml is Wrong Can you provide the correct yaml, the intending are wrong. Try Below YAML
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kubdemo1api
labels:
name: kubdemo1api
spec:
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
minReadySeconds: 30
selector:
matchLabels:
app: kubdemo1api
template:
metadata:
labels:
app: kubdemo1api
version: "1.0"
tier: backend
spec:
containers:
- name: kubdemo1api
image: nginx
resources:
requests:
cpu: 100m
memory: 100Mi
ports:
- containerPort: 80
livenessProbe:
httpGet:
path: /
port: 80
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: azkubdemoapi1
spec:
ports:
- port: 80
selector:
app: kubdemo1api
type: LoadBalancer

Cannot access Web API deployed in Azure ACS Kubernetes Cluster

Please help. I am trying to deploy a web API to Azure ACS Kubernetes cluster, it is a simple web API created in VSTS and the result should be like this: { "value1", "value2" }.
I plan to make the type as Cluster-IP but I want to test and access it first that is why this is LoadBalancer, the pods is running and no restart (I think it's good).
The guide I'm following is: Running Web API using Docker and Kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 3d
sampleapi-service LoadBalancer 10.0.238.155 102.51.223.6 80:31676/TCP 1h
When I tried to browse the IP 102.51.223.6/api/values it says:
"This site can’t be reached"
service.yaml
kind: Service
apiVersion: v1
metadata:
name: sampleapi-service
labels:
name: sampleapi
app: sampleapi
spec:
selector:
name: sampleapi
ports:
- protocol: "TCP"
# Port accessible inside the cluster
port: 80
# Port to forwards inside the pod
targetPort: 80
# Port accessible oustide the cluster
#nodePort: 80
type: LoadBalancer
deployment.yml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: sampleapi-deployment
spec:
replicas: 3
template:
metadata:
labels:
app: sampleapi
spec:
containers:
- name: sampleapi
image: mycontainerregistry.azurecr.io/sampleapi:latest
ports:
- containerPort: 80
POD
Name: sampleapi-deployment-498305766-zzs2z
Namespace: default
Node: c103facs9001/10.240.0.4
Start Time: Fri, 27 Jul 2018 00:20:06 +0000
Labels: app=sampleapi
pod-template-hash=498305766
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"sampleapi-deployme
-498305766","uid":"d064a8e0-9132-11e8-b58d-0...
Status: Running
IP: 10.244.2.223
Controlled By: ReplicaSet/sampleapi-deployment-498305766
Containers:
sampleapi:
Container ID: docker://19d414c87ebafe1cc99d101ac60f1113533e44c24552c75af4ec197d3d3c9c53
Image: mycontainerregistry.azurecr.io/sampleapi:latest
Image ID: docker-pullable://mycontainerregistry.azurecr.io/sampleapi#sha256:9635a9df168ef76a6a27cd46cb15620d762657e9b57a5ac2514ba0b9a8f47a8d
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 27 Jul 2018 00:20:48 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mj5m1 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-mj5m1:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mj5m1
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50m default-scheduler Successfully assigned sampleapi-deployment-498305766-zzs2z to c103facs9001
Normal SuccessfulMountVolume 50m kubelet, c103facs9001 MountVolume.SetUp succeeded for volume "default-token-mj5m1"
Normal Pulling 49m kubelet, c103facs9001 pulling image "mycontainerregistry.azurecr.io/sampleapi:latest"
Normal Pulled 49m kubelet, c103facs9001 Successfully pulled image "mycontainerregistry.azurecr.io/sampleapi:latest"
Normal Created 49m kubelet, c103facs9001 Created container
Normal Started 49m kubelet, c103facs9001 Started container

It seems like to me that your service isn't set to a port on the container. You have your targetPort commented out. So the service is reachable on port 80 but the service doesn't know to target the pod on that port.

You will need to start the service which exposes the internal port to some external Ip:port that can be used in your browser to access the service. try this after deploying your deployment and service yml files:
kubectl service sampleapi-service

Unable to setup service DNS in Kubernetes cluster

Kubernetes version --> 1.5.2
I am setting up DNS for Kubernetes services for the first time and I came across SkyDNS.
So following documentation, my skydns-svc.yaml file is :
apiVersion: v1
kind: Service
spec:
clusterIP: 10.100.0.100
ports:
- name: dns
port: 53
protocol: UDP
targetPort: 53
- name: dns-tcp
port: 53
protocol: TCP
targetPort: 53
selector:
k8s-app: kube-dns
sessionAffinity: None
type: ClusterIP
And my skydns-rc.yaml file is :
apiVersion: v1
kind: ReplicationController
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v18
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
version: v18
spec:
containers:
- args:
- --domain=kube.local
- --dns-port=10053
image: gcr.io/google_containers/kubedns-amd64:1.6
imagePullPolicy: IfNotPresent
name: kubedns
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
resources:
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
terminationMessagePath: /dev/termination-log
- args:
- --cache-size=1000
- --no-resolv
- --server=127.0.0.1#10053
image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
imagePullPolicy: IfNotPresent
name: dnsmasq
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
- args:
- -cmd=nslookup kubernetes.default.svc.kube.local 127.0.0.1 >/dev/null &&
nslookup kubernetes.default.svc.kube.local 127.0.0.1:10053 >/dev/null
- -port=8080
- -quiet
image: gcr.io/google_containers/exechealthz-amd64:1.0
imagePullPolicy: IfNotPresent
name: healthz
ports:
- containerPort: 8080
protocol: TCP
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
Also on my minions, I updated the /etc/systemd/system/multi-user.target.wants/kubelet.service file and added the following under the ExecStart section :
ExecStart=/usr/bin/kubelet \
$KUBE_LOGTOSTDERR \
$KUBE_LOG_LEVEL \
$KUBELET_API_SERVER \
$KUBELET_ADDRESS \
$KUBELET_PORT \
$KUBELET_HOSTNAME \
$KUBE_ALLOW_PRIV \
$KUBELET_POD_INFRA_CONTAINER \
$KUBELET_ARGS \
--cluster-dns=10.100.0.100 \
--cluster-domain=kubernetes \
Having done all of this and having successfully brought up the rc & svc :
[root#kubernetes-master DNS]# kubectl get po | grep dns
kube-dns-v18-hl8z6 3/3 Running 0 6s
[root#kubernetes-master DNS]# kubectl get svc | grep dns
kube-dns 10.100.0.100 <none> 53/UDP,53/TCP 20m
This is all that I got from a config standpoint. Now in order to test my setup, I downloaded busybox and tested a nslookup
[root#kubernetes-master DNS]# kubectl get svc | grep kubernetes
kubernetes 10.100.0.1 <none> 443/TCP
[root#kubernetes-master DNS]# kubectl exec busybox -- nslookup kubernetes
nslookup: can't resolve 'kubernetes'
Server: 10.100.0.100
Address 1: 10.100.0.100
Is there something that I have missed ?
EDIT ::
Going through the logs, I see something that might explain why this is not working :
kubectl logs $(kubectl get pods -l k8s-app=kube-dns -o name) -c kubedns
.
.
.
E1220 17:44:48.403976 1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0: x509: failed to load system roots and no roots provided
E1220 17:44:48.487169 1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.100.0.1:443/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
I1220 17:44:48.487716 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.100.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.
E1220 17:44:49.410311 1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0: x509: failed to load system roots and no roots provided
I1220 17:44:49.492338 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.100.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.
E1220 17:44:49.493429 1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.100.0.1:443/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
.
.
.
Looks like kubedns is unable to authorize against K8S master node. I even tried to do a manual call :
curl -k https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0
Unauthorized

Looks like the kube-dns pod is not able to authenticate with the kubernetes api server. I don't see any secret and serviceaccount in the YAML file for the kube-dns pod.
I suggest doing the following:
Create a k8s secret using kubectl create secret for the kube-dns pod with the right certificate file ca.crt and token:
$ kubectl get secrets -n=kube-system | grep dns
kube-dns-token-66tfx kubernetes.io/service-account-token 3 1d
Create a k8s serviceaccount using kubectl create serviceaccount for the kube-dns pod:
$ kubectl get serviceaccounts -n=kube-system | grep dns
kube-dns 1 1d`
Mount the secret at /var/run/secrets/kubernetes.io/serviceaccount inside the kube-dns container in the YAML file:
...
kind: Pod
...
spec:
...
containers:
...
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-dns-token-66tfx
readOnly: true
...
volumes:
- name: kube-dns-token-66tfx
secret:
defaultMode: 420
secretName: kube-dns-token-66tfx
Here are the links about creating serviceaccounts for pods:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
https://kubernetes.io/docs/admin/service-accounts-admin/

Kubernetes DNS fails in Kubernetes 1.2

I'm attempting to set up DNS support in Kubernetes 1.2 on Centos 7. According to the documentation, there's two ways to do this. The first applies to a "supported kubernetes cluster setup" and involves setting environment variables:
ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
DNS_SERVER_IP="10.0.0.10"
DNS_DOMAIN="cluster.local"
DNS_REPLICAS=1
I added these settings to /etc/kubernetes/config and rebooted, with no effect, so either I don't have a supported kubernetes cluster setup (what's that?), or there's something else required to set its environment.
The second approach requires more manual setup. It adds two flags to kubelets, which I set by updating /etc/kubernetes/kubelet to include:
KUBELET_ARGS="--cluster-dns=10.0.0.10 --cluster-domain=cluster.local"
and restarting the kubelet with systemctl restart kubelet. Then it's necessary to start a replication controller and a service. The doc page cited above provides a couple of template files for this that require some editing, both for local changes (my Kubernetes API server listens to the actual IP address of the hostname rather than 127.0.0.1, making it necessary to add a --kube-master-url setting) and to remove some Salt dependencies. When I do this, the replication controller starts four containers successfully, but the kube2sky container gets terminated about a minute after completing initialization:
[david#centos dns]$ kubectl --server="http://centos:8080" --namespace="kube-system" logs -f kube-dns-v11-t7nlb -c kube2sky
I0325 20:58:18.516905 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0325 20:58:19.518337 1 kube2sky.go:529] Using http://192.168.87.159:8080 for kubernetes master
I0325 20:58:19.518364 1 kube2sky.go:530] Using kubernetes API v1
I0325 20:58:19.518468 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0325 20:58:19.533597 1 kube2sky.go:660] Successfully added DNS record for Kubernetes service.
F0325 20:59:25.698507 1 kube2sky.go:625] Received signal terminated
I've determined that the termination is done by the healthz container after reporting:
2016/03/25 21:00:35 Client ip 172.17.42.1:58939 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/03/25 21:00:35 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local', at 2016-03-25 21:00:35.608106622 +0000 UTC, error exit status 1
Aside from this, all other logs look normal. However, there is one anomaly: it was necessary to specify --validate=false when creating the replication controller, as the command otherwise gets the message:
error validating "skydns-rc.yaml": error validating data: [found invalid field successThreshold for v1.Probe, found invalid field failureThreshold for v1.Probe]; if you choose to ignore these errors, turn validation off with --validate=false
Could this be related? These arguments come directly Kubernetes documentation. if not, what's needed to get this running?
Here is the skydns-rc.yaml I used:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v11
namespace: kube-system
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v11
template:
metadata:
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: etcd
image: gcr.io/google_containers/etcd-amd64:2.2.1
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
memory: 500Mi
requests:
cpu: 100m
memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.14
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
# Kube2sky watches all pods.
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 30
timeoutSeconds: 5
args:
# command = "/kube2sky"
- --domain="cluster.local"
- --kube-master-url=http://192.168.87.159:8080
- name: skydns
image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
args:
# command = "/skydns"
- -machines=http://127.0.0.1:4001
- -addr=0.0.0.0:53
- -ns-rotate=false
- -domain="cluster.local"
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
resources:
# keep request = limit to keep this container in guaranteed class
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
volumes:
- name: etcd-storage
emptyDir: {}
dnsPolicy: Default # Don't use cluster DNS.
and skydns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: "10.0.0.10"
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP

I just commented out the lines that contain the successThreshold and failureThreshold values in skydns-rc.yaml, then re-run the kubectl commands.
kubectl create -f skydns-rc.yaml
kubectl create -f skydns-svc.yaml

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Every spark pods scheduled on a single minikube node - apache-spark

Solved. Had to increase the replicas to 150 for the scheduler to finally consider other nodes.

Related

Running containers issue

Services with azure kubernetes not reachable

Cannot access Web API deployed in Azure ACS Kubernetes Cluster

Unable to setup service DNS in Kubernetes cluster

Kubernetes DNS fails in Kubernetes 1.2

Categories

Resources