I'm trying to run Cassandra on kubernetes.
One thing I understood is Cassandra is trying to access kubernetes api server on port 443 (Secure Connection) but I'm running api server on non secure connection port 8080.
Also there is Has no permission to create /cassandra_data/data directory error
Error (Cassandra pod log):
INFO 13:04:02 Getting endpoints from https://10.100.0.1:443/api/v1/namespaces/default/endpoints/cassandra
WARN 13:04:11 Request to kubernetes apiserver failed
java.io.IOException: Server returned HTTP response code: 401 for URL: https://10.100.0.1:443/api/v1/namespaces/default/endpoints/cassandra
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1628) ~[na:1.7.0_95]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254) ~[na:1.7.0_95]
at io.k8s.cassandra.KubernetesSeedProvider.getSeeds(KubernetesSeedProvider.java:143) ~[kubernetes-cassandra.jar:na]
at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:659) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:136) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:168) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) [apache-cassandra-2.1.13.jar:2.1.13]
INFO 13:04:11 JVM vendor/version: OpenJDK 64-Bit Server VM/1.7.0_95
WARN 13:04:11 OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO 13:04:11 Heap size: 526385152/526385152
INFO 13:04:11 Code Cache Non-heap memory: init = 2555904(2496K) used = 795136(776K) committed = 2555904(2496K) max = 50331648(49152K)
INFO 13:04:11 Eden Space Heap memory: init = 83886080(81920K) used = 79390336(77529K) committed = 83886080(81920K) max = 83886080(81920K)
INFO 13:04:11 Survivor Space Heap memory: init = 10485760(10240K) used = 0(0K) committed = 10485760(10240K) max = 10485760(10240K)
INFO 13:04:11 CMS Old Gen Heap memory: init = 432013312(421888K) used = 0(0K) committed = 432013312(421888K) max = 432013312(421888K)
INFO 13:04:11 CMS Perm Gen Non-heap memory: init = 21757952(21248K) used = 19488984(19032K) committed = 21757952(21248K) max = 174063616(169984K)
INFO 13:04:11 Classpath: /kubernetes-cassandra.jar:/etc/cassandra:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.13.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.13.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.9.2.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar::/usr/share/cassandra/lib/jamm-0.3.0.jar
INFO 13:04:11 JVM Arguments: [-ea, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar, -XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -Xms512M, -Xmx512M, -Xmn100M, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=1000003, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+UseTLAB, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:CMSWaitDuration=10000, -XX:+UseCondCardMark, -Djava.net.preferIPv4Stack=true, -Dcassandra.jmx.local.port=7199, -XX:+DisableExplicitGC, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/var/log/cassandra, -Dcassandra.storagedir=, -Dcassandra-foreground=yes]
WARN 13:04:13 Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN 13:04:13 JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
ERROR 13:04:20 Directory /cassandra_data/data doesn't exist
ERROR 13:04:20 Has no permission to create /cassandra_data/data directory
ReplicationController:
apiVersion: v1
kind: ReplicationController
metadata:
labels:
app: cassandra
name: cassandra
spec:
replicas: 2
selector:
app: cassandra
template:
metadata:
labels:
app: cassandra
spec:
containers:
- command:
- /run.sh
resources:
limits:
cpu: 0.1
env:
- name: MAX_HEAP_SIZE
value: 512M
- name: HEAP_NEWSIZE
value: 100M
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
image: gcr.io/google-samples/cassandra:v8
name: cassandra
ports:
- containerPort: 9042
name: cql
- containerPort: 9160
name: thrift
volumeMounts:
- mountPath: /cassandra_data
name: data
volumes:
- name: data
emptyDir: {}
You changed cassandra default data directory so first create data directory . Give permission to write and read to created new data directory or run cassandra as root . if you set credential for cassandra then provide user name and password also .
Related
I am want to increase the size of my pvc from 50 GB to 100GB, can you please help on this.
EFK is deployed on Azure kubernetes Cluster and storage type is azurefile-standard-zrs
i have deployed Elasticsearch as a statefullset using helm, tried updating values.yaml file however its not happening .
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
logging-es es 6 2021-02-23 16:01:17.013698 +0000 UTC deployed opendistro-es-1.13.0 1.13.0
[# .kube]$ kubectl get pvc -n es
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-logging-es-opendistro-es-data-0 Bound pvc-e926bbfc-a873-4543-9867-234bc508977c 50Gi RWO azurefile-standard-zrs 53d
data-logging-es-opendistro-es-data-1 Bound pvc-0f5e7e46-5138-45da-90e6-0dfbe0aadff3 50Gi RWO azurefile-standard-zrs 53d
data-logging-es-opendistro-es-master-0 Bound pvc-e8a57019-5eeb-4a93-ba02-3f1b5c2e8fc8 20Gi RWO azurefile-standard-zrs 53d
data-logging-es-opendistro-es-master-1 Bound pvc-2ea1845d-7d08-4fca-b3c4-5b067559af3c 20Gi RWO azurefile-standard-zrs 53d
[# .kube]$ kubectl get pvc data-logging-es-opendistro-es-data-0 -n es -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-file
volume.kubernetes.io/selected-node: azwe-wvm-0
creationTimestamp: "2021-01-15T07:25:20Z"
finalizers:
- kubernetes.io/pvc-protection
labels:
app: logging-es-opendistro-es
heritage: Helm
release: logging-es
role: data
name: data-logging-es-opendistro-es-data-0
namespace: es
resourceVersion: "33101689"
selfLink: /api/v1/namespaces/es/persistentvolumeclaims/data-logging-es-opendistro-es-data-0
uid: e926bbfc-a873-4543-9867-234bc508977c
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: azurefile-standard-zrs
volumeMode: Filesystem
volumeName: pvc-e926bbfc-a873-234bc508977c
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 50Gi
phase: Bound
Error on trying helm upgrade
$ helm upgrade -f values_runtime.yaml logging-es infrastructure/opendistro-es -n es --kubeconfig=/home/tsiadm/qa_fmo_config
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/tsiadm/qa_fmo_config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/tsiadm/qa_fmo_config
Error: UPGRADE FAILED: cannot patch "logging-es-opendistro-es-data" with kind StatefulSet: StatefulSet.apps "logging-es-opendistro-es-data" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden```
couple of things:
ensure storageclass used for this pvc is enabled for volume expansion
edit the pvc to request more space
scale down your stateful set to 0 replicas
wait for the pvc to scale up
scale your stateful set back to whatever replicas it requires
I am submitting a simple Pyspark wordcount job to a freshly built YARN cluster, via Airflow and the SparkSubmitOperator. The job hits YARN, I can see it in the ResourceManager UI, but fails with this error:
"Diagnostics: Application application_1582063076991_0002 submitted by user root to unknown queue: root.default"
*User: root
Name: PySpark Wordcount
Application Type: SPARK
Application Tags:
YarnApplicationState: FAILED
Queue: root.default
FinalStatus Reported by AM: FAILED
Started: Fri Feb 21 08:01:25 +1100 2020
Elapsed: 0sec
Tracking URL: History
Diagnostics: Application application_1582063076991_0002 submitted by user root to unknown queue: root.default*
The default.root queue certainly seems to be there:
*Application Queues
Legend:CapacityUsedUsed (over capacity)Max Capacity
.root 0.0% used
..Queue: default 0.0% used
'default' Queue Status
Queue State: RUNNING
Used Capacity: 0.0%
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Absolute Used Capacity: 0.0%
Absolute Configured Capacity: 100.0%
Absolute Configured Max Capacity: 100.0%
Used Resources: <memory:0, vCores:0>
Num Schedulable Applications: 0
Num Non-Schedulable Applications: 0
Num Containers: 0
Max Applications: 10000
Max Applications Per User: 10000
Max Application Master Resources: <memory:3072, vCores:1>
Used Application Master Resources: <memory:0, vCores:0>
Max Application Master Resources Per User: <memory:3072, vCores:1>
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Accessible Node Labels: *
Preemption: disabled*
What am I missing here ? Thanks
Submit with queue name default.
The root in the Resource Manager is used only to group the queues in hierarchical form.
Kubernetes version --> 1.5.2
I am setting up DNS for Kubernetes services for the first time and I came across SkyDNS.
So following documentation, my skydns-svc.yaml file is :
apiVersion: v1
kind: Service
spec:
clusterIP: 10.100.0.100
ports:
- name: dns
port: 53
protocol: UDP
targetPort: 53
- name: dns-tcp
port: 53
protocol: TCP
targetPort: 53
selector:
k8s-app: kube-dns
sessionAffinity: None
type: ClusterIP
And my skydns-rc.yaml file is :
apiVersion: v1
kind: ReplicationController
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v18
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
version: v18
spec:
containers:
- args:
- --domain=kube.local
- --dns-port=10053
image: gcr.io/google_containers/kubedns-amd64:1.6
imagePullPolicy: IfNotPresent
name: kubedns
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
resources:
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
terminationMessagePath: /dev/termination-log
- args:
- --cache-size=1000
- --no-resolv
- --server=127.0.0.1#10053
image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
imagePullPolicy: IfNotPresent
name: dnsmasq
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
- args:
- -cmd=nslookup kubernetes.default.svc.kube.local 127.0.0.1 >/dev/null &&
nslookup kubernetes.default.svc.kube.local 127.0.0.1:10053 >/dev/null
- -port=8080
- -quiet
image: gcr.io/google_containers/exechealthz-amd64:1.0
imagePullPolicy: IfNotPresent
name: healthz
ports:
- containerPort: 8080
protocol: TCP
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
Also on my minions, I updated the /etc/systemd/system/multi-user.target.wants/kubelet.service file and added the following under the ExecStart section :
ExecStart=/usr/bin/kubelet \
$KUBE_LOGTOSTDERR \
$KUBE_LOG_LEVEL \
$KUBELET_API_SERVER \
$KUBELET_ADDRESS \
$KUBELET_PORT \
$KUBELET_HOSTNAME \
$KUBE_ALLOW_PRIV \
$KUBELET_POD_INFRA_CONTAINER \
$KUBELET_ARGS \
--cluster-dns=10.100.0.100 \
--cluster-domain=kubernetes \
Having done all of this and having successfully brought up the rc & svc :
[root#kubernetes-master DNS]# kubectl get po | grep dns
kube-dns-v18-hl8z6 3/3 Running 0 6s
[root#kubernetes-master DNS]# kubectl get svc | grep dns
kube-dns 10.100.0.100 <none> 53/UDP,53/TCP 20m
This is all that I got from a config standpoint. Now in order to test my setup, I downloaded busybox and tested a nslookup
[root#kubernetes-master DNS]# kubectl get svc | grep kubernetes
kubernetes 10.100.0.1 <none> 443/TCP
[root#kubernetes-master DNS]# kubectl exec busybox -- nslookup kubernetes
nslookup: can't resolve 'kubernetes'
Server: 10.100.0.100
Address 1: 10.100.0.100
Is there something that I have missed ?
EDIT ::
Going through the logs, I see something that might explain why this is not working :
kubectl logs $(kubectl get pods -l k8s-app=kube-dns -o name) -c kubedns
.
.
.
E1220 17:44:48.403976 1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0: x509: failed to load system roots and no roots provided
E1220 17:44:48.487169 1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.100.0.1:443/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
I1220 17:44:48.487716 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.100.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.
E1220 17:44:49.410311 1 reflector.go:216] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0: x509: failed to load system roots and no roots provided
I1220 17:44:49.492338 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.100.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.
E1220 17:44:49.493429 1 reflector.go:216] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.100.0.1:443/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
.
.
.
Looks like kubedns is unable to authorize against K8S master node. I even tried to do a manual call :
curl -k https://10.100.0.1:443/api/v1/endpoints?resourceVersion=0
Unauthorized
Looks like the kube-dns pod is not able to authenticate with the kubernetes api server. I don't see any secret and serviceaccount in the YAML file for the kube-dns pod.
I suggest doing the following:
Create a k8s secret using kubectl create secret for the kube-dns pod with the right certificate file ca.crt and token:
$ kubectl get secrets -n=kube-system | grep dns
kube-dns-token-66tfx kubernetes.io/service-account-token 3 1d
Create a k8s serviceaccount using kubectl create serviceaccount for the kube-dns pod:
$ kubectl get serviceaccounts -n=kube-system | grep dns
kube-dns 1 1d`
Mount the secret at /var/run/secrets/kubernetes.io/serviceaccount inside the kube-dns container in the YAML file:
...
kind: Pod
...
spec:
...
containers:
...
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-dns-token-66tfx
readOnly: true
...
volumes:
- name: kube-dns-token-66tfx
secret:
defaultMode: 420
secretName: kube-dns-token-66tfx
Here are the links about creating serviceaccounts for pods:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
https://kubernetes.io/docs/admin/service-accounts-admin/
I am creating a cluster of 1 master 2 nodes kubernetes. I am trying to create the skydns based on the following:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v11
namespace: kube-system
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v11
template:
metadata:
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: etcd
image: gcr.io/google_containers/etcd-amd64:2.2.1
# resources:
# # TODO: Set memory limits when we've profiled the container for large
# # clusters, then set request = limit to keep this container in
# # guaranteed class. Currently, this container falls into the
# # "burstable" category so the kubelet doesn't backoff from restarting it.
# limits:
# cpu: 100m
# memory: 500Mi
# requests:
# cpu: 100m
# memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.14
# resources:
# # TODO: Set memory limits when we've profiled the container for large
# # clusters, then set request = limit to keep this container in
# # guaranteed class. Currently, this container falls into the
# # "burstable" category so the kubelet doesn't backoff from restarting it.
# limits:
# cpu: 100m
# # Kube2sky watches all pods.
# memory: 200Mi
# requests:
# cpu: 100m
# memory: 50Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
# successThreshold: 1
# failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 30
timeoutSeconds: 5
args:
# command = "/kube2sky"
- --domain=cluster.local
- name: skydns
image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
args:
# command = "/skydns"
- -machines=http://127.0.0.1:4001
- -addr=0.0.0.0:53
- -ns-rotate=false
- -domain=cluster.local
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
# resources:
# # keep request = limit to keep this container in guaranteed class
# limits:
# cpu: 10m
# memory: 20Mi
# requests:
# cpu: 10m
# memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
volumes:
- name: etcd-storage
emptyDir: {}
dnsPolicy: Default # Don't use cluster DNS.
However, skydns is spitting out the following:
> $kubectl logs kube-dns-v11-k07j9 --namespace=kube-system skydns
> 2016/04/18 12:47:05 skydns: falling back to default configuration,
> could not read from etcd: 100: Key not found (/skydns) [1] 2016/04/18
> 12:47:05 skydns: ready for queries on cluster.local. for
> tcp://0.0.0.0:53 [rcache 0] 2016/04/18 12:47:05 skydns: ready for
> queries on cluster.local. for udp://0.0.0.0:53 [rcache 0] 2016/04/18
> 12:47:11 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:47:15 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout" 2016/04/18
> 12:47:19 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:47:23 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout" 2016/04/18
> 12:47:27 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:47:31 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout" 2016/04/18
> 12:47:35 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:47:39 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout" 2016/04/18
> 12:47:43 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:47:47 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout" 2016/04/18
> 12:47:51 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:47:55 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout" 2016/04/18
> 12:47:59 skydns: failure to forward request "read udp
> 192.168.122.1:53: i/o timeout" 2016/04/18 12:48:03 skydns: failure to forward request "read udp 192.168.122.1:53: i/o timeout"
After looking further, I just realized what is a 192.168.122.1? It is virtual switch on kvm. Why is skydns trying to hit my virtual switch or dns server of virtual machine?
SkyDNS defaults its forwarding nameservers to the one listed in /etc/resolv.conf. Since SkyDNS runs inside the kube-dns pod as a cluster addon, it inherits its /etc/resolv.conf from its host as described in the kube-dns doc.
From your question, it looks like your host's /etc/resolv.conf is configured to use 192.168.122.1 as its nameserver and hence that becomes the forwarding server in your SkyDNS config. I believe 192.168.122.1 is not routable from your Kubernetes cluster and that's why you are seeing "failure to forward request" errors in the kube-dns logs.
The simplest solution to this problem is to supply a reachable DNS server as a flag to SkyDNS in your RC config. Here is an example (it's just your RC config, but adds the -nameservers flag in the SkyDNS container spec):
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v11
namespace: kube-system
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v11
template:
metadata:
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: etcd
image: gcr.io/google_containers/etcd-amd64:2.2.1
# resources:
# # TODO: Set memory limits when we've profiled the container for large
# # clusters, then set request = limit to keep this container in
# # guaranteed class. Currently, this container falls into the
# # "burstable" category so the kubelet doesn't backoff from restarting it.
# limits:
# cpu: 100m
# memory: 500Mi
# requests:
# cpu: 100m
# memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.14
# resources:
# # TODO: Set memory limits when we've profiled the container for large
# # clusters, then set request = limit to keep this container in
# # guaranteed class. Currently, this container falls into the
# # "burstable" category so the kubelet doesn't backoff from restarting it.
# limits:
# cpu: 100m
# # Kube2sky watches all pods.
# memory: 200Mi
# requests:
# cpu: 100m
# memory: 50Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
# successThreshold: 1
# failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 30
timeoutSeconds: 5
args:
# command = "/kube2sky"
- --domain=cluster.local
- name: skydns
image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
args:
# command = "/skydns"
- -machines=http://127.0.0.1:4001
- -addr=0.0.0.0:53
- -ns-rotate=false
- -domain=cluster.local
- -nameservers=8.8.8.8:53,8.8.4.4:53 # Adding this flag. Dont use double quotes.
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
# resources:
# # keep request = limit to keep this container in guaranteed class
# limits:
# cpu: 10m
# memory: 20Mi
# requests:
# cpu: 10m
# memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
volumes:
- name: etcd-storage
emptyDir: {}
dnsPolicy: Default # Don't use cluster DNS.
I'm attempting to set up DNS support in Kubernetes 1.2 on Centos 7. According to the documentation, there's two ways to do this. The first applies to a "supported kubernetes cluster setup" and involves setting environment variables:
ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
DNS_SERVER_IP="10.0.0.10"
DNS_DOMAIN="cluster.local"
DNS_REPLICAS=1
I added these settings to /etc/kubernetes/config and rebooted, with no effect, so either I don't have a supported kubernetes cluster setup (what's that?), or there's something else required to set its environment.
The second approach requires more manual setup. It adds two flags to kubelets, which I set by updating /etc/kubernetes/kubelet to include:
KUBELET_ARGS="--cluster-dns=10.0.0.10 --cluster-domain=cluster.local"
and restarting the kubelet with systemctl restart kubelet. Then it's necessary to start a replication controller and a service. The doc page cited above provides a couple of template files for this that require some editing, both for local changes (my Kubernetes API server listens to the actual IP address of the hostname rather than 127.0.0.1, making it necessary to add a --kube-master-url setting) and to remove some Salt dependencies. When I do this, the replication controller starts four containers successfully, but the kube2sky container gets terminated about a minute after completing initialization:
[david#centos dns]$ kubectl --server="http://centos:8080" --namespace="kube-system" logs -f kube-dns-v11-t7nlb -c kube2sky
I0325 20:58:18.516905 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0325 20:58:19.518337 1 kube2sky.go:529] Using http://192.168.87.159:8080 for kubernetes master
I0325 20:58:19.518364 1 kube2sky.go:530] Using kubernetes API v1
I0325 20:58:19.518468 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0325 20:58:19.533597 1 kube2sky.go:660] Successfully added DNS record for Kubernetes service.
F0325 20:59:25.698507 1 kube2sky.go:625] Received signal terminated
I've determined that the termination is done by the healthz container after reporting:
2016/03/25 21:00:35 Client ip 172.17.42.1:58939 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/03/25 21:00:35 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local', at 2016-03-25 21:00:35.608106622 +0000 UTC, error exit status 1
Aside from this, all other logs look normal. However, there is one anomaly: it was necessary to specify --validate=false when creating the replication controller, as the command otherwise gets the message:
error validating "skydns-rc.yaml": error validating data: [found invalid field successThreshold for v1.Probe, found invalid field failureThreshold for v1.Probe]; if you choose to ignore these errors, turn validation off with --validate=false
Could this be related? These arguments come directly Kubernetes documentation. if not, what's needed to get this running?
Here is the skydns-rc.yaml I used:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v11
namespace: kube-system
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v11
template:
metadata:
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: etcd
image: gcr.io/google_containers/etcd-amd64:2.2.1
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
memory: 500Mi
requests:
cpu: 100m
memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.14
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
# Kube2sky watches all pods.
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 30
timeoutSeconds: 5
args:
# command = "/kube2sky"
- --domain="cluster.local"
- --kube-master-url=http://192.168.87.159:8080
- name: skydns
image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
args:
# command = "/skydns"
- -machines=http://127.0.0.1:4001
- -addr=0.0.0.0:53
- -ns-rotate=false
- -domain="cluster.local"
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
resources:
# keep request = limit to keep this container in guaranteed class
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
volumes:
- name: etcd-storage
emptyDir: {}
dnsPolicy: Default # Don't use cluster DNS.
and skydns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: "10.0.0.10"
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
I just commented out the lines that contain the successThreshold and failureThreshold values in skydns-rc.yaml, then re-run the kubectl commands.
kubectl create -f skydns-rc.yaml
kubectl create -f skydns-svc.yaml