Pods can't resolve external DNS - Linux

I have a problem with k8s hosted on my own bare-metal infrastructure.
The cluster was installed via kubeadm init without any special configuration, and then I applied a CNI plugin.
Everything works perfectly except external DNS resolution from a Pod to the outside world (internet).
For example:
I have a Pod named foo; if I run curl google.com I receive the error
curl: (6) Could not resolve host: google.com
but if I run the same command in the same pod a second time I receive the HTML properly:
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
and if I repeat the command again I may get the DNS resolution error or the HTML, and so on.
The behavior is random: sometimes I have to try 10 times and get errors, and only on the 11th attempt do I receive the HTML.
I also tried to debug this error with this guide, but it did not help.
Additional information:
CoreDNS is up and running and has the default config:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
and the pod's /etc/resolv.conf looks fine:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
The problem exists on both CentOS 8 (master, kubeadm init) and Debian 10 (node, kubeadm join).
SELinux is in permissive mode and swap is disabled.
It looks like the problem appears even on the host machine after installing k8s and Weave Net.
I'm not certain whether the problem comes from k8s or from Linux; it started after I installed k8s.
What have I missed?
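Since the failures are intermittent, one thing worth checking (a hedged suggestion, not part of the original question) is whether only a single CoreDNS replica is broken: queries to the cluster DNS service IP are load-balanced across the replicas, so one bad replica produces exactly this on-and-off behavior. A minimal check, assuming the default k8s-app=kube-dns label:
# List the CoreDNS pods and their IPs
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
# Query each CoreDNS pod IP directly several times; if one pod consistently fails
# while the other succeeds, that replica (or the node it runs on) is the culprit
nslookup google.com <coredns-pod-ip>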

I can suggest using a different CNI plugin and setting it up from scratch. Remember that with kubeadm you apply the CNI plugin after you run kubeadm init, and only then add the worker nodes. Here you can find the supported CNI plugins. If the problem still exists, it is probably within your OS.
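A rough outline of that order of operations (a sketch only; the pod CIDR and the CNI manifest path are placeholders, use the values for the plugin you choose):
# 1. Initialize the control plane (the CIDR shown is just an example)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# 2. Apply the CNI plugin manifest before joining any workers
kubectl apply -f <cni-plugin-manifest.yaml>
# 3. Only then join the worker nodes
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>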

Check /etc/resolv.conf on the node. You can set the nameserver there to 8.8.8.8.
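For example (an assumption about what is meant here, since the default Corefile above forwards to /etc/resolv.conf):
# /etc/resolv.conf on the node -- CoreDNS's "forward . /etc/resolv.conf" sends
# external queries to whatever resolvers are listed here, so a reachable public
# resolver avoids a flaky upstream
nameserver 8.8.8.8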

Related

How to access hosts in my network from microk8s deployment pods

I am trying to access a host that sits on another server (but on my network) from inside a pod of a deployment, and I am using microk8s.
The thing is that on the server where microk8s is installed I can easily ping it with ping my-network-host.qa.local. But when I go inside the pod with microk8s kubectl exec -it pod_name -- /bin/bash and run ping my-network-host.qa.local, it says: Name or service not known.
And when I connect to a VPN on my computer to be on that network and deploy it locally using docker-desktop Kubernetes, I can ping that host from within the pod. So I think the problem sits in microk8s, which is not letting my pod use my network.
Is there any way to tell microk8s to use the hosts from my network?
P.S. I can ping the IP of that server from the pod, but I am not able to resolve the hostname from the pod.
Based on another answer that I found on Stack Overflow, I managed to fix it.
There were 2 changes needed to make it work:
Update the kubelet configuration to use resolv-conf:
echo "--resolv-conf=/run/systemd/resolve/resolv.conf" | sudo tee -a /var/snap/microk8s/current/args/kubelet
Restart the kubelet service:
sudo service snap.microk8s.daemon-kubelet restart
Then change the CoreDNS forward to point to your nameserver:
First open the coredns config map so you can edit it:
sudo microk8s.kubectl edit configmap coredns -n kube-system
and update the forward line:
forward . 8.8.8.8 8.8.4.4 #REMOVE THIS LINE
forward . xxx.xxx.xxx.xxx #ADD THIS WITH YOUR IP
You can get the eth0 DNS address with:
nmcli dev show 2>/dev/null | grep DNS | sed 's/^.*:\s*//'
In my case I already had that IP as the nameserver in /run/systemd/resolve/resolv.conf.
Now just save the changes and go back inside your pods; you will be able to access the hosts.
There is a comment in another post that suggests adding
forward . /etc/resolv.conf
but that didn't work in my case.
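To confirm the change took effect, a quick illustrative check from inside the pod, reusing the hostname from the question:
# if the Corefile includes the reload plugin, the edited ConfigMap is picked up
# automatically after a short delay; otherwise restart the CoreDNS pods first
microk8s kubectl exec -it pod_name -- nslookup my-network-host.qa.local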

One openshift-origin worker node won't resolv cluster.local records, causing Imagepullbackoff

We have set up an OKD 3.11 cluster with 100+ nodes. Everything was working fine, but then a worker node stopped resolving the registry service's internal URL. This causes new pods scheduled to that node to fail with an ImagePullBackOff error.
Failed to pull image "docker-registry.default.svc:5000/app-name/app-name:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 10.*.*.71:53: server misbehaving
We tried running nslookup on the worker node, with the following results.
This doesn't work (while it works on other nodes):
[root@worker22 ~]# nslookup docker-registry.default.svc.cluster.local
Server: 10.*.*.71
Address: 10.*.*.71#53
** server can't find docker-registry.default.svc.cluster.local: SERVFAIL
This works just fine:
[root@worker22 ~]# nslookup docker-registry.default.svc.cluster.local 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#53
Name: docker-registry.default.svc.cluster.local
Address: 172.*.*.212
Adding server=/cluster.local/172.30.0.1 to the dnsmasq conf file /etc/dnsmasq.d/origin-upstream-dns.conf works as a workaround, but I can't find what is causing this.
I have tried adding -q to the dnsmasq service's ExecStart, and it shows that dnsmasq won't query the OpenShift DNS running locally at 127.0.0.1:53.
The dnsmasq config and resolv.conf are in order on the node.
I have tried restarting dnsmasq/NetworkManager/Docker and respawning the ovs/sdn pods, but still no help.
I found some documented evidence that dnsmasq can behave like that.
Some Red Hat articles suggest that a long-running dnsmasq service may misbehave and stop resolving names. Similar cases have been reported for OpenShift environments as well.
The links below suggest that restarting the service solves the problem for a while and then the issue may resurface. As stated earlier, in my case a service restart didn't help, but the oldest remedy in IT worked (rebooting the node solved the problem).
Reference:
https://access.redhat.com/solutions/3393141
https://bugzilla.redhat.com/show_bug.cgi?id=1560489
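For reference, the workaround mentioned in the question can be applied roughly like this (a sketch; on OKD the file is normally managed by the node dispatcher scripts, so the entry may get overwritten):
# route cluster.local queries from dnsmasq to the SDN resolver (172.30.0.1 here)
echo 'server=/cluster.local/172.30.0.1' | sudo tee -a /etc/dnsmasq.d/origin-upstream-dns.conf
sudo systemctl restart dnsmasq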

Do OpenShift nodes provide a dns service by default?

I have an OpenShift cluster, and periodically when accessing logs I get:
worker1-sass-on-prem-origin-3-10 on 10.1.176.130:53: no such host" (kube making a connection to port 53 on a node).
I also tend to see tcp: lookup postgres.myapp.svc.cluster.local on 10.1.176.136:53: no such host errors from time to time in pods. Again, this makes me think that, when accessing internal service endpoints, pods, clients, and other Kubernetes-related services actually talk to a DNS server that is assumed to be running on the node that those pods are running on.
Update
Looking into one of my pods on a given node, I found the following in resolv.conf (I had to SSH in and run docker exec to get this output, since oc exec isn't working due to this issue).
/etc/cfssl $ cat /etc/resolv.conf
nameserver 10.1.176.129
search jim-emea-test.svc.cluster.local svc.cluster.local cluster.local bds-ad.lc opssight.internal
options ndots:5
Thus, it appears that in my cluster, containers have a self-referential resolv.conf entry. This cluster is created with openshift-ansible. I'm not sure if this is infra-specific or if it's actually a fundamental aspect of how OpenShift nodes work, but I suspect the latter, as I haven't done any major customizations to my Ansible workflow from the upstream openshift-ansible recipes.
Yes, DNS on every node is normal in OpenShift.
It is normal for an openshift-ansible deployment to deploy dnsmasq services on every node.
Details.
As an example of how this can affect things, https://github.com/openshift/openshift-ansible/pull/8187 is instructive. In any case, if a local node's dnsmasq is acting flaky for any reason, it will prevent containers running on that node from properly resolving the addresses of other containers in the cluster.
Looking deeper at the dnsmasq 'smoking gun'
After checking an individual node, I found that there was indeed a process bound to port 53, and it is dnsmasq. Hence,
[enguser@worker0-sass-on-prem-origin-3-10 ~]$ sudo netstat -tupln | grep 53
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 675/openshift
And, dnsmasq is running locally:
[enguser@worker0-sass-on-prem-origin-3-10 ~]$ ps -ax | grep dnsmasq
4968 pts/0 S+ 0:00 grep --color=auto dnsmasq
6994 ? Ss 0:22 /usr/sbin/dnsmasq -k
[enguser@worker0-sass-on-prem-origin-3-10 ~]$ sudo ps -ax | grep dnsmasq
4976 pts/0 S+ 0:00 grep --color=auto dnsmasq
6994 ? Ss 0:22 /usr/sbin/dnsmasq -k
The final clue: resolv.conf itself even adds the local IP address as a nameserver... and this is obviously copied into the containers that start.
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local bds-ad.lc opssight.internal
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 10.1.176.129
The solution (in my specific case)
In my case, this was happening because the local nameserver was using an ifcfg file (you can see these files in /etc/sysconfig/network-scripts/) with:
[enguser@worker0-sass-on-prem-origin-3-10 network-scripts]$ cat ifcfg-ens192
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens192
UUID=50936212-cb5e-41ff-bec8-45b72b014c8c
DEVICE=ens192
ONBOOT=yes
However, my internally configured virtual machines could not resolve the IPs provided to them by the PEERDNS records.
Ultimately the fix was to work with our IT department to make sure our authoritative domain for our kube clusters had access to all IP addresses in our data center.
The generic fix to :53 lookup errors...
If you're seeing the :53 record errors when you try to kubectl or oc logs / exec, then it is likely that your apiserver is not able to connect to the kubelets via their IP addresses.
If you're seeing :53 record errors in other places, for example inside of pods, then it is because your pod, using its own local DNS, isn't able to resolve internal cluster IP addresses. This might simply be because you have an outdated controller that is looking for services that don't exist anymore, or else you have flakiness at your Kubernetes DNS implementation level.
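A couple of quick node-level checks along those lines (illustrative commands only, reusing the service name from the question; dig may need to be installed first):
# ask the node-local resolver (dnsmasq / openshift DNS on 127.0.0.1) directly
dig +short postgres.myapp.svc.cluster.local @127.0.0.1
# confirm the node's DNS-related services are healthy
systemctl status dnsmasq NetworkManager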

Unable to enter a kubernetes pod. Error from server: error dialing backend: dial tcp: lookup (node hostname) on 168.63.129.16:53: no such host

We have deployed a K8s cluster using ACS Engine in the Azure public cloud.
We are able to create deployments and services, but when we enter a pod using "kubectl exec -ti (pod name) (command)" we receive the error below:
Error from server: error dialing backend: dial tcp: lookup (node hostname) on 168.63.129.16:53: no such host
I looked all over the internet and did everything I could to fix this issue, but no luck so far.
The OS is Ubuntu, and 168.63.129.16 is a public IP from Azure used for DNS (refer to the link below).
https://blogs.msdn.microsoft.com/mast/2015/05/18/what-is-the-ip-address-168-63-129-16/
I've already added host entries to /etc/hosts and entries to resolv.conf on the master/node servers, and nslookup resolves the hostname. I've also tested adding the --resolv-conf flag to the kubelet, but it still fails. I'm hoping that someone from this community can help us fix this issue.
Verify that the node on which your pod is running can be resolved and reached from inside the API server container. If you added entries to /etc/resolv.conf on the master node, verify that they are visible in the API server container; if they are not, restarting the API server pod might help.
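One minimal way to perform that check (a sketch; the container name and container runtime depend on your setup, docker is assumed here):
# find the API server container on the master and inspect its view of DNS and hosts
docker ps | grep apiserver
docker exec <apiserver-container-id> cat /etc/resolv.conf /etc/hosts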
In my case the problem was in the VirtualBox layer:
sudo ifconfig vboxnet0 up
The solution is taken from here: https://github.com/kubernetes/minikube/issues/1224#issuecomment-316411907

Adding nameservers to Kubernetes

I'm using Kubernetes v1.0.6 on AWS, deployed using kube-up.sh.
The cluster is using kube-dns.
$ kubectl get svc kube-dns --namespace=kube-system
NAME LABELS SELECTOR IP(S) PORT(S)
kube-dns k8s-app=kube-dns,kubernetes.io/cluster-service=true,kubernetes.io/name=KubeDNS k8s-app=kube-dns 10.0.0.10 53/UDP
This works fine:
$ kubectl exec busybox -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10 ip-10-0-0-10.eu-west-1.compute.internal
Name: kubernetes.default
Address 1: 10.0.0.1 ip-10-0-0-1.eu-west-1.compute.internal
This is the resolv.conf of a pod.
$ kubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.0.0.10
nameserver 172.20.0.2
search default.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal
Is it possible to have the containers use an additional nameserver?
I have a secondary DNS-based service discovery (on, let's say, 192.168.0.1) that I would like my Kubernetes containers to be able to use for DNS resolution.
P.S. A Kubernetes 1.1 solution would also be acceptable :)
Thank you very much in advance,
George
The DNS addon README has some details on this. Basically, the pod will inherit the resolv.conf settings of the node it is running on, so you could add your extra DNS server to the nodes' /etc/resolv.conf. The kubelet also takes a --resolv-conf argument that may provide a more explicit way for you to inject the extra DNS server. I don't see that flag documented anywhere yet, however.
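To illustrate the first option (a sketch; 192.168.0.1 is the secondary DNS from the question, and the existing entries will differ per node):
# /etc/resolv.conf on each node -- pods inherit these entries by default
# existing provider resolver (value taken from the pod's resolv.conf above)
nameserver 172.20.0.2
# the secondary service-discovery DNS
nameserver 192.168.0.1
search eu-west-1.compute.internal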
In Kubernetes 1.2 (probably) we'll be moving to a model where nameservers are assumed to be fungible. There are too many resolvers that break when different nameservers serve different subsets of DNS, and there is no real specification here that we can point to.
In other words, we'll start dropping the host's nameserver records from the container's merged resolv.conf and making our own DNS server the only nameserver line. Our DNS will be able to forward requests to upstream nameservers.
I eventually managed to solve this pretty easily by configuring SkyDNS to add an additional nameserver. You can just add the environment variable SKYDNS_NAMESERVERS, as defined in the SkyDNS docs, to your SkyDNS replication controller. It has minimal impact and does not depend on node changes, etc.
env:
- name: SKYDNS_NAMESERVERS
  value: 10.0.0.254:53,10.0.64.254:53
For those using Kubernetes kube-dns, neither the -nameservers flag nor the SKYDNS_NAMESERVERS environment variable is available any longer.
Usage of /kube-dns:
--alsologtostderr log to standard error as well as files
--config-map string config-map name. If empty, then the config-map will not used. Cannot be used in conjunction with federations flag. config-map contains dynamically adjustable configuration.
--config-map-namespace string namespace for the config-map (default "kube-system")
--dns-bind-address string address on which to serve DNS requests. (default "0.0.0.0")
--dns-port int port on which to serve DNS requests. (default 53)
--domain string domain under which to create names (default "cluster.local.")
--healthz-port int port on which to serve a kube-dns HTTP readiness probe. (default 8081)
--kube-master-url string URL to reach kubernetes master. Env variables in this flag will be expanded.
--kubecfg-file string Location of kubecfg file for access to kubernetes master service; --kube-master-url overrides the URL part of this; if neither this nor --kube-master-url are provided, defaults to service account tokens
--log-backtrace-at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log-dir string If non-empty, write log files in this directory
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--logtostderr log to standard error instead of files (default true)
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
-v, --v Level log level for V logs
--version version[=true] Print version information and quit
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
Now, either you put your nameservers in the host's resolv.conf, so DNS is inherited from the node, or you use a custom resolv.conf and pass it to the kubelet with the --resolv-conf flag, as explained here.
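kube-dns can also read upstream nameservers from a ConfigMap (note the --config-map flag in the usage above). A hedged sketch of that ConfigMap, assuming kube-dns is started with --config-map=kube-dns as in the standard add-on manifests, and reusing the secondary DNS IP from the question:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # queries outside the cluster domain are forwarded to these resolvers
  upstreamNameservers: |
    ["192.168.0.1"]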
You need to know the IP of your CoreDNS service to set it as a secondary DNS.
Run this command to get the CoreDNS IP:
kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 172.20.0.10 <none> 53/UDP,53/TCP 43d
metrics-server ClusterIP 172.20.232.147 <none> 443/TCP 43d
This is how I set up DNS in my deployment YAML.
I posted the Google DNS IP (for clarity) and my CoreDNS IP, but you should use your VPC DNS and your CoreDNS server.
containers:
  - name: nginx
    image: nginx
    ports:
      - containerPort: 8080
dnsPolicy: None
dnsConfig:
  nameservers:
    - 8.8.8.8
    - 172.20.0.10
  searches:
    - 1b.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    - ec2.internal
  options:
    - name: ndots
      value: "5"
