Python/Pod cannot reach the internet - python-3.x

I'm using python3 with microk8s to develop a simple web service.
The service works properly (with Docker on my local development machine), but the production machine (Ubuntu 18.04 LTS with microk8s on Azure) cannot reach the internet (SMTP / external REST APIs) once the pod has started (all internal services work).
Problem
The pod cannot ping hostnames, but it can ping IP addresses. After investigation, the pod works as expected except for external resources: nslookup resolves the name fine, but ping (and Python's name resolution) does not work.
bash-5.1# ping www.google.com
ping: bad address 'www.google.com'
bash-5.1# nslookup www.google.com
Server: 10.152.183.10
Address: 10.152.183.10:53
Non-authoritative answer:
Name: www.google.com
Address: 74.125.68.103
Name: www.google.com
Address: 74.125.68.106
Name: www.google.com
Address: 74.125.68.99
Name: www.google.com
Address: 74.125.68.104
Name: www.google.com
Address: 74.125.68.105
Name: www.google.com
Address: 74.125.68.147
Non-authoritative answer:
Name: www.google.com
Address: 2404:6800:4003:c02::93
Name: www.google.com
Address: 2404:6800:4003:c02::63
Name: www.google.com
Address: 2404:6800:4003:c02::67
Name: www.google.com
Address: 2404:6800:4003:c02::69
bash-5.1# ping 74.125.68.103
PING 74.125.68.103 (74.125.68.103): 56 data bytes
64 bytes from 74.125.68.103: seq=0 ttl=55 time=1.448 ms
64 bytes from 74.125.68.103: seq=1 ttl=55 time=1.482 ms
^C
--- 74.125.68.103 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.448/1.465/1.482 ms
bash-5.1# python3
>>> import socket
>>> socket.gethostname()
'projects-dep-65d7b8685f-jzmxx'
>>> socket.gethostbyname('www.google.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno -3] Try again
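A few additional in-pod checks that separate the libc resolver path (the one ping and Python use) from a direct query against the cluster DNS service (the one nslookup performs). This is only a sketch; getent may not be shipped in every base image:
# resolve through the libc resolver, the same path ping and Python use
getent hosts www.google.com
# query the cluster DNS service directly, bypassing the search/ndots handling in /etc/resolv.conf
nslookup www.google.com. 10.152.183.10
# reproduce the Python failure without opening a REPL
python3 -c "import socket; print(socket.getaddrinfo('www.google.com', 443))"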
Environments/Settings
host $ #In Host
host $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
host $ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
dashboard
dns
ha-cluster
ingress
metrics-server
registry
storage
disabled:
ambassador
cilium
fluentd
gpu
helm
helm3
host-access
istio
jaeger
keda
knative
kubeflow
linkerd
metallb
multus
portainer
prometheus
rbac
traefik
# In Pod
bash-5.1 # python3
>>> import sys
>>> print({'version':sys.version, 'version-info': sys.version_info})
{'version': '3.9.3 (default, Apr 2 2021, 21:20:32) \n[GCC 10.2.1 20201203]', 'version-info': sys.version_info(major=3, minor=9, micro=3, releaselevel='final', serial=0)}
bash-5.1 #
bash-5.1 # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local ngqy0alqbw2elndk2awonodqmd.ix.internal.cloudapp.net
nameserver 10.152.183.10
options ndots:5

You can confirm whether your pod's network namespace can reach external and internal VNet IPs with the following commands:
kubectl --namespace=kube-system exec -it ${KUBE-DNS-POD-NAME} -c kubedns -- sh
# run ping or nslookup against the metadata endpoint
Restarting the pod or container can fix the issue of hostnames not resolving to external IP addresses; alternatively, move the pod to a different node. Also, edit the Kubernetes DNS add-on on the master (repeat for every master) as below:
vi /etc/kubernetes/addons/kube-dns-deployment.yaml
And change the arguments for the health container as below:
"--cmd=nslookup bing.com 127.0.0.1 >/dev/null"
"--url=/healthz-dnsmasq"
"--cmd=nslookup bing.com 127.0.0.1:10053 >/dev/null"
"--url=/healthz-kubedns"
"--port=8080"
"--quiet"
You can also try restarting CoreDNS with the following command:
kubectl -n kube-system rollout restart deployment/coredns
This forces the DNS pods to restart if the condition above occurs.
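To confirm the restart took effect, a quick verification sketch (the throwaway busybox test pod here is just one way to exercise name resolution, not part of the original answer):
kubectl -n kube-system rollout status deployment/coredns
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup www.google.com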
Thank you.

Related

Ubuntu local IP address does not resolve

I set up a Hugo web server, which listens on localhost:30000.
The Ubuntu machine has the address 192.168.2.137.
When I do:
curl http://localhost:30000/ -> OK
curl http://127.0.0.1:30000/ -> OK
but,
curl http://192.168.2.137:30000/ -> curl: (7) Failed to connect to 192.168.2.131 port 30000: Connection refused
What could be the reason for that?
My /etc/netplan/00-installer-config.yaml looks like:
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp0s3:
      dhcp4: false
      addresses: [192.168.2.137/24]
      gateway4: 192.168.2.1
      nameservers:
        addresses:
          - 8.8.8.8
    lo:
      renderer: networkd
      match:
        name: lo
      addresses:
        - 192.168.2.137/24
I also added an entry to /etc/hosts:
192.168.2.137 localhost
You said that it listens on localhost, so if you use another interface it won't work, which is normal. You should listen on all interfaces, or bind to the enp0s3 interface.
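For example, a rough sketch using standard hugo server flags (adjust the port and baseURL to your setup):
# listen on all interfaces instead of only localhost
hugo server --bind 0.0.0.0 --port 30000 --baseURL http://192.168.2.137:30000/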

MetalLB works only on the master node, can't reach IPs assigned from workers

I've successfully installed MetalLB on my bare-metal Kubernetes cluster, but only pods assigned to the master node seem to work.
MetalLB is configured in layer2 mode with the range 192.168.0.100-192.168.0.200, and pods do get an IP when assigned to worker nodes, but those IPs do not respond to any request.
If the assigned IP is curled from inside the node, it works, yet if it's curled from another node or machine, it doesn't respond.
Example:
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx2-658ffbbcb6-w5w28 1/1 Running 0 4m51s 10.244.1.2 worker2.homelab.com <none> <none>
nginx21-65b87bcbcb-fv856 1/1 Running 0 4h32m 10.244.0.10 master1.homelab.com <none> <none>
# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h49m
nginx2 LoadBalancer 10.111.192.206 192.168.0.111 80:32404/TCP 5h21m
nginx21 LoadBalancer 10.108.222.125 192.168.0.113 80:31387/TCP 4h43m
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1.homelab.com Ready control-plane,master 5h50m v1.20.2 192.168.0.20 <none> CentOS Linux 7 (Core) 3.10.0-1160.15.2.el7.x86_64 docker://20.10.3
worker2.homelab.com Ready <none> 10m v1.20.2 192.168.0.22 <none> CentOS Linux 7 (Core) 3.10.0-1160.15.2.el7.x86_64 docker://20.10.3
Deployment nginx2 (Worker2, the one that doesn't work)
kubectl describe svc nginx2
Name: nginx2
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=nginx2
Type: LoadBalancer
IP: 10.111.192.206
LoadBalancer Ingress: 192.168.0.111
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 32404/TCP
Endpoints: 10.244.1.2:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal nodeAssigned 10m (x6 over 5h23m) metallb-speaker announcing from node "master1.homelab.com"
Normal nodeAssigned 5m18s metallb-speaker announcing from node "worker2.homelab.com"
[root@worker2 ~]# curl 192.168.0.111
<!DOCTYPE html> ..... (Works)
[root@master1 ~]# curl 192.168.0.111
curl: (7) Failed connect to 192.168.0.111:80; No route to host
Deployment nginx21 (Master1, the one that works)
kubectl describe svc nginx21
Name: nginx21
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=nginx21
Type: LoadBalancer
IP: 10.108.222.125
LoadBalancer Ingress: 192.168.0.113
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 31387/TCP
Endpoints: 10.244.0.10:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal nodeAssigned 12m (x3 over 4h35m) metallb-speaker announcing from node "master1.homelab.com"
[root@worker2 ~]# curl 192.168.0.113
<!DOCTYPE html> ..... (Works)
[root@master1 ~]# curl 192.168.0.113
<!DOCTYPE html> ..... (Works)
--------- PING WORKS FROM OTHER MACHINES ----------
I've just found this out, so it might be a problem with iptables? I don't really know how that works with MetalLB; I can ping the IP (192.168.0.111) from other machines and it responds.
I figured it out after Matt's response: it was the firewall that was blocking access, so I simply opened port 80 to the whole network and it worked.
[root@worker2 ~]# firewall-cmd --new-zone=kubernetes --permanent
success
[root@worker2 ~]# firewall-cmd --zone=kubernetes --add-source=192.168.0.1/16 --permanent
success
[root@worker2 ~]# firewall-cmd --zone=kubernetes --add-port=80/tcp --permanent
success
[root@worker2 ~]# firewall-cmd --reload
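To double-check the zone afterwards, a small verification sketch (not part of the original answer):
firewall-cmd --zone=kubernetes --list-all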

Kubernetes ingress "an error on the server ("") has prevented the request from succeeding"

I have a managed Azure cluster (AKS) with NGINX ingress in it.
It was working fine, but now the NGINX ingress has stopped working:
# kubectl -v=7 logs nginx-ingress-<pod-hash> -n nginx-ingress
GET https://<PRIVATE-IP-SVC-Kubernetes>:443/version?timeout=32s
I1205 16:59:31.791773 9 round_trippers.go:423] Request Headers:
I1205 16:59:31.791779 9 round_trippers.go:426] Accept: application/json, */*
Unexpected error discovering Kubernetes version (attempt 2): an error on the server ("") has prevented the request from succeeding
# kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: <PRIVATE-IP-SVC-Kubernetes>
Port: https 443/TCP
TargetPort: 443/TCP
Endpoints: <PUBLIC-IP-SVC-Kubernetes>:443
Session Affinity: None
Events: <none>
When I try to curl https://<PRIVATE-IP-SVC-Kubernetes>:443/version?timeout=32s, I always see the same output:
curl: (35) SSL connect error
On my OCP 4.7 (OpenShift Container Platform) cluster with 3 master and 2 worker nodes, the following error appears after kubectl and oc commands:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-5-g76a04fc", GitCommit:"e29b355", GitTreeState:"clean", BuildDate:"2021-06-03T21:19:58Z", GoVersion:"go1.15.7", Compiler:"gc", Platform:"linux/amd64"}
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
$ oc get nodes
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
Also, when I wanted to log in to the OCP dashboard, the following error occurred:
"error_description": "The authorization server encountered an unexpected condition that prevented it from fulfilling the request"
I restarted all of the master node machines, and then the problem was solved.
I faced the same issue with a three-manager cluster that I was accessing through the UCP client bundle. I figured out that 2 out of 3 manager nodes were in NotReady state. On debugging further, I found a disk space issue on those NotReady boxes. After cleaning up a little (mainly the /var folder) and restarting Docker, those nodes came back to Ready state and I'm no longer getting this error.
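A sketch of the kind of cleanup involved (the exact paths and the docker prune flags are assumptions, and prune removes unused images and containers, so use it with care):
# see what is filling the disk
df -h /var
sudo du -sh /var/lib/docker /var/log
# reclaim space, restart the runtime, then recheck node status
sudo docker system prune -af
sudo systemctl restart docker
kubectl get nodes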
On Windows: edit the hosts file (vi /etc/hosts) and replace the line with:
127.0.0.1 ~/.kube/config
Worked for me !!!

Cilium clustermesh with azure

I'm deploying a clustermesh using aks-engine. I have installed Cilium on two different clusters. Following the clustermesh installation guide, everything looks correct: nodes are listed, the status is correct, and no errors appear in the etcd-operator log. However, I cannot access external endpoints; the example app always answers from the current cluster.
Following the troubleshooting guide, I found in the agents' debuginfo that no external endpoints are declared. Each cluster has a master and two worker nodes. I attach the node list and status from both clusters, and I can provide additional logs if required.
Any help would be appreciated.
Cluster1
kubectl -nkube-system exec -it cilium-vg8sm cilium node list
Name IPv4 Address Endpoint CIDR IPv6 Address Endpoint CIDR
cluster1/k8s-cilium2-29734124-0 172.18.2.5 192.168.1.0/24
cluster1/k8s-cilium2-29734124-1 172.18.2.4 10.4.0.0/16
cluster1/k8s-master-29734124-0 172.18.1.239 10.239.0.0/16
cluster2/k8s-cilium2-14610979-0 172.18.2.6 192.168.2.0/24
cluster2/k8s-cilium2-14610979-1 172.18.2.7 10.7.0.0/16
cluster2/k8s-master-14610979-0 172.18.2.239 10.239.0.0/16
kubectl -nkube-system exec -it cilium-vg8sm cilium status
KVStore: Ok etcd: 1/1 connected: https://cilium-etcd-client.kube-system.svc:2379 - 3.3.11
ContainerRuntime: Ok docker daemon: OK
Kubernetes: Ok 1.15 (v1.15.1) [linux/amd64]
Kubernetes APIs: ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
Cilium: Ok OK
NodeMonitor: Disabled
Cilium health daemon: Ok
IPv4 address pool: 10/65535 allocated from 10.4.0.0/16
Controller Status: 48/48 healthy
Proxy Status: OK, ip 10.4.0.1, port-range 10000-20000
Cluster health: 6/6 reachable (2019-08-09T10:11:22Z)
Cluster2
kubectl -nkube-system exec -it cilium-rl8gt cilium node list
Name IPv4 Address Endpoint CIDR IPv6 Address Endpoint CIDR
cluster1/k8s-cilium2-29734124-0 172.18.2.5 192.168.1.0/24
cluster1/k8s-cilium2-29734124-1 172.18.2.4 10.4.0.0/16
cluster1/k8s-master-29734124-0 172.18.1.239 10.239.0.0/16
cluster2/k8s-cilium2-14610979-0 172.18.2.6 192.168.2.0/24
cluster2/k8s-cilium2-14610979-1 172.18.2.7 10.7.0.0/16
cluster2/k8s-master-14610979-0 172.18.2.239 10.239.0.0/16
kubectl -nkube-system exec -it cilium-rl8gt cilium status
KVStore: Ok etcd: 1/1 connected: https://cilium-etcd-client.kube-system.svc:2379 - 3.3.11
ContainerRuntime: Ok docker daemon: OK
Kubernetes: Ok 1.15 (v1.15.1) [linux/amd64]
Kubernetes APIs: ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
Cilium: Ok OK
NodeMonitor: Disabled
Cilium health daemon: Ok
IPv4 address pool: 10/65535 allocated from 10.7.0.0/16
Controller Status: 48/48 healthy
Proxy Status: OK, ip 10.7.0.1, port-range 10000-20000
Cluster health: 6/6 reachable (2019-08-09T10:40:39Z)
This problem is fixed by https://github.com/cilium/cilium/issues/8849 and the fix will be available in version 1.6.

Error: forwarding ports: error upgrading connection: error dialing backend: - Azure Kubernetes Service

We have upgraded our Kubernetes Service cluster on Azure to the latest version, 1.12.4. After that, we suddenly noticed that pods and nodes can no longer communicate with each other by private IP:
kubectl get pods -o wide -n kube-system -l component=kube-proxy
NAME READY STATUS RESTARTS AGE IP NODE
kube-proxy-bfhbw 1/1 Running 2 16h 10.0.4.4 aks-agentpool-16086733-1
kube-proxy-d7fj9 1/1 Running 2 16h 10.0.4.35 aks-agentpool-16086733-0
kube-proxy-j24th 1/1 Running 2 16h 10.0.4.97 aks-agentpool-16086733-3
kube-proxy-x7ffx 1/1 Running 2 16h 10.0.4.128 aks-agentpool-16086733-4
As you can see, the node aks-agentpool-16086733-0 has the private IP 10.0.4.35. When we try to check the logs of pods on this node, we get this error:
Get
https://aks-agentpool-16086733-0:10250/containerLogs/emw-sit/nginx-sit-deploy-864b7d7588-bw966/nginx-sit?tailLines=5000&timestamps=true: dial tcp 10.0.4.35:10250: i/o timeout
We have Tiller (Helm) on this node as well, and if we try to connect to Tiller we get this error from the client PC:
shmits-imac:~ andris.shmits01$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.4.35:10250: i/o timeout
Does anybody have any idea why the pods and nodes lost connectivity over private IP?
So, after we scaled the cluster down from 4 nodes to 2, the problem disappeared. And after we scaled back up from 2 nodes to 4, everything started working fine.
The issue could be with the apiserver. Did you check the logs from the apiserver pod?
Can you run the command below inside the cluster? Do you get a 200 OK response?
curl -k -v https://10.96.0.1/version
These issues occur when nodes in a Kubernetes cluster created with kubeadm do not get proper internal IP addresses matching the node/machine IPs.
Issue: if I run the helm list command against my cluster, I get the error below:
helm list
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k-master Ready master 3h10m v1.18.5 10.0.0.5 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker01 Ready <none> 179m v1.18.5 10.0.0.6 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker02 Ready <none> 167m v1.18.5 10.0.2.15 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
Please note: k-worker02 has the internal IP 10.0.2.15, but I was expecting 10.0.0.7, which is my node/machine IP.
Solution:
Step 1: Connect to the host (here k-worker02) which does not have the expected IP.
Step 2: Open the file below:
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Step 3: Edit the ExecStart line and append --node-ip 10.0.0.7, as in this snippet:
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --node-ip 10.0.0.7
Step 4: Reload the daemon and restart the kubelet service
sudo systemctl daemon-reload && sudo systemctl restart kubelet
Result:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k-master Ready master 3h36m v1.18.5 10.0.0.5 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker01 Ready <none> 3h25m v1.18.5 10.0.0.6 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker02 Ready <none> 3h13m v1.18.5 10.0.0.7 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
With the above solution, the k-worker02 node gets the expected IP (10.0.0.7) and the "forwarding ports" error no longer appears from the helm list or helm install command.
Reference: https://networkinferno.net/trouble-with-the-kubernetes-node-ip
