New AKS cluster unreachable via network (including dashboard) - azure

Yesterday I spun up an Azure Kubernetes Service cluster running a few simple apps. Three of them have exposed public IPs that were reachable yesterday.
As of this morning I can't get the dashboard tunnel to work or reach the LoadBalancer IPs themselves.
I was asked by the Azure Twitter account to solicit help here.
I don't know how to troubleshoot this apparent network issue; only the az CLI seems to be able to reach my cluster.
dashboard error log
❯❯❯ make dashboard ~/c/azure-k8s (master)
az aks browse --resource-group=akc-rg-cf --name=akc-237
Merged "akc-237" as current context in /var/folders/9r/wx8xx8ls43l8w8b14f6fns8w0000gn/T/tmppst_atlw
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out
service+pod listing
❯❯❯ kubectl get services,pods ~/c/azure-k8s (master)
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
azure-vote-back    ClusterIP      10.0.125.49    <none>           6379/TCP       16h
azure-vote-front   LoadBalancer   10.0.185.4     40.71.248.106    80:31211/TCP   16h
hubot              LoadBalancer   10.0.20.218    40.121.215.233   80:31445/TCP   26m
kubernetes         ClusterIP      10.0.0.1       <none>           443/TCP        19h
mti411-web         LoadBalancer   10.0.162.209   52.168.123.30    80:30874/TCP   26m
NAME                                READY   STATUS    RESTARTS   AGE
azure-vote-back-7556ff9578-sjjn5    1/1     Running   0          2h
azure-vote-front-5b8878fdcd-9lpzx   1/1     Running   0          16h
hubot-74f659b6b8-wctdz              1/1     Running   0          9s
mti411-web-6cc87d46c-g255d          1/1     Running   0          26m
mti411-web-6cc87d46c-lhjzp          1/1     Running   0          26m
http failures
❯❯❯ curl --connect-timeout 2 -I http://40.121.215.233 ~/c/azure-k8s (master)
curl: (28) Connection timed out after 2005 milliseconds
❯❯❯ curl --connect-timeout 2 -I http://52.168.123.30 ~/c/azure-k8s (master)
curl: (28) Connection timed out after 2001 milliseconds
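For anyone hitting the same symptom, two quick things worth checking are whether the nodes are still Ready and whether the NSG in the cluster's node resource group still allows the traffic; a rough sketch of those lookups (using the resource names from the commands above):
kubectl get nodes -o wide
az aks show --resource-group akc-rg-cf --name akc-237 --query nodeResourceGroup -o tsv
az network nsg list --resource-group <node-resource-group> -o table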

If you are getting getsockopt: connection timed out while trying to access your AKS dashboard, deleting the tunnelfront pod should help: once you delete it, the master triggers the creation of a new tunnelfront. It's something I have tried, and it worked for me.
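For example (a rough sketch; the exact pod name will differ per cluster):
kubectl get pods -n kube-system | grep tunnelfront
kubectl delete pod <tunnelfront-pod-name> -n kube-system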

@daniel Did rebooting the agent VMs solve your issue, or are you still seeing issues?
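(For reference, rebooting an agent VM from the CLI looks roughly like this; the node resource group and VM names are placeholders, and scale-set-based node pools would use az vmss instead:)
az vm list --resource-group <node-resource-group> -o table
az vm restart --resource-group <node-resource-group> --name <agent-vm-name>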

Related

MetalLB works only on the master node, can't reach IPs assigned from workers

I've successfully installed MetalLB on my bare-metal Kubernetes cluster, but only pods assigned to the master node seem to work.
MetalLB is configured in layer2 mode, with the range 192.168.0.100-192.168.0.200, and pods do get an IP when assigned to worker nodes, but those IPs do not respond to any request.
If the assigned IP is curled from inside the node, it works, yet if it's curled from another node or machine, it doesn't respond.
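For context, the layer2 pool described above was applied roughly like this (a sketch of the ConfigMap-style configuration used by MetalLB up to v0.12; the pool name is just the default, not something taken from my cluster):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.0.100-192.168.0.200
EOF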
Example:
# kubectl get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE                  NOMINATED NODE   READINESS GATES
nginx2-658ffbbcb6-w5w28    1/1     Running   0          4m51s   10.244.1.2    worker2.homelab.com   <none>           <none>
nginx21-65b87bcbcb-fv856   1/1     Running   0          4h32m   10.244.0.10   master1.homelab.com   <none>           <none>
# kubectl get svc
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
kubernetes   ClusterIP      10.96.0.1        <none>          443/TCP        5h49m
nginx2       LoadBalancer   10.111.192.206   192.168.0.111   80:32404/TCP   5h21m
nginx21      LoadBalancer   10.108.222.125   192.168.0.113   80:31387/TCP   4h43m
# kubectl get nodes -o wide
NAME                  STATUS   ROLES                  AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
master1.homelab.com   Ready    control-plane,master   5h50m   v1.20.2   192.168.0.20   <none>        CentOS Linux 7 (Core)   3.10.0-1160.15.2.el7.x86_64   docker://20.10.3
worker2.homelab.com   Ready    <none>                 10m     v1.20.2   192.168.0.22   <none>        CentOS Linux 7 (Core)   3.10.0-1160.15.2.el7.x86_64   docker://20.10.3
Deployment nginx2 (Worker2, the one that doesn't work)
kubectl describe svc nginx2
Name: nginx2
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=nginx2
Type: LoadBalancer
IP: 10.111.192.206
LoadBalancer Ingress: 192.168.0.111
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 32404/TCP
Endpoints: 10.244.1.2:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type    Reason        Age                   From              Message
----    ------        ----                  ----              -------
Normal  nodeAssigned  10m (x6 over 5h23m)   metallb-speaker   announcing from node "master1.homelab.com"
Normal  nodeAssigned  5m18s                 metallb-speaker   announcing from node "worker2.homelab.com"
[root@worker2 ~]# curl 192.168.0.111
<!DOCTYPE html> ..... (Works)
[root@master1 ~]# curl 192.168.0.111
curl: (7) Failed connect to 192.168.0.111:80; No route to host
Deployment nginx21 (Master1, the one that works)
kubectl describe svc nginx21
Name: nginx21
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=nginx21
Type: LoadBalancer
IP: 10.108.222.125
LoadBalancer Ingress: 192.168.0.113
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 31387/TCP
Endpoints: 10.244.0.10:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type    Reason        Age                  From              Message
----    ------        ----                 ----              -------
Normal  nodeAssigned  12m (x3 over 4h35m)  metallb-speaker   announcing from node "master1.homelab.com"
[root@worker2 ~]# curl 192.168.0.113
<!DOCTYPE html> ..... (Works)
[root@master1 ~]# curl 192.168.0.113
<!DOCTYPE html> ..... (Works)
--------- PING WORKS FROM OTHER MACHINES ----------
I've just found this out, so it might be a problem with iptables? I don't really know how that works with MetalLB. I can ping the IP (192.168.0.111) from other machines and it responds.
I figured out, after Matt's response, that it was the firewall blocking access, so I simply allowed the whole network on port 80 and it worked.
[root@worker2 ~]# firewall-cmd --new-zone=kubernetes --permanent
success
[root@worker2 ~]# firewall-cmd --zone=kubernetes --add-source=192.168.0.1/16 --permanent
success
[root@worker2 ~]# firewall-cmd --zone=kubernetes --add-port=80/tcp --permanent
success
[root@worker2 ~]# firewall-cmd --reload
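To verify the zone after the reload (an optional check):
[root@worker2 ~]# firewall-cmd --zone=kubernetes --list-all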

Error: forwarding ports: error upgrading connection: error dialing backend: - Azure Kubernetes Service

We have upgraded our Kubernetes Service cluster on Azure to the latest version, 1.12.4. After that we suddenly noticed that pods and nodes can no longer communicate with each other by private IP:
kubectl get pods -o wide -n kube-system -l component=kube-proxy
NAME               READY   STATUS    RESTARTS   AGE   IP           NODE
kube-proxy-bfhbw   1/1     Running   2          16h   10.0.4.4     aks-agentpool-16086733-1
kube-proxy-d7fj9   1/1     Running   2          16h   10.0.4.35    aks-agentpool-16086733-0
kube-proxy-j24th   1/1     Running   2          16h   10.0.4.97    aks-agentpool-16086733-3
kube-proxy-x7ffx   1/1     Running   2          16h   10.0.4.128   aks-agentpool-16086733-4
As you can see, the node aks-agentpool-16086733-0 has private IP 10.0.4.35. When we try to check logs on pods running on this node, we get this error:
Get https://aks-agentpool-16086733-0:10250/containerLogs/emw-sit/nginx-sit-deploy-864b7d7588-bw966/nginx-sit?tailLines=5000&timestamps=true: dial tcp 10.0.4.35:10250: i/o timeout
We have Tiller (Helm) on this node as well, and if we try to connect to Tiller we get this error from the client PC:
shmits-imac:~ andris.shmits01$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.4.35:10250: i/o timeout
Does anybody have any idea why the pods and nodes lost connectivity by private IP?
So, after we scaled the cluster down from 4 nodes to 2, the problem disappeared. And after we scaled back up from 2 nodes to 4, everything started working fine.
The issue could be with the apiserver. Did you check the logs from the apiserver pod?
Can you run the command below from inside the cluster? Do you get a 200 OK response?
curl -k -v https://10.96.0.1/version
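If you don't have a shell inside the cluster handy, the same check can be made from a short-lived pod, something like this (the ClusterIP may differ; on AKS the kubernetes service is typically 10.0.0.1):
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl --command -- curl -k -sS https://10.96.0.1/version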
These issues occur when nodes in a Kubernetes cluster created using kubeadm do not get internal IP addresses that match the actual node/machine IPs.
Issue: if I run the helm list command against my cluster, I get the error below:
helm list
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
kubectl get nodes -o wide
NAME         STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k-master     Ready    master   3h10m   v1.18.5   10.0.0.5      <none>        Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://19.3.12
k-worker01   Ready    <none>   179m    v1.18.5   10.0.0.6      <none>        Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://19.3.12
k-worker02   Ready    <none>   167m    v1.18.5   10.0.2.15     <none>        Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://19.3.12
Please note: k-worker02 has internal IP 10.0.2.15, but I was expecting 10.0.0.7, which is the node/machine IP.
Solution:
Step 1: Connect to the host (here k-worker02) whose node object does not show the expected IP.
Step 2: Open the file below:
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Step 3: Edit the ExecStart line and append --node-ip 10.0.0.7, so it reads:
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --node-ip 10.0.0.7
Step 4: Reload the daemon and restart the kubelet service
sudo systemctl daemon-reload && sudo systemctl restart kubelet
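Alternatively, on Debian/Ubuntu kubeadm installs the same flag can usually be supplied through the kubelet environment file rather than the drop-in (note: this overwrites the file if it already has content):
echo 'KUBELET_EXTRA_ARGS=--node-ip=10.0.0.7' | sudo tee /etc/default/kubelet
sudo systemctl daemon-reload && sudo systemctl restart kubelet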
Result:
$ kubectl get nodes -o wide
NAME         STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k-master     Ready    master   3h36m   v1.18.5   10.0.0.5      <none>        Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://19.3.12
k-worker01   Ready    <none>   3h25m   v1.18.5   10.0.0.6      <none>        Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://19.3.12
k-worker02   Ready    <none>   3h13m   v1.18.5   10.0.0.7      <none>        Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://19.3.12
With the above solution, the k-worker02 node gets the expected IP (10.0.0.7), and the "forwarding ports" error no longer occurs with the helm list or helm install commands.
Reference: https://networkinferno.net/trouble-with-the-kubernetes-node-ip

AKS http-application-routing-nginx-ingress-controller Port 80 is already in use

I have two AKS Kubernetes clusters (version 1.11.1, in West and North Europe) with the http-application-routing addon enabled. Suddenly today the pod named addon-http-application-routing-nginx-ingress-controller-xxxx crashed and shows this state:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
kubectl logs addon-http-application-routing-nginx-ingress-controller-xxxx
shows:
I1003 20:21:21.129694 7 flags.go:162] Watching for ingress class: addon-http-application-routing
W1003 20:21:21.129745 7 flags.go:165] only Ingress with class "addon-http-application-routing" will be processed by this ingress controller
F1003 20:21:21.129819 7 main.go:59] Port 80 is already in use. Please check the flag --http-port
If I connect to any node in either cluster and check open ports with netstat -latun, it shows no service listening on port 80.
Restarting the node didn't help.
I just killed the affected node and it started working again. Here's a link where a similar solution also worked:
https://github.com/kubernetes/ingress-nginx/issues/3177
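For anyone who wants to do the "kill the node" step a bit more gracefully, roughly (the node name is a placeholder):
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
kubectl delete node <node-name>
The underlying VM then still needs to be restarted or recreated on the Azure side so a clean node rejoins the cluster.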

not able to access statefulset headless service from kubernetes

I have created a headless StatefulSet service in Kubernetes, and the Cassandra DB is running fine.
PS C:\> .\kubectl.exe get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra    None         <none>        9042/TCP   50m
kubernetes   10.0.0.1     <none>        443/TCP    6d
PS C:\> .\kubectl.exe get pods
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          49m
cassandra-1   1/1     Running   0          48m
cassandra-2   1/1     Running   0          48m
I am running all of this on Minikube. From my laptop I am trying to connect to 192.168.99.100:9402 using a Java program, but it is not able to connect.
It looks like your service is not defined as NodePort. Can you change the service type to NodePort and test it?
When we define a service as NodePort, we should get two port numbers for the service.
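For example, a separate NodePort service next to the headless one (a sketch; the cassandra-nodeport name is made up, and this assumes the headless service has a selector):
kubectl expose service cassandra --type=NodePort --port=9042 --name=cassandra-nodeport
minikube service cassandra-nodeport --url
The second command prints the <minikube-ip>:<node-port> address the Java client should connect to.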

kube-dns stays in ContainerCreating status

I have 5 machines running Ubuntu 16.04.1 LTS. I want to set them up as a Kubernetes cluster. I'm trying to follow this getting started guide where they're using kubeadm.
It all worked fine until step 3/4, installing a pod network. I've looked at their addons page to look for a pod network and chose the flannel overlay network. I've copied the YAML file to the machine and executed:
root@up01:/home/up# kubectl apply -f flannel.yml
Which resulted in:
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
So I thought it went OK, but when I display all the pods:
root@up01:/etc/kubernetes/manifests# kubectl get pods --all-namespaces
NAMESPACE     NAME                              READY   STATUS              RESTARTS   AGE
kube-system   dummy-2088944543-d5f50            1/1     Running             0          50m
kube-system   etcd-up01                         1/1     Running             0          48m
kube-system   kube-apiserver-up01               1/1     Running             0          50m
kube-system   kube-controller-manager-up01      1/1     Running             0          49m
kube-system   kube-discovery-1769846148-jvx53   1/1     Running             0          50m
kube-system   kube-dns-2924299975-prlgf         0/4     ContainerCreating   0          49m
kube-system   kube-flannel-ds-jb1df             2/2     Running             0          32m
kube-system   kube-proxy-rtcht                  1/1     Running             0          49m
kube-system   kube-scheduler-up01               1/1     Running             0          49m
The problem is that kube-dns stays in the ContainerCreating state. I don't know what to do.
It is very likely that you missed this critical piece of information from the guide:
If you want to use flannel as the pod network, specify
--pod-network-cidr 10.244.0.0/16 if you’re using the daemonset manifest below.
If you omit this, kube-dns will never leave the ContainerCreating status.
Your kubeadm init command should be:
# kubeadm init --pod-network-cidr 10.244.0.0/16
and not
# kubeadm init
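If the cluster was already initialised without the flag, it generally has to be reset and re-initialised, roughly:
# kubeadm reset
# kubeadm init --pod-network-cidr 10.244.0.0/16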
Did you try restarting NetworkManager? It worked for me. It also worked when I disabled IPv6.
