Cannot check pod logs after upgrading the Kubernetes version - azure

Previously I could use kubectl logs devops2-pdf-xxx to check the logs of my pods.
After I upgraded the kubectl version, this no longer works, so it seems the service is not running well.
However, when I run kubectl describe node, the resource allocation is below 100%.
Output of kubectl logs xxx:
Error from server: Get "https://aks-agentpool-123456-1:10250/containerLogs/default/devops2-deployment-123456-456/devops2-pdf": dial tcp 10.240.0.5:10250: i/o timeout

There are several possible fixes. The timeout is most likely caused by a closed port:
First, check that port 10250 is open. A similar problem is described here
You are using AKS, so also check the solution described here:
Make sure that the default network security group isn't modified and that both port 22 and 9000 are open for connection to the API server. Check whether the tunnelfront pod is running in the kube-system namespace using the kubectl get pods --namespace kube-system command. If it isn't, force deletion of the pod and it will restart.
You can also check the official Microsoft help page:
These timeouts may be related to internal traffic between nodes being blocked. Verify that this traffic is not being blocked, such as by network security groups on the subnet for your cluster's nodes.
or this one.
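To confirm whether the kubelet port is reachable at all, a quick TCP check run from a machine or pod inside the cluster network may help. The following is a minimal sketch and not part of the original answer; the function name can_connect is hypothetical, and the address in the usage comment is simply the node IP from the error message in the question.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


# Example usage (assumed address, taken from the error message):
#   can_connect("10.240.0.5", 10250)
# A False result after the full timeout points to filtered traffic,
# for example a network security group rule.
```

A timeout here is consistent with the "i/o timeout" in the kubectl error, while an immediate failure would more likely mean nothing is listening on the port.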

Related

Why telnet and nc command report connection works in azure kubernetes pod while it shouldn't

I have an Azure AKS Kubernetes cluster. I created a Pod with an Ubuntu container from the Ubuntu image, plus several other Pods from Java/.NET Dockerfiles.
When I exec into any of the Pods (including the Ubuntu one) and run telnet/nc against a remote server and port to validate the remote connection, the command always reports that the connection succeeded, no matter which remote IP and port I choose, even though that IP/port should not work.
Here is a snapshot of the command I executed: in the image, you can see that I'm telnetting to 1.1.1.1 on port 1111. I can try any other IP and port number and it always reports that the connection succeeded. I tried connecting from all the other Pods in the AKS cluster, and they all behave the same. I also tried re-creating the AKS cluster with the CNI network instead of the default Kubenet network, with the same result. Could anyone help me with this? Thanks a lot in advance.
I figured out the root cause of this issue: I had installed Istio as a service mesh, and it turns out this is the expected behavior by design, according to this link: https://github.com/istio/istio/issues/36540
However, although this is by design in Istio, I'm still very interested in how to easily find out whether a TCP connection to a remote IP/port actually works from a Pod with the Istio sidecar enabled.
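One possible heuristic for the open question above is to look past the initial handshake and check whether the connection stays open. This is only a sketch under a loud assumption that is not confirmed by the thread: that the sidecar accepts the connection locally and then closes or resets it once its own upstream connection fails. The function name probe_held_connection and both timeouts are hypothetical choices.

```python
import socket


def probe_held_connection(host: str, port: int,
                          connect_timeout: float = 5.0,
                          hold_seconds: float = 15.0) -> str:
    """Classify a TCP target as 'unreachable', 'closed', or 'open'.

    'closed' means the handshake succeeded but the peer (or a proxy in
    between) closed or reset the connection before hold_seconds elapsed.
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        return "unreachable"
    with sock:
        sock.settimeout(hold_seconds)
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Nothing was sent, but the connection is still up.
            return "open"
        except OSError:
            # Connection reset by the peer or by an intermediate proxy.
            return "closed"
        # Empty bytes means an orderly close; any data means a live server.
        return "open" if data else "closed"
```

A server that legitimately closes idle connections right away would be misclassified as "closed", so the result is a hint rather than proof.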

Connection Refused in-cluster but Port Forwarding works in Kubernetes

I currently have the HashiCorp Vault Helm chart deployed, v0.8.0. It works as intended: I can run kubectl port-forward svc/vault 8200:8200 -n vault and log in with vault login -tls-skip-verify, the pod isn't crashing, and there are no problems.
I've been verifying configs in the cluster, so I wanted to test access to Vault from another namespace in the same cluster. dig vault.vault.svc.cluster.local resolves to the proper service construct. However, when I run curl -v -k https://vault.vault.svc.cluster.local:8200/v1/sys/health, I get connection refused.
I'm running an AKS Cluster on Kubernetes v1.18 with Azure Policy defaults and the Azure CNI. What would cause this connection refused problem?
The comment @mdaniel left is what led me to solve the problem. I had removed the address and cluster_address fields from the TCP listener config blocks, forgetting that without them Vault listens only on 127.0.0.1. I put those fields back in and it solved the problem.
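The symptom fits a loopback-only listener: a process bound to 127.0.0.1 refuses connections arriving on the pod IP, while port-forwarding, as I understand it, reaches the process from inside the pod's own network namespace. A minimal sketch for sanity-checking a listener address before deploying follows; the helper name reachable_from_other_pods is hypothetical and the check is deliberately simplistic.

```python
import ipaddress


def reachable_from_other_pods(listener_address: str) -> bool:
    """Return False when a 'host:port' listener address binds only to loopback."""
    host, _, _port = listener_address.rpartition(":")
    host = host.strip("[]")  # tolerate bracketed IPv6 such as [::]:8200
    return not ipaddress.ip_address(host).is_loopback


# Example usage:
#   reachable_from_other_pods("127.0.0.1:8200")  -> loopback only
#   reachable_from_other_pods("0.0.0.0:8200")    -> all interfaces
```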

Kubectl not working when AKS API authorized ranges are in place

We're implementing security on our k8s cluster in Azure (managed Kubernetes, AKS).
The cluster is deployed via an ARM template with the following configuration:
1 node, availability set, Standard load balancer, Nginx-based ingress controller, and a set of applications deployed.
Following the documentation, we updated the cluster to protect the API server from the public internet:
az aks update --resource-group xxxxxxxx-xxx-xx-xx-xx-x -n xx-xx-xxx-aksCluster \
  --api-server-authorized-ip-ranges XX.XX.X.0/24,XX.XX.X.0/24,XX.XXX.XX.0/24,XX.XXX.XXX.XXX/32 \
  --subscription xxxxx-xxx-xxx-xxx-xxxxxx
The operation completed successfully.
When trying to fetch logs from a pod, the following error occurs:
kubectl get pods -n lims-dev
NAME READY STATUS RESTARTS AGE
XXXX-76df44bc6d-9wdxr 1/1 Running 0 14h
kubectl logs XXXXX-76df44bc6d-9wdxr -n lims-dev
Error from server: Get https://aks-agentpool-XXXXXX-1:10250/containerLogs/XXXX/XXXXX-76df44bc6d-9wdxr/listener: dial tcp 10.22.0.35:10250: i/o timeout
When trying to deploy using Azure DevOps, the same error is raised:
2020-04-07T04:49:49.0409528Z ##[error]Error: error installing:
Post https://xxxxx-xxxx-xxxx-akscluster-dns-xxxxxxx.hcp.eastus2.azmk8s.io:443
/apis/extensions/v1beta1/namespaces/kube-system/deployments:
dial tcp XX.XX.XXX.142:443: i/o timeout
Of course, the subnet from which I'm running kubectl has been added to the authorized ranges.
I'm trying to understand the source of the problem.
You also need to specify the --load-balancer-outbound-ips parameter when creating the AKS cluster. This IP is used by your pods to communicate with the outside world, as well as with the AKS API server. See here
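If the cluster's own outbound IP has to reach the API server, it is worth checking whether that IP is covered by the authorized ranges. The sketch below does this with Python's standard ipaddress module; the function name is_authorized is hypothetical, and the addresses in the usage comment are placeholder documentation ranges, not values from the question.

```python
import ipaddress
from typing import Sequence


def is_authorized(ip: str, authorized_ranges: Sequence[str]) -> bool:
    """Return True if ip falls inside at least one of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in authorized_ranges
    )


# Example usage with placeholder values:
#   ranges = ["203.0.113.0/24", "198.51.100.7/32"]
#   is_authorized("203.0.113.25", ranges)  # outbound IP inside a /24
#   is_authorized("192.0.2.10", ranges)    # outbound IP not covered
```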

Calico & K8S on Azure - can't access pods

I'm starting with K8S. I installed 3 Debian 10 VMs on Azure (1 master node & 2 workers).
I installed the master node with this doc:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
I installed Calico with this one :
https://docs.projectcalico.org/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less
I created a simple nginx deployment:
kubectl run nginx --replicas=2 --image=nginx
I have the following pods (sazultk8s1/2 are the worker nodes):
root@itf-infra-sazultk8s0-vm:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-6db489d4b7-mzmnq 1/1 Running 0 12s 192.168.47.18 itf-infra-sazultk8s2-vm
nginx-6db489d4b7-sgdz7 1/1 Running 0 12s 192.168.247.115 itf-infra-sazultk8s1-vm
From the master node, I can't curl these nginx pods:
root@itf-infra-sazultk8s0-vm:~# curl 192.168.47.18 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds
root@itf-infra-sazultk8s0-vm:~# curl 192.168.247.115 --connect-timeout 5
curl: (28) Connection timed out after 5000 milliseconds
I tried from a simple busybox image:
kubectl run access --rm -ti --image busybox /bin/sh
/ # ifconfig eth0 | grep -i inet
inet addr:192.168.247.116 Bcast:0.0.0.0 Mask:255.255.255.255
/ # wget --timeout 5 192.168.247.115
Connecting to 192.168.247.115 (192.168.247.115:80)
saving to 'index.html'
index.html 100% |********************************************************************************************************| 612 0:00:00 ETA
'index.html' saved
/ # wget --timeout 5 192.168.47.18
Connecting to 192.168.47.18 (192.168.47.18:80)
wget: download timed out
With a from-scratch install:
can a pod ping a pod on another host?
is it possible to curl from the master node to a pod on a worker node?
does Azure apply restrictions that prevent k8s from working properly?
It took me a week to solve this.
From the master node, you want to ping/curl Pods located on worker nodes. These Pods are part of a Deployment, itself exposed through a Service.
There are some subtleties in Azure networking that keep this from working out of the box with a default Calico installation.
Steps to make Calico work on Azure
1. In Kubernetes, install Calico without a networking backend.
2. In Azure, enable IP forwarding on each host.
3. In Azure, create UDRs (User Defined Routes).
1. Kubernetes, Install Calico without a networking backend
A) Disable Bird
By default, calico.yaml is configured to use bird as the networking backend; you have to set it to none.
Official installation step: https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises
Before applying -f calico.yaml, edit the file.
Search for the variable CALICO_NETWORKING_BACKEND
We see that the value is taken from a ConfigMap.
Edit the value in the ConfigMap (located at the top of the file), to set it to none instead of the default bird.
B) Remove Bird from the Readiness & Liveness probes
Since Bird is now disabled, it must be removed from the readiness and liveness probes; otherwise the calico-node DaemonSet pods won't start. In the Calico manifest, comment out "- -bird-live" and "- -bird-ready".
You are done here, you can apply the file: kubectl apply -f
2. Azure, Enable IP forwarding on each host
For each VM in Azure:
Click on it > Networking > click on the Network Interface you have.
Click on IP Configurations
Set IP forwarding to Enabled.
Repeat for each VM, and you are done.
Note: as per the Azure doc, IP forwarding enables the virtual machine a network interface is attached to:
Receive network traffic not destined for one of the IP addresses assigned to any of the IP configurations assigned to the network interface.
Send network traffic with a different source IP address than the one assigned to one of a network interface's IP configurations.
3. Azure, Create UDR (User Defined Routes)
Next, create UDRs on your Azure subnet so that Azure routes traffic destined for the Pod subnet that Calico created on a given host to the IP of that host. Without these routes, Azure does not know what to do with traffic addressed to a Calico subnet.
Once the traffic reaches the target node, that node knows how to route it to its own Pods.
First, identify the subnet created by Calico on each node.
kubectl get ipamblocks.crd.projectcalico.org \
-o jsonpath="{range .items[*]}{'podNetwork: '}{.spec.cidr}{'\t NodeIP: '}{.spec.affinity}{'\n'}{end}"
On Azure, follow the documentation on how to 'Create a route Table', 'Add Routes of the table', and 'Associate the route Table to a subnet' (just scroll through the doc; the sections are one below the other).
The final result should look like this:
You are done! You should now be able to ping/curl your Pods located on other nodes.
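For clusters with more than a few nodes, the routes could be generated from the IPAM blocks instead of being typed by hand. The sketch below turns the JSON output of kubectl get ipamblocks.crd.projectcalico.org -o json into az network route-table route create commands. It rests on several assumptions that are not from the original answer: that .spec.affinity has the form "host:<node-name>", that the next hop type for a node acting as a router is VirtualAppliance, and that you supply the node-name-to-IP mapping yourself. The function name, resource group, and route table name are hypothetical.

```python
import json
from typing import Dict, List


def build_route_commands(ipamblocks_json: str, node_ips: Dict[str, str],
                         resource_group: str, route_table: str) -> List[str]:
    """Build one 'az network route-table route create' command per IPAM block."""
    commands = []
    for item in json.loads(ipamblocks_json)["items"]:
        cidr = item["spec"]["cidr"]
        # Assumption: affinity looks like "host:<node-name>".
        node = item["spec"]["affinity"].split(":", 1)[-1]
        next_hop = node_ips[node]  # raises KeyError for an unknown node
        name = "calico-" + cidr.replace(".", "-").replace("/", "-")
        commands.append(
            f"az network route-table route create "
            f"--resource-group {resource_group} "
            f"--route-table-name {route_table} --name {name} "
            f"--address-prefix {cidr} --next-hop-type VirtualAppliance "
            f"--next-hop-ip-address {next_hop}"
        )
    return commands
```

The function only returns the command strings, so they can be reviewed before anything is run against the subscription.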
Reference links
The links below explain the subtleties of Azure networking and the different ways to use Calico with Azure (Network+NetworkPolicy, or NetworkPolicy only).
In particular, there are 3 ways to make Calico work on Azure.
The one we just saw, where the routes are managed by the user. This could be called "user-managed networking".
Using the Azure CNI IPAM plugin. Here we could say "Azure-managed networking". Azure allocates each Pod an IP inside the Azure subnet, so that Azure knows how to route the traffic.
Calico in VXLAN mode. Here Calico wraps each packet in another packet whose header contains only host IPs, so that Azure knows how to route it. When the packet reaches the target node, Calico unwraps it to discover the real target IP, which is a Pod IP located in the Calico subnet.
The documentation below explains the tradeoffs of each setup, in particular the YouTube video.
YouTube (9 min), Kubernetes networking on Azure
Calico-Azure: official site and Git
Customizing the Calico manifest
Vocabulary:
CNI = Container network interface
IPAM = IP address management (to allocate IP addresses)
Can a pod ping a pod on another host?
Yes. Per the Kubernetes networking model, pods on different hosts can reach each other as long as a CNI provider is installed.
Is it possible to curl from the master node to a pod on a worker node?
You need to create either a NodePort or a LoadBalancer type Service to access your pods from outside the cluster and from the nodes.
Does Azure apply restrictions that prevent k8s from working properly?
There may be firewalls restricting traffic between VMs.

DNS issues in GCE & k8s

I use Google's managed Kubernetes service with preemptible instances.
I ran into a problem: when Google preempts a node that is serving the kube-dns pod, all other pods fail for 5-7 minutes with a "Cannot resolve" error.
I tried running a second kube-dns pod, but sometimes both DNS pods run on the same node and I get the failures again. I tried defining a nodeSelector for the kube-dns pod but got this error:
Pod "kube-dns-2185667875-8b42l" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
Is it possible to run DNS pods redundantly on different nodes? Are there any best practices?
You cannot modify a Pod like this; you need to modify the Deployment instead. You might also want to look into pod anti-affinity, which keeps pods of the same Deployment from ever being scheduled on the same node. Alternatively, you can switch from a Deployment to a DaemonSet to get exactly one pod running per node in the cluster.
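As an illustration of the anti-affinity suggestion, the sketch below builds a strategic-merge patch body for a Deployment that forbids two replicas on the same node. The function name anti_affinity_patch is hypothetical, and the default label k8s-app=kube-dns is an assumption about how the DNS pods are labeled, so check the labels on your own pods first. On a managed cluster, changes to system Deployments may also be reverted by the provider.

```python
import json


def anti_affinity_patch(label_key: str = "k8s-app",
                        label_value: str = "kube-dns") -> dict:
    """Build a Deployment patch that spreads matching pods across nodes."""
    rule = {
        "labelSelector": {"matchLabels": {label_key: label_value}},
        # One matching pod per node: the topology domain is the hostname.
        "topologyKey": "kubernetes.io/hostname",
    }
    return {
        "spec": {"template": {"spec": {"affinity": {"podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [rule],
        }}}}}
    }


if __name__ == "__main__":
    # Prints JSON suitable for: kubectl patch deployment <name> --patch '<json>'
    print(json.dumps(anti_affinity_patch(), indent=2))
```

With the "required" form, a replica stays Pending when no other node is available, which may or may not be preferable to two replicas sharing a node.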

Resources