Calico & K8S on Azure - can't access pods - linux

I'm starting with K8S. I installed 2 Debian 10 VMs on Azure (1 master node & 2 slaves).
I installed the master node with this doc:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
I installed Calico with this one :
https://docs.projectcalico.org/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less
I created a simple nginx deployment:
kubectl run nginx --replicas=2 --image=nginx
I have the following pods (sazultk8s1/2 are the working nodes) :
root#itf-infra-sazultk8s0-vm:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-6db489d4b7-mzmnq 1/1 Running 0 12s 192.168.47.18 itf-infra-sazultk8s2-vm
nginx-6db489d4b7-sgdz7 1/1 Running 0 12s 192.168.247.115 itf-infra-sazultk8s1-vm
From the master node I can't curl to these nginx:
root#itf-infra-sazultk8s0-vm:~# curl 192.168.47.18 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds
root#itf-infra-sazultk8s0-vm:~# curl 192.168.247.115 --connect-timeout 5
curl: (28) Connection timed out after 5000 milliseconds
I tried from a simple busybox image:
kubectl run access --rm -ti --image busybox /bin/sh
/ #ifconfig eth0 | grep -i inet
inet addr:192.168.247.116 Bcast:0.0.0.0 Mask:255.255.255.255
/ # wget --timeout 5 192.168.247.115
Connecting to 192.168.247.115 (192.168.247.115:80)
saving to 'index.html'
index.html 100% |********************************************************************************************************| 612 0:00:00 ETA
'index.html' saved
/ # wget --timeout 5 192.168.47.18
Connecting to 192.168.47.18 (192.168.47.18:80)
wget: download timed out
From a scratch install:
does a pod can ping a pod on another host ?
is it possible to curl from master node to a pod on a worker node ?
does azure apply restrictions and prevent k8s to work properly ?

Took me 1 week to solve it.
From the master node, you want to ping/curl Pods located on worker nodes. These Pods are part of a Deployment, itself exposed through a Service.
There are some subtilities in Azure networking which make this not "working out of the box" with default Calico installation.
Steps to make Calico work on Azure
In Kubernetes, Install Calico without a networking backend.
In Azure, Enable IP forwarding on each host.
In Azure, Create UDR (user Defined Routes).
1. Kubernetes, Install Calico without a networking backend
A) Disable Bird
By default, calico.yaml is configured to use bird as a network backend, you have to set it to none.
Official installation step: https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises
Before applying -f calico.yaml, edit the file.
Search for the variable CALICO_NETWORKING_BACKEND
We see that the value is taken from a ConfigMap.
Edit the value in the ConfigMap (located at the top of the file), to set it to none instead of the default bird.
B) Remove Bird from the Readiness & Liveliness probes
Given that we have disabled Bird, it should be removed from the Readiness & Liveliness probes, otherwise, the calico-node deamonset pods won't start. In Calico Manifest, comment out "- -bird-live" and "- bird-ready".
You are done here, you can apply the file: kubectl apply -f
2. Azure, Enable IP forwarding on each host
For each VM in Azure:
Click on it > Networking > click on the Network Interface you have.
Click on IP Configurations
Set IP forwarding to Enabled.
Repeat for each VM, and you are done.
Note: as per the Azure doc, IP forwarding enables the virtual machine a network interface is attached to:
Receive network traffic not destined for one of the IP addresses assigned to any of the IP configurations assigned to the network interface.
Send network traffic with a different source IP address than the one assigned to one of a network interface's IP configurations.
3. Azure, Create UDR (User Defined Routes)
Next, you have to create UDR on your Azure subnet, so that Azure can route the traffic targeted to the (Pod subnet created by Calico on the target Host), to the (IP of the actual target Host itself). So that Azure know that the traffic aimed to that calico subnet, has to be routed to the appropriate node, otherwise Azure doesn't know what to do with this traffic.
Then, when the target node is reached, the target knows how to route the traffic to its underlying Pods.
First, identify the subnet created by Calico on each node.
kubectl get ipamblocks.crd.projectcalico.org \
-o jsonpath="{range .items[*]}{'podNetwork: '}{.spec.cidr}{'\t NodeIP: '}{.spec.affinity}{'\n'}"
On Azure, follows the documentation on how to 'Create a route Table', 'Add Routes of the table', and to 'Associate the route Table to a subnet' (just scroll the doc, sections are one below the other).
The final result should look like this:
You are done! You should now be able to ping/curl your Pods located on other nodes.
References Links
All the reference links expaining the subtilities of Azure Networking, and the different ways to use Calico with Azure (Network+NetworkPolicy, or NetworkPolicy only).
In particular, there are 3 ways to make Calico work on Azure.
The one we just see, where the routes are managed by the User. It seems that this could be called "user managed networking".
Using Azure CNI IPAM plugin. Here we could say "Azure managed networking". Azure will allocate to each Pod an IP inside the Azure subnet, so that Azure knows how to route the traffic.
Calico in VXLAN mode. Here Calico will wrap-up each paquet in another packet, the wrapper will only contain host IPs so that Azure knows how to route them. Then, when reaching the target Node, Calico unwraps the paquet to discover the real target IP, which would be a Pod IP located in the Calico subnet.
In the below documentation, there are explanations on the tradeoff of each setup, in particular the Youtube video.
Youtube (9 min), Kubernetes networking on Azure
Calico-Azure: official site and Git
Cutomizing Calico Maniest
Vocabulary:
CNI = Container network interface
IPAM = IP address management (to allocate IP addresses)

does a pod can ping a pod on another host ?
As per kubernetes networking model yes as long as you have a CNI provider installed.
is it possible to curl from master node to a pod on a worker node ?
You need to create either Nodeport or Loadbalancer type service to access your pods from outside the cluster and for accessing pods from nodes.
does azure apply restrictions and prevent k8s to work properly ?
There may be firewalls restricting traffic between VMs.

Related

Why telnet and nc command report connection works in azure kubernetes pod while it shouldn't

I have an Azure AKS kubernetes cluster. And I created a Pod with Ubuntu container from Ubuntu image and several other Pods from java/.net Dockerfile.
I try to enter to any of the PODs (including the ubuntu one), and execute telnet/nc command in the pod to a remote server/port to validate the remote connection, it's very weird that no matter on which remote server IP and port I choose, they always report connection succeed, but actually the IP/Port should not work.
Here is the command snapshot I executed: From the image You will find I'm telneting to 1.1.1.1 with 1111 port number. I could try any other ip and port number, it always report connection succeed. And I tried to connect to all the other pods in the AKS cluster, they are all the same. I also tried to re-create the AKS kubernetes cluster by choosing CNI network instead of the default Kubenet network, still the same. Could anyone help me on this? Thanks a lot in advance
I figured out the root cause of this issue, it's because I installed Istio as service mesh, and turn out this is the expected behavior by design by referring this link: https://github.com/istio/istio/issues/36540
However, although this is by design of Istio, I'm still very interested in how to easily figure out whether a remote ip/port tcp connection works or not in Istio sidecar enabled POD.

Azure Kubernetes Cluster - Accessing and interacting with Service(s) from on-premise

currently we have the following scenario:
We have established our connection (network wise) from on-premise to the Azure Kubernetes Cluster (private cluster!) without any problems.
Ports which are being routed and are allowed
TCP 80
TCP 443
So far, we are in a development environment and test different configurations.
For setting up our AKS, we need to set the virtual network (via CNI) and Service CIDR. We have set the following configuration (just an example)
Virtual Network: 10.2.0.0 /21
Service CIDR: 10.2.8.0 /23
So, our pods are having IPs from our virtual network subnet, the services are getting their IPs from the Service CIDR. So far so good. A route table for the virtual network (subnet has been associated with the route table) is forwarding all traffic to our firewall and vice versa: Interacting with the virtual network is working without any issue. The network team (which is new to Azure cloud stuff as well) has said that the connection and access to the Service CIDR should be working.
Sadly, we are unable to access the Service CIDR.
For example, let's say we want to establish the kubernetes dashboard (web ui) via https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/. After running the YAML code, the kubernetes dashboard pod and service is being successfully created. The pod can be pinged and "accessed", but the service, which makes it possible to access the kubernetes dashboard via port 443 cannot be accessed. For example https://10.2.8.42/.
My workaround so far is that the kubernetes dashboard (as a Service, type: ClusterIP) is having set an external IP from the virtual network. This sounds all great, but I am not really fond of it, since I have to interact with the virtual network rather than the Service CIDR.
Is this really the correct way? Any hints how to make the Service CIDR accessible? What am I missing ?
Any help would be appreciated.

Unable to communicate between the docker containers from the same subnet after using CNI to assing IPs from azure vnet

I was trying to give IP address to docker container from an azure vnet using the CNI plugin, I was successfully able to give IP address to containers but they were not communicating with each other.
I followed this blog, which is based on this microsoft document.
I created two containers using commands:
sudo ./docker-run.sh alpine1 default alpine
and
sudo ./docker-run.sh alpine2 default alpine
I checked that the IP of alpine1 is 10.10.3.59 and IP of alpine2 is 10.10.3.61 which are the IPs I created inside a network interface as given in the above docs. So they did receive the IPs from a subnet inside vnet but when I ping alpine1 from alpine2 as ping 10.10.3.59, it doesn't work, am I missing something here ? Or I have to do some other configuration after this ?

From what source-IP range do outbound connections from Pods appear to come?

I want to set up connections from a kubernetes cluster (created via az acs create with mostly default settings) to an Azure Postgresql instance, and I'd like to know what source-IP range to enter in postgres HBA (this is the thing Azure calls a firewall-rule under az postgres server).
The thing is, although I can see from the console errors (when using psql to test) what the current IP is that the cluster requests come from
FATAL: no pg_hba.conf entry for host "x.x.x.x" [...]
... I just don't see this IP address anywhere in the cluster properties - and anyway, it would seem a very fragile configuration to just whitelist this one IP address without knowing how it's assigned.
(In the Azure Portal, I do see one "Public IP" associated with the cluster master, but that's not the same as the IP seen by postgres, and, I assume, mainly for ingress.)
So ideally, does ACS let me control the outbound IP addresses for the cluster? And if not, can I figure out programmatically what IP or range of IPs to allow?
It should be the external IP for the node that the pod is scheduled on, e.g. on container engine:
$ kubectl get no -o wide
NAME STATUS AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION
gke-cluster-1-node-1 Ready 58d v1.5.4 <example node IP> Container-Optimized OS from Google 4.4.21+
$ ssh gke-cluster-1-node-1
$ curl icanhazip.com
<example node IP>
$ kubectl get po -o wide | grep node-1
example-pod-1 1/1 Running 0 11d <pod IP> gke-cluster-1-node-1
$ kubectl exec -it example-pod-1 curl icanhazip.com
<example node IP>
does ACS let me control the outbound IP addresses for the cluster? And
if not, can I figure out programmatically what IP or range of IPs to
allow?
Based on my knowledge, Azure container service expose docker application to public via Azure load balancer, load balancer will get a public IP address.
By the way, we can't specify which public IP address will associate to Azure load balancer.
After we can expose the application to the internet, we can add the public IP address to your Postgresql's postgres HBA.

Kubernetes: access "public" urls from within a pod

First of all, I am not very expert of K8s, I understand some of the concepts and made already my hands dirty in the configurations.
I correctly set up the cluster configured by my company but I have this issue
I am working on a cluster with 2 pods, ingress rules are correctly configured for www.my-app.com and dashboard.my-app.com.
Both pods runs on the same VM.
If I enter in the dashboard pod (kubectl exec -it $POD bash) and try to curl http://www.my-app.com I land on the dashboard pod again (the same happens all the way around, from www to dashboard).
I have to use http://www-svc.default.svc.cluster.local and http://dashboard-svc.default.svc.cluster.local to land on the correct pods but this is a problem (links generated by the other app will contain internal k8s host, instead of the "public url").
Is there a way to configure routing so I can access pods with their "public" hostnames, from the pods themselves?
So what should happen when you curl is the external DNS record (www.my-app.com in this case) will resolve to your external IP address, usually a load balancer that then sends traffic to a kubernetes service. That service then should send traffic to the appropriate pod. It would seem that you have a misconfigured service. Make sure your service has an external IP that is different between dashboard and www. To see this a simple kubectl get svc should suffice. My guess is that the external IP is wrong, or the service is pointing to the wrong podm which you can see with a kubectl describe svc <name of service>.

Resources