Istio in Azure AKS - Connection issues over port 15001 while connecting to Azure Redis Cache

We are facing issues with port 15001 in Istio deployed in Azure AKS.
We have deployed Istio in AKS and are trying to connect to an Azure Cache for Redis instance in cluster mode. Our Azure Redis instance has more than two shards with SSL enabled, and one of the master nodes is assigned port 15001. We were able to connect to Azure Redis from AKS pods over ports 6380, 15000, 15002, 15003, 15004 and 15005. However, when we try to connect over 15001 we see issues. When we connect to Redis over port 15001 from a namespace without Istio sidecar injection in the same AKS cluster, the connection works fine.
Below are the logs from a redis-cli pod deployed in our AKS cluster.
Success case:
redis-cli -h our-redis-host.redis.cache.windows.net -p 6380 -a our-account-key --cacert "BaltimoreCyberTrustRoot.pem" --tls ping
OUTPUT:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
PONG
We are able to connect to Redis over all of the ports 6380, 15000, 15002, 15003, 15004 and 15005. However, when we try to connect using 15001, we get the error below.
Failure case:
redis-cli -h our-redis-host.redis.cache.windows.net -p 15001 -a our-account-key --cacert "BaltimoreCyberTrustRoot.pem" --tls ping
OUTPUT:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at our-redis-host.redis.cache.windows.net:15001: SSL_connect failed: Success
We could not see any entry in the istio-proxy logs when trying port 15001. However, when trying the other ports we can see log entries like the one below.
[2021-05-05T00:59:18.677Z] "- - -" 0 - - - "-" 600 3982 10 - "-" "-" "-" "-" "172.XX.XX.XX:6380" PassthroughCluster 172.XX.XX.XX:45478 172.22.XX.XX:6380 172.XX.XX.XX:45476 - -
Is this because port 15001 blocks outbound requests or manipulates certs for requests on that port? If yes, is there any configuration to change the proxy_port to a port other than 15001?
Note: I posted this on the Istio forum. Posting here for better reach.
Istio versions:
> istioctl version
client version: 1.8.2
control plane version: 1.8.3
data plane version: 1.8.3

Port 15001 is used by Envoy in Istio to capture outbound traffic. To avoid conflicts, applications should not use ports reserved by Istio.
You can read more here

We utilised the Istio excludeOutboundPorts annotation to bypass the Envoy proxy's interception of traffic on the outbound port that conflicts with Istio's port requirements.
Using the annotations provided by Istio, we can exclude either IP ranges or ports from interception. Below is an example using ports, followed by a sketch of the IP-range variant:
template:
  metadata:
    labels:
      app: 'APP-NAME'
    annotations:
      traffic.sidecar.istio.io/excludeOutboundPorts: "15001"
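If excluding by port is not desirable, a minimal sketch of the IP-range variant looks like the following. The CIDR shown is a placeholder assumption and would need to be replaced with the actual address range of your Azure Redis endpoints:
template:
  metadata:
    labels:
      app: 'APP-NAME'
    annotations:
      # Hypothetical CIDR; substitute the IP range of your Redis instance.
      traffic.sidecar.istio.io/excludeOutboundIPRanges: "172.22.0.0/16"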
References:
Istio Annotations
Istio traffic capture limitations
Istio Port Requirement

Related

DNS doesn't remove not ready pod in AKS with Azure CNI enabled

How does AKS make a not-ready pod unavailable to accept requests? Does it only work if you have a service in front of that deployment?
I'd like to start by explaining what I noticed in AKS that is not configured with Azure CNI, and then go on to explain what I have been seeing in AKS with Azure CNI enabled.
In AKS without Azure CNI, if I curl a not-ready pod behind a service like this: curl -I some-pod.some-service.some-namespace.svc.cluster.local:8080, I get an unresolvable hostname or something like that in the response. In my understanding this means DNS doesn't have this entry, and this is how AKS normally keeps not-ready pods from receiving requests.
In AKS with Azure CNI enabled, if I execute the same request against a not-ready pod, it is able to resolve the hostname and send the request into the pod. There is one caveat: when I try to execute a request through the external private IP of that service, the request doesn't reach the not-ready pod, which is to be expected and seems right. But when I execute the request mentioned above, curl -I some-pod.some-service.some-namespace.svc.cluster.local:8080, it works even though it shouldn't. Why does DNS in the Azure CNI case have that entry?
Is there anything I can do to configure Azure CNI to behave more like the default behavior of AKS, where a curl request like that either does not resolve the hostname or refuses the connection?
Assuming that "not ready pod" refers to pods with a failing readiness probe: the kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers. [Reference]
However, the logic determining the readiness of the pod might or might not have anything to do with whether the pod can serve requests and depends completely on the user.
For instance with a Pod having the following manifest:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-pod
spec:
  containers:
  - name: readiness-container
    image: nginx
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
Readiness is decided based on the existence of the file /tmp/healthy, irrespective of whether nginx serves the application. So, after running the Pod and exposing it using a Service readiness-svc:
kubectl exec readiness-pod -- /bin/bash -c 'if [ -f /tmp/healthy ]; then echo "/tmp/healthy file is present";else echo "/tmp/healthy file is absent";fi'
/tmp/healthy file is absent
kubectl get pods -o wide
NAME            READY   STATUS    RESTARTS   AGE    IP            NODE                                NOMINATED NODE   READINESS GATES
readiness-pod   0/1     Running   0          11m    10.240.0.28   aks-nodepool1-29819654-vmss000000   <none>           <none>
source-pod      1/1     Running   0          6h8m   10.240.0.27   aks-nodepool1-29819654-vmss000000   <none>           <none>
kubectl describe svc readiness-svc
Name: readiness-svc
Namespace: default
Labels: test=readiness
Annotations: <none>
Selector: test=readiness
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.0.23.194
IPs: 10.0.23.194
Port: <unset> 80/TCP
TargetPort: 80/TCP
Endpoints:
Session Affinity: None
Events: <none>
kubectl exec -it source-pod -- bash
root@source-pod:/# curl -I readiness-svc.default.svc.cluster.local:80
curl: (7) Failed to connect to readiness-svc.default.svc.cluster.local port 80: Connection refused
root@source-pod:/# curl -I 10-240-0-28.default.pod.cluster.local:80
HTTP/1.1 200 OK
Server: nginx/1.21.3
Date: Mon, 13 Sep 2021 14:50:17 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 07 Sep 2021 15:21:03 GMT
Connection: keep-alive
ETag: "6137835f-267"
Accept-Ranges: bytes
Thus, we can see that when we try to connect from source-pod to the Service readiness-svc.default.svc.cluster.local on port 80, the connection is refused. This is because the kubelet did not find the /tmp/healthy file in the readiness-pod container to perform a cat operation, and consequently marked the Pod readiness-pod not ready to serve traffic and removed it from the backends of the Service readiness-svc. However, the nginx server on the pod can still serve a web application, and it will continue to do so if you connect directly to the pod.
Readiness probe failures of containers do not remove the DNS records of Pods. A Pod's DNS records share their lifespan with the Pod itself.
This behavior is characteristic of Kubernetes and does not change with network plugins. We attempted to reproduce the issue and observed the same behavior on AKS clusters using both the kubenet and Azure CNI network plugins.
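For completeness, a minimal Service manifest consistent with the kubectl describe output above (an assumption, since the original readiness-svc manifest was not shown) could look like this:
apiVersion: v1
kind: Service
metadata:
  name: readiness-svc
  labels:
    test: readiness
spec:
  type: ClusterIP
  selector:
    test: readiness   # matches the label on readiness-pod above
  ports:
  - port: 80
    targetPort: 80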

How to set hazelcast tcp strategy?

We are running services on a cluster as docker containers.
Since we cannot use Eureka or multicast, we are trying to use Hazelcast TCP discovery. Currently, the configuration looks like this (example):
cluster:
  enabled: true
  hazelcast:
    useSiteLocalInterfaces: true
    discovery:
      tcp:
        enabled: true
        members:
          - 10.10.10.1
          - 10.10.10.2
          - 10.10.10.3
      azure:
        enabled: false
      multicast:
        enabled: false
      kubernetesDns:
        enabled: false
During service start, we get the following log message:
Members configured for TCP Hazelcast Discovery after removing local addresses: [10.10.10.1, 10.10.10.2, 10.10.10.3]
That means the service didn't discover its local IP correctly.
Later in the log, the following message appears: [LOCAL] [hazelcast-test-service:hz-profile] [3.12.2] Picked [172.10.0.1]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Apparently, the service determines its local IP to be 172.10.0.1. We have no idea where this IP comes from; it doesn't exist on the cluster.
Is there a way to give hazelcast a hint how to discover its local ip?
The address 172.10.0.1 must be one of the network interfaces inside your container. You can ssh into your Docker container and check the network interfaces (e.g. with ifconfig).
If you want to use another network interface, you can configure it with the environment variable HZ_NETWORK_PUBLICADDRESS. For example, in your case, one of the members can be started with the following docker command.
docker run -e HZ_NETWORK_PUBLICADDRESS=10.10.10.1:5701 -p 5701:5701 hazelcast/hazelcast
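If you are not launching members with docker run, a rough declarative equivalent is sketched below, assuming your deployment reads a hazelcast.yaml (supported since Hazelcast 3.12; the keys mirror the XML configuration):
hazelcast:
  network:
    # Advertise the routable cluster address instead of the container-internal one.
    public-address: 10.10.10.1:5701
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.10.10.1
          - 10.10.10.2
          - 10.10.10.3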
Please read more at Hazelcast Docker Image: Hazelcast Hello World

TCP whitelist for custom TCP port does not work in haproxy ingress

Hi, I was able to configure haproxy ingress for a custom TCP port (RabbitMQ) using Helm with custom values:
# ha-values.yaml
controller:
  ingressClass: haproxy
  config:
    whitelist-source-range: 251.161.180.161
    # use-proxy-protocol: "true"
  # TCP service key:value pairs
  # <port>: <namespace>/<servicename>:<portnumber>[:[<in-proxy>][:<out-proxy>]]
  # https://haproxy-ingress.github.io/docs/configuration/command-line/#tcp-services-configmap
  tcp:
    15672: "default/rabbitmq-bugs:15672"
    5672: "default/rabbitmq-bugs:5672"
I installed the chart with:
helm install haproxy-ingress haproxy-ingress/haproxy-ingress \
--create-namespace --namespace=ingress-controller \
--values yaml/ha-values.yaml
I deployed on DigitalOcean, so a LoadBalancer was started, and port 15672 is correctly forwarded to the internal RabbitMQ Kubernetes service.
I was not able to make the whitelist option work.
The service was always reachable.
I also tried enabling the proxy protocol on both the load balancer and haproxy, but the whitelist still didn't take effect.
It seems like the whitelist option doesn't work for TCP filtering.
Has anyone succeeded in whitelisting a custom TCP port?
Thanks.

Unable to enter a kubernetes pod. Error from server: error dialing backend: dial tcp: lookup (node hostname) on 168.63.129.16:53: no such host

We have deployed a K8S cluster using ACS engine in an Azure public cloud.
We are able to create deployments and services, but when we enter a pod using "kubectl exec -ti (pod name) (command)" we receive the error below:
Error from server: error dialing backend: dial tcp: lookup (node hostname) on 168.63.129.16:53: no such host
I looked all over the internet and tried everything I could to fix this issue, but no luck so far.
The OS is Ubuntu, and 168.63.129.16 is a public IP from Azure used for DNS (refer to the link below).
https://blogs.msdn.microsoft.com/mast/2015/05/18/what-is-the-ip-address-168-63-129-16/
I've already added host entries to /etc/hosts and entries into resolv.conf of the master/node server and nslookup resolves the same. I've also tested by adding --resolv-conf flag to the kubelet but still it fails. I'm hoping that someone from this community can help us fix this issue.
Verify that the node on which your pod is running can be resolved and reached from inside the API server container. If you added entries to /etc/resolv.conf on the master node, verify they are visible in the API server container; if they are not, restarting the API server pod might help.
The problem was in the VirtualBox layer:
sudo ifconfig vboxnet0 up
The solution is taken from here: https://github.com/kubernetes/minikube/issues/1224#issuecomment-316411907

Service IP is not accessible across nodes in kubernetes

I have created a Kubernetes v1.2 cluster running in the Azure cloud with one master (Master) and two nodes (Node1 and Node2). I have deployed an Nginx and a Tomcat application. Both containers are deployed in individual pods with an RC, and each has a Service.
The Nginx pod is deployed on Node1 and the Tomcat pod is deployed on Node2. Now Nginx on Node1 is trying to access Tomcat via Tomcat's Service IP (clusterIP), whose pod is on Node2, but it is unreachable.
Nginx serviceIP: 10.16.0.2 Node1
Tomcat serviceIP: 10.16.0.4 Node2
I tried curl 10.16.0.4:8080 from Node2 and it works. But the same from Node1 fails with curl: (52) Empty reply from server.
So communication to a Service IP across nodes fails. Is this a problem with kube v1.2?
Note: ClusterIP for the Service will be specified at the time of creating the service.
Since you are able to reach the cluster ip from the Node2, it looks like the service selector is properly defined.
kube-proxy is the component that watches the Services and creates iptables rules for their endpoints. I would check whether kube-proxy is running properly on Node1, and then check whether the iptables rules are set properly for the cluster IP you are trying to reach.
You can see these with iptables -L -t nat | grep namespace/servicename
Here is an example:
bash-4.3# iptables -L -t nat | grep kube-system/heapster
KUBE-MARK-MASQ all -- 172.168.16.182 anywhere /* kube-system/heapster: */
DNAT tcp -- anywhere anywhere /* kube-system/heapster: */ tcp to:172.168.16.182:8082
KUBE-SVC-BJM46V3U5RZHCFRZ tcp -- anywhere 192.168.172.66 /* kube-system/heapster: cluster IP */ tcp dpt:http
KUBE-SEP-KNJP5BBKUOCH7NDB all -- anywhere anywhere /* kube-system/heapster: */
In this example I looked up heapster running in the kube-system namespace. It shows that the cluster IP 192.168.172.66 is DNATed to the endpoint 172.168.16.182, which is the pod's IP (you should cross-check this with the endpoints listed in kubectl describe service).
If it is not there, restarting kube-proxy might help.
