not able to access statefulset headless service from kubernetes - cassandra

I have created a headless StatefulSet service in Kubernetes, and the Cassandra DB is running fine.
PS C:\> .\kubectl.exe get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cassandra None <none> 9042/TCP 50m
kubernetes 10.0.0.1 <none> 443/TCP 6d
PS C:\> .\kubectl.exe get pods
NAME READY STATUS RESTARTS AGE
cassandra-0 1/1 Running 0 49m
cassandra-1 1/1 Running 0 48m
cassandra-2 1/1 Running 0 48m
I am running all this on minikube. From my laptop I am trying to connect to 192.168.99.100:9402 using a Java program, but it is not able to connect.

It looks like your service is not defined as a NodePort service. Can you change the service type to NodePort and test again?
When a service is of type NodePort, kubectl get svc shows two port numbers for it: the service port and the node port.
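A minimal sketch of what such a service could look like, generated as JSON (which kubectl accepts just like YAML). The service name cassandra-external, the label app=cassandra and the node port 30042 are assumptions for illustration; none of them come from the question. A separate service is used here so that the existing headless service stays untouched.

```python
import json


def nodeport_service(name, selector, port, node_port):
    """Build a NodePort Service manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": selector,
            "ports": [
                # port/targetPort: the Cassandra CQL port from the question.
                # nodePort: assumed free port in the default 30000-32767 range.
                {"port": port, "targetPort": port, "nodePort": node_port}
            ],
        },
    }


# Name, label and node port are hypothetical; adjust them to your StatefulSet.
manifest = nodeport_service("cassandra-external", {"app": "cassandra"}, 9042, 30042)
print(json.dumps(manifest, indent=2))
```

If this is saved as nodeport.py, the output could be applied with python nodeport.py | kubectl apply -f -, and the client would then connect to the minikube IP on the node port (30042 in this sketch) rather than on 9042.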

Related

MetalLB works only on master node, can't reach IP assigned from workers

I've successfully installed MetalLB on my bare-metal Kubernetes cluster, but only pods assigned to the master node seem to work.
MetalLB is configured in layer 2 mode, with the range 192.168.0.100-192.168.0.200. Services whose pods run on worker nodes do get an IP, but those IPs do not respond to any request.
If the assigned IP is curled from inside the node, it works, yet if it's curled from another node or machine, it doesn't respond.
Example:
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx2-658ffbbcb6-w5w28 1/1 Running 0 4m51s 10.244.1.2 worker2.homelab.com <none> <none>
nginx21-65b87bcbcb-fv856 1/1 Running 0 4h32m 10.244.0.10 master1.homelab.com <none> <none>
# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h49m
nginx2 LoadBalancer 10.111.192.206 192.168.0.111 80:32404/TCP 5h21m
nginx21 LoadBalancer 10.108.222.125 192.168.0.113 80:31387/TCP 4h43m
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1.homelab.com Ready control-plane,master 5h50m v1.20.2 192.168.0.20 <none> CentOS Linux 7 (Core) 3.10.0-1160.15.2.el7.x86_64 docker://20.10.3
worker2.homelab.com Ready <none> 10m v1.20.2 192.168.0.22 <none> CentOS Linux 7 (Core) 3.10.0-1160.15.2.el7.x86_64 docker://20.10.3
Deployment nginx2 (Worker2, the one that doesn't work)
kubectl describe svc nginx2
Name: nginx2
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=nginx2
Type: LoadBalancer
IP: 10.111.192.206
LoadBalancer Ingress: 192.168.0.111
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 32404/TCP
Endpoints: 10.244.1.2:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal nodeAssigned 10m (x6 over 5h23m) metallb-speaker announcing from node "master1.homelab.com"
Normal nodeAssigned 5m18s metallb-speaker announcing from node "worker2.homelab.com"
[root@worker2 ~]# curl 192.168.0.111
<!DOCTYPE html> ..... (Works)
[root@master1 ~]# curl 192.168.0.111
curl: (7) Failed connect to 192.168.0.111:80; No route to host
Deployment nginx21 (Master1, the one that works)
kubectl describe svc nginx21
Name: nginx21
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=nginx21
Type: LoadBalancer
IP: 10.108.222.125
LoadBalancer Ingress: 192.168.0.113
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 31387/TCP
Endpoints: 10.244.0.10:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal nodeAssigned 12m (x3 over 4h35m) metallb-speaker announcing from node "master1.homelab.com"
[root@worker2 ~]# curl 192.168.0.113
<!DOCTYPE html> ..... (Works)
[root@master1 ~]# curl 192.168.0.113
<!DOCTYPE html> ..... (Works)
--------- PING WORKS FROM OTHER MACHINES ----------
I've just found this out, so might it be a problem with iptables? I don't really know how it works with MetalLB. I can ping the IP (192.168.0.111) from other machines and it responds.
After Matt's response I figured it out: the firewall was blocking access, so I simply opened port 80 to the whole network and it worked.
[root@worker2 ~]# firewall-cmd --new-zone=kubernetes --permanent
success
[root@worker2 ~]# firewall-cmd --zone=kubernetes --add-source=192.168.0.1/16 --permanent
success
[root@worker2 ~]# firewall-cmd --zone=kubernetes --add-port=80/tcp --permanent
success
[root@worker2 ~]# firewall-cmd --reload
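Since ping (ICMP) succeeded while HTTP did not, a plain TCP connect test is a quick way to tell a blocked port from an unreachable host. The helper below is only a sketch using the Python standard library; the function name tcp_port_open is made up for this example.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and "No route to host".
        return False
```

Calling tcp_port_open("192.168.0.111", 80) from another machine before and after the firewall change should show whether port 80 on the announcing node has become reachable.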

Error: forwarding ports: error upgrading connection: error dialing backend: - Azure Kubernetes Service

We have upgraded our Kubernetes Service cluster on Azure to the latest version, 1.12.4. After that we suddenly noticed that pods and nodes can no longer communicate with each other by private IP:
kubectl get pods -o wide -n kube-system -l component=kube-proxy
NAME READY STATUS RESTARTS AGE IP NODE
kube-proxy-bfhbw 1/1 Running 2 16h 10.0.4.4 aks-agentpool-16086733-1
kube-proxy-d7fj9 1/1 Running 2 16h 10.0.4.35 aks-agentpool-16086733-0
kube-proxy-j24th 1/1 Running 2 16h 10.0.4.97 aks-agentpool-16086733-3
kube-proxy-x7ffx 1/1 Running 2 16h 10.0.4.128 aks-agentpool-16086733-4
As you can see, the node aks-agentpool-16086733-0 has private IP 10.0.4.35. When we try to check the logs of pods running on this node, we get this error:
Get https://aks-agentpool-16086733-0:10250/containerLogs/emw-sit/nginx-sit-deploy-864b7d7588-bw966/nginx-sit?tailLines=5000&timestamps=true: dial tcp 10.0.4.35:10250: i/o timeout
Tiller (Helm) runs on this node as well, and if we try to connect to Tiller we get this error on the client PC:
shmits-imac:~ andris.shmits01$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.4.35:10250: i/o timeout
Does anybody have any idea why the pods and nodes lost connectivity by private IP?
After we scaled the cluster down from 4 nodes to 2, the problem disappeared. And after we scaled it up again from 2 nodes to 4, everything started working fine.
The issue could be with the apiserver. Did you check the logs from the apiserver pod?
Can you run the command below inside the cluster? Do you get a 200 OK response?
curl -k -v https://10.96.0.1/version
These issues occur when nodes in a Kubernetes cluster created with kubeadm do not get Internal IP addresses that match the IPs of the nodes/machines.
Issue: If I run the helm list command from my cluster, I get the error below:
helm list
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k-master Ready master 3h10m v1.18.5 10.0.0.5 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker01 Ready <none> 179m v1.18.5 10.0.0.6 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker02 Ready <none> 167m v1.18.5 10.0.2.15 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
Please note: k-worker02 has internal IP as 10.0.2.15 but I was expecting 10.0.0.7 which is my node/machine IP.
Solution:
Step 1: Connect to the host (here k-worker02) which does not have the expected IP
Step 2: Open the file below
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Step 3: Edit and append with --node-ip 10.0.0.7
code snippet
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --node-ip 10.0.0.7
Step 4: Reload the daemon and restart the kubelet service
sudo systemctl daemon-reload && sudo systemctl restart kubelet
Result:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k-master Ready master 3h36m v1.18.5 10.0.0.5 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker01 Ready <none> 3h25m v1.18.5 10.0.0.6 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
k-worker02 Ready <none> 3h13m v1.18.5 10.0.0.7 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.12
With the above solution, the k-worker02 node gets the expected IP (10.0.0.7) and the "forwarding ports:" error no longer appears for the helm list or helm install commands.
Reference: https://networkinferno.net/trouble-with-the-kubernetes-node-ip

How to configure pod disruption budget to drain kubernetes node?

I'd like to configure cluster autoscaler on AKS. When scaling down it fails due to PDB:
I1207 14:24:09.523313 1 cluster.go:95] Fast evaluation: node aks-nodepool1-32797235-0 cannot be removed: no enough pod disruption budget to move kube-system/metrics-server-5cbc77f79f-44f9w
I1207 14:24:09.523413 1 cluster.go:95] Fast evaluation: node aks-nodepool1-32797235-3 cannot be removed: non-daemonset, non-mirrored, non-pdb-assignedkube-system pod present: cluster-autoscaler-84984799fd-22j42
I1207 14:24:09.523438 1 scale_down.go:490] 2 nodes found to be unremovable in simulation, will re-check them at 2018-12-07 14:29:09.231201368 +0000 UTC m=+8976.856144807
All system pods have a PDB with minAvailable: 1 assigned manually. I can imagine that this does not work for pods with only a single replica, like the metrics-server:
❯ k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-nodepool1-32797235-0 Ready agent 4h v1.11.4 10.240.0.4 <none> Ubuntu 16.04.5 LTS 4.15.0-1030-azure docker://3.0.1
aks-nodepool1-32797235-3 Ready agent 4h v1.11.4 10.240.0.6 <none> Ubuntu 16.04.5 LTS 4.15.0-1030-azure docker://3.0.1
❯ ks get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
cluster-autoscaler-84984799fd-22j42 1/1 Running 0 2h 10.244.1.5 aks-nodepool1-32797235-3 <none>
heapster-5d6f9b846c-g7qb8 2/2 Running 0 1h 10.244.0.16 aks-nodepool1-32797235-0 <none>
kube-dns-v20-598f8b78ff-8pshc 4/4 Running 0 3h 10.244.1.4 aks-nodepool1-32797235-3 <none>
kube-dns-v20-598f8b78ff-plfv8 4/4 Running 0 1h 10.244.0.15 aks-nodepool1-32797235-0 <none>
kube-proxy-fjvjv 1/1 Running 0 1h 10.240.0.6 aks-nodepool1-32797235-3 <none>
kube-proxy-szr8z 1/1 Running 0 1h 10.240.0.4 aks-nodepool1-32797235-0 <none>
kube-svc-redirect-2rhvg 2/2 Running 0 4h 10.240.0.4 aks-nodepool1-32797235-0 <none>
kube-svc-redirect-r2m4r 2/2 Running 0 4h 10.240.0.6 aks-nodepool1-32797235-3 <none>
kubernetes-dashboard-68f468887f-c8p78 1/1 Running 0 4h 10.244.0.7 aks-nodepool1-32797235-0 <none>
metrics-server-5cbc77f79f-44f9w 1/1 Running 0 4h 10.244.0.3 aks-nodepool1-32797235-0 <none>
tiller-deploy-57f988f854-z9qln 1/1 Running 0 4h 10.244.0.8 aks-nodepool1-32797235-0 <none>
tunnelfront-7cf9d447f9-56g7k 1/1 Running 0 4h 10.244.0.2 aks-nodepool1-32797235-0 <none>
What needs to be changed (number of replicas? PDB configuration?) for down-scaling to work?
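The arithmetic behind the suspicion above can be sketched as follows. For a PDB that uses minAvailable, the number of pods that may be evicted voluntarily is the number of healthy pods minus minAvailable, never below zero. The function name is made up for this example.

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a minAvailable-style PDB permits right now."""
    return max(0, healthy_pods - min_available)


# metrics-server: a single replica with minAvailable: 1
print(allowed_disruptions(1, 1))
# kube-dns: two replicas with minAvailable: 1
print(allowed_disruptions(2, 1))
```

With one replica and minAvailable: 1 the budget never allows an eviction, so the node hosting that pod cannot be drained; with two replicas one pod can move at a time.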
This is a common administrative issue when draining nodes whose pods are covered by a PDB (Pod Disruption Budget).
It happens because evictions are forced to respect the PDB you specify.
You have two options:
Either force the hand:
kubectl drain foo --force --grace-period=0
You can check the other options in the docs -> https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain
or use the eviction API:
{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}
Either way, drain and the eviction API both try to delete the pods so that they can be rescheduled elsewhere before the node is fully drained.
As mentioned in the docs:
the API can respond in one of three ways:
If the eviction is granted, then the pod is deleted just as if you had sent a DELETE request to the pod’s URL and you get back 200 OK.
If the current state of affairs wouldn’t allow an eviction by the rules set forth in the budget, you get back 429 Too Many Requests. This is typically used for generic rate limiting of any requests.
If there is some kind of misconfiguration, like multiple budgets pointing at the same pod, you will get 500 Internal Server Error.
For a given eviction request, there are two cases:
There is no budget that matches this pod. In this case, the server always returns 200 OK.
There is at least one budget. In this case, any of the three above responses may apply.
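The three responses quoted above suggest a simple retry loop on the client side. The sketch below assumes a hypothetical callable send_eviction that POSTs the Eviction object and returns the HTTP status code; it is not a real client-library function, and the retry counts and delays are arbitrary.

```python
import time


def evict_with_retry(send_eviction, attempts=5, delay_seconds=5.0):
    """Call send_eviction() until it is granted, fails hard, or attempts run out.

    send_eviction is a hypothetical stand-in for the HTTP POST of the
    Eviction object; it must return the response status code.
    """
    for _ in range(attempts):
        status = send_eviction()
        if status == 200:
            return "evicted"
        if status == 429:
            # The budget does not allow the eviction right now; wait and retry.
            time.sleep(delay_seconds)
            continue
        if status == 500:
            # For example, several budgets matching the same pod.
            return "misconfigured"
        return f"unexpected status {status}"
    return "still blocked by budget"
```

A result of "still blocked by budget" for a single-replica pod with minAvailable: 1 is expected, since waiting alone never frees up the budget in that case.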
If it gets stuck, you might need to do it manually.
You can read more here or here

New AKS cluster unreachable via network (including dashboard)

Yesterday I spun up an Azure Kubernetes Service cluster running a few simple apps. Three of them have exposed public IPs that were reachable yesterday.
As of this morning I can't get the dashboard tunnel to work or reach the LoadBalancer IPs themselves.
I was asked by the Azure twitter account to solicit help here.
I don't know how to troubleshoot this apparent network issue - only az seems to be able to touch my cluster.
dashboard error log
❯❯❯ make dashboard ~/c/azure-k8s (master)
az aks browse --resource-group=akc-rg-cf --name=akc-237
Merged "akc-237" as current context in /var/folders/9r/wx8xx8ls43l8w8b14f6fns8w0000gn/T/tmppst_atlw
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out
service+pod listing
❯❯❯ kubectl get services,pods ~/c/azure-k8s (master)
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
azure-vote-back ClusterIP 10.0.125.49 <none> 6379/TCP 16h
azure-vote-front LoadBalancer 10.0.185.4 40.71.248.106 80:31211/TCP 16h
hubot LoadBalancer 10.0.20.218 40.121.215.233 80:31445/TCP 26m
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 19h
mti411-web LoadBalancer 10.0.162.209 52.168.123.30 80:30874/TCP 26m
NAME READY STATUS RESTARTS AGE
azure-vote-back-7556ff9578-sjjn5 1/1 Running 0 2h
azure-vote-front-5b8878fdcd-9lpzx 1/1 Running 0 16h
hubot-74f659b6b8-wctdz 1/1 Running 0 9s
mti411-web-6cc87d46c-g255d 1/1 Running 0 26m
mti411-web-6cc87d46c-lhjzp 1/1 Running 0 26m
http failures
❯❯❯ curl --connect-timeout 2 -I http://40.121.215.233 ~/c/azure-k8s (master)
curl: (28) Connection timed out after 2005 milliseconds
❯❯❯ curl --connect-timeout 2 -I http://52.168.123.30 ~/c/azure-k8s (master)
curl: (28) Connection timed out after 2001 milliseconds
If you are getting getsockopt: connection timed out while trying to access your AKS dashboard, I think deleting the tunnelfront pod will help: once you delete it, the master will create a new tunnelfront pod. It's something I have tried, and it worked for me.
@daniel Did rebooting the agent VMs solve your issue, or are you still seeing issues?

kube-dns stays in ContainerCreating status

I have 5 machines running Ubuntu 16.04.1 LTS. I want to set them up as a Kubernetes cluster. I'm trying to follow this getting started guide where they're using kubeadm.
It all worked fine until step 3/4, Installing a pod network. I've looked at their addon page for a pod network and chose the flannel overlay network. I've copied the yaml file to the machine and executed:
root@up01:/home/up# kubectl apply -f flannel.yml
Which resulted in:
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
So I thought that it went OK, but when I list all the pods:
root@up01:/etc/kubernetes/manifests# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-d5f50 1/1 Running 0 50m
kube-system etcd-up01 1/1 Running 0 48m
kube-system kube-apiserver-up01 1/1 Running 0 50m
kube-system kube-controller-manager-up01 1/1 Running 0 49m
kube-system kube-discovery-1769846148-jvx53 1/1 Running 0 50m
kube-system kube-dns-2924299975-prlgf 0/4 ContainerCreating 0 49m
kube-system kube-flannel-ds-jb1df 2/2 Running 0 32m
kube-system kube-proxy-rtcht 1/1 Running 0 49m
kube-system kube-scheduler-up01 1/1 Running 0 49m
The problem is that kube-dns stays in the ContainerCreating state. I don't know what to do.
It is very likely that you missed this critical piece of information from the guide:
If you want to use flannel as the pod network, specify
--pod-network-cidr 10.244.0.0/16 if you’re using the daemonset manifest below.
If you omit this kube-dns will never leave the ContainerCreating STATUS.
Your kubeadm init command should be:
# kubeadm init --pod-network-cidr 10.244.0.0/16
and not
# kubeadm init
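One way to double-check this is to compare the CIDR passed to kubeadm with the network configured in the flannel manifest. The sketch below assumes the manifest's net-conf.json contains a "Network" key, as in the stock kube-flannel manifest; the function name is made up for this example.

```python
import ipaddress
import json


def cidrs_match(kubeadm_pod_cidr, flannel_net_conf):
    """True if kubeadm's pod CIDR equals the Network in flannel's net-conf.json."""
    flannel_network = json.loads(flannel_net_conf)["Network"]
    return ipaddress.ip_network(kubeadm_pod_cidr) == ipaddress.ip_network(flannel_network)


# Assumed content of net-conf.json from the kube-flannel-cfg ConfigMap.
net_conf = '{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
print(cidrs_match("10.244.0.0/16", net_conf))
```

If the two differ, either the kubeadm flag or the ConfigMap has to be changed so that both agree.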
Did you try restarting NetworkManager? It worked for me. It also worked when I disabled IPv6.
