Kubernetes on Azure: connectex error

I followed the steps from the link to create a K8s cluster using the Azure Portal, then tried using kubectl on a remote machine to check whether it's working. I got this error:
Unable to connect to the server: dial tcp 13.90.35.157:443: connectex:
A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection
failed because connected host has failed to respond.
I can SSH to the K8s master. I tried kubectl get nodes from the master and got a similar error.

It is really hard to say from such a description what went wrong, but as this is a new cluster (and I'm saying this because sometimes a k8s cluster gets deployed but doesn't really work), I would suggest deleting it and creating a new one, and/or creating it using the Azure CLI / Azure Cloud Shell.
Basically it's as simple as:
az acs create -n acs-cluster -g acsrg1 -d applink789 --generate-ssh-keys
if you already have the resource group created; if not, you can create it with:
az group create -n acsrg1 -l "westus"
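Once the deployment finishes, you can pull the cluster credentials and retest from the same shell (a sketch, assuming the same resource group and cluster names as above):
az acs kubernetes get-credentials -g acsrg1 -n acs-cluster
kubectl get nodes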

According to your description, it seems you have not configured the Service Principal correctly. I used the wrong service principal to deploy K8s in Azure and got the same error:
C:\Users>kubectl get nodes
Unable to connect to the server: dial tcp 13.90.27.73:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
You may need to check that the credentials were entered accurately, and that the configured Service Principal has read and write permissions to the target Subscription.
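One quick sanity check from the Azure CLI is to list the principal's role assignments and confirm it has at least Contributor on the subscription (<appId> is a placeholder for the Service Principal's application ID):
az role assignment list --assignee <appId> --output table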
If your Service Principal is misconfigured, none of the Kubernetes components will come up in a healthy manner. We can check to see if this is the problem:
root@k8s-master-6FEE48E1-0:~# journalctl -u kubelet | grep --text autorest
If you see output like the following, it means you have not configured the Service Principal correctly:
root@k8s-master-6FEE48E1-0:~# journalctl -u kubelet | grep --text autorest
Jun 01 01:58:47 k8s-master-6FEE48E1-0 docker[5522]: E0601 01:58:47.447321 6028 kubelet.go:1186] Cannot get Node info: failed to get external ID from cloud provider: autorest#WithErrorUnlessStatusCode: POST https://login.microsoftonline.com/1fcf418e-66ed-4c99-9449-d8e18bf8737a/oauth2/token?api-version=1.0 failed with 400 Bad Request: StatusCode=400
Jun 01 01:58:47 k8s-master-6FEE48E1-0 docker[5522]: E0601 01:58:47.627128 6028 kubelet_node_status.go:70] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: autorest#WithErrorUnlessStatusCode: POST https://login.microsoftonline.com/1fcf418e-66ed-4c99-9449-d8e18bf8737a/oauth2/token?api-version=1.0 failed with 400 Bad Request: StatusCode=400
Jun 01 01:58:47 k8s-master-6FEE48E1-0 docker[5522]: E0601 01:58:47.885092 6028 kubelet_node_status.go:70] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: autorest#WithErrorUnlessStatusCode: POST https://login.microsoftonline.com/1fcf418e-66ed-4c99-9449-d8e18bf8737a/oauth2/token?api-version=1.0 failed with 400 Bad Request: StatusCode=400
For more information about how to create and configure a service principal for an ACS-Engine Kubernetes cluster, please refer to this link.

Related

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code, but I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing, I should say that I can ping all three nodes from each of the nodes. I have also tried opening TCP port 2380, still no success - the same error.
(2) Before that error I had the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In the /etc/hosts file these names are resolved as:
10.0.0.135 server2
10.0.0.134 server1
10.0.0.136 server3
(3) The initial setup on each node, however, looks like this:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up, each node has this initial setup log (3), then adds the members (2), and once these steps are done it fails with (1). As far as I know, etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without knowing the source code this is really hard to debug, but maybe there are some ideas on the error and what could cause it?
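For comparison, the static bootstrap in that tutorial binds the peer listener to the node's routable address rather than 127.0.0.1, which is consistent with the dial timeout in (1). A sketch of the flags for server1 (10.0.0.134), reusing the member names from the log above (TLS flags omitted):
etcd --name server1 \
  --listen-peer-urls https://10.0.0.134:2380 \
  --initial-advertise-peer-urls https://server1:2380 \
  --listen-client-urls https://10.0.0.134:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://server1:2379 \
  --initial-cluster server1=https://server1:2380,server2=https://server2:2380,server3=https://server3:2380 \
  --initial-cluster-state new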
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
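Note that etcdctl is falling back to its defaults (127.0.0.1:2379 and the legacy 127.0.0.1:4001) above. If the exported endpoint is not being picked up, passing it explicitly rules that out; a sketch, using server1's client URL:
etcdctl --endpoints https://server1:2379 cluster-health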

Unable to push image to OpenShift internal registry with i/o timeout

Pushing image docker-registry.default.svc:5000/th/th:source ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: After retrying 6 times, Push image still failed due to error: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp <ip>:5000: i/o timeout
Manually pushing an image from the CLI to the internal registry works fine.
I have deployed an OpenShift 3.11 instance on a couple of Azure VMs; while deploying I took care of adding an external IP to them.
All the other images are also present in the Docker registry, and a curl command to the registry returns with exit code 0.
What seemed curious was that, while deploying my app, I tried pinging the registry from the build pod's terminal. The connection just hung, with no response.
Any ideas on how to fix this?
The SDN was causing this networking issue.
Does Azure support Calico networking?
Calico in VXLAN mode is supported on Azure. However, IPIP packets are
blocked by the Azure network fabric.
The quote above from the Calico reference explains why this issue occurred. It can be resolved by switching Calico to VXLAN mode in its config. More details on how to switch can be found here.
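With Calico 3.x the switch boils down to flipping the encapsulation on the IP pool (a sketch; default-ipv4-ippool is Calico's default pool name and may differ in your install):
calicoctl patch ippool default-ipv4-ippool -p '{"spec":{"ipipMode":"Never","vxlanMode":"Always"}}'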
For my solution, I just switched from Calico to the default OpenShift SDN 'ovs-subnet' in the inventory file.
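In openshift-ansible terms that is a one-line change in the inventory file (variable name as used by the 3.11 playbooks):
os_sdn_network_plugin_name='redhat/openshift-ovs-subnet'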

Kubernetes in Azure - Unable to connect to the server: proxyconnect tcp: dial tcp: lookup http: no such host

I'm new to Azure. I've installed a Kubernetes cluster v1.10.3 in Azure and followed the steps needed to operate Kubernetes from the console.
When I try to get any resource, for example the pods, by executing kubectl get pods, I get back the following error:
Unable to connect to the server: proxyconnect tcp: dial tcp: lookup http: no such host
Equally, if I try to execute any other command using AKS, for example az aks browse --resource-group POC_Service_Mesh --name ServiceMeshCluster, I get back the following error:
Unable to connect to the server: proxyconnect tcp: dial tcp: lookup http: no such host.
My region is WestEurope.
Finally, this issue also occurs with Kubernetes v1.9.6; because of that, I updated to v1.10.3.
Any suggestion? Thanks a lot! :-)
I've just solved the issue.
It turned out to be a problem with the proxy, which was blocking all communication.
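For context, the proxyconnect tcp prefix in that error comes from Go's HTTP transport failing to dial the configured proxy before it ever reaches the API server, and lookup http: no such host suggests a malformed proxy URL whose host part parsed as the literal string "http". A quick shell check (a sketch):
echo $HTTP_PROXY $HTTPS_PROXY $NO_PROXY
# if the value is malformed or the proxy blocks the cluster, clear it for this shell,
# or add the API server's hostname to NO_PROXY
unset HTTP_PROXY HTTPS_PROXY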
Thanks a lot anyway!

Azure AKS: no nodes found

I created an Azure AKS cluster with 3 nodes (Standard DS3 v2: 4 vCPUs, 14 GB memory). I was fiddling with the cluster and created a Deployment with 1000 replicas. After this the complete cluster went down.
azureuser@saa:~$ k get cs
NAME                 STATUS      MESSAGE                                                                                       ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: getsockopt: connection refused
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: getsockopt: connection refused
etcd-0               Healthy     {"health": "true"}
From debugging it seems both the scheduler and the controller-manager went down. How do I fix this?
What exactly happened when I created a Deployment with 1000 replicas? Shouldn't this be handled by k8s?
Few debugging commands output:
kubectl cluster-info
Kubernetes master is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443
Heapster is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
Logs from kubectl cluster-info dump are at http://termbin.com/e6wb
azureuser@sim:~$ az aks scale -n cg -g cognitive-games -c 4 --verbose
Deployment failed. Correlation ID: 4df797b2-28bf-4c18-a26a-4e341xxxxx. Operation failed with status: 200. Details: Resource state Failed
no nodes displayed
azureuser@si:~$ k get nodes
No resources found
Looks silly, but when an AKS cluster is created in a resource group, two resource groups surprisingly appear: one containing the AKS resource and another, named with a random hash, containing all the VMs. I had deleted that 2nd resource group, and the basic AKS stopped working.
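If you are ever unsure which resource group holds the nodes, AKS records it on the cluster resource; a quick way to look it up from the CLI (resource names are placeholders):
az aks show -g <resource-group> -n <cluster-name> --query nodeResourceGroup -o tsv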

Network Security Group for FileZilla (client) connection

I am new here.
A few days ago I attended an MS Azure event, and today I registered with Azure (free account).
VM environment: CentOS 7 with Apache + PHP + MySQL + vsftpd + phpMyAdmin.
Everything is up and running; I am able to visit "info.php" via its public IP address.
SELinux is disabled, firewalld is disabled.
My problem is that I am not able to connect to this server via FileZilla (PC client).
From the Windows command prompt (FTP/put) it is working; I am able to upload files.
But via FileZilla:
Status: Connecting to 5x.1xx.1xx.7x:21...
Status: Connection established, waiting for welcome message...
Status: Insecure server, it does not support FTP over TLS.
Status: Logged in
Status: Retrieving directory listing...
Command: PWD
Response: 257 "/home/ftpuser"
Command: TYPE I
Response: 200 Switching to Binary mode.
Command: PORT 192,168,1,183,234,99
Response: 200 PORT command successful. Consider using PASV.
Command: LIST
Error: Connection timed out after 20 seconds of inactivity
Error: Failed to retrieve directory listing
Status: Disconnected from server
(A second connection attempt produced an identical log, again timing out at the LIST command.)
I believe that is because of the Network Security Group settings for inbound and outbound rules, and that I need to open some ports, but I'm not sure which: I tried allowing all of 1024-65535 and it still doesn't work.
If you use passive mode FTP, you should open ports 20 and 21 plus the passive ports that you need on the Azure NSG (inbound rules). You can check /etc/vsftpd.conf:
pasv_enable=YES
pasv_min_port=60001
pasv_max_port=60005
For this example, you should open ports 60001-60005 on the Azure NSG (inbound rules).
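From the Azure CLI, opening that range could look like this (a sketch; resource group, NSG name, rule name, and priority are placeholders):
az network nsg rule create -g <resource-group> --nsg-name <nsg-name> -n allow-ftp-passive --priority 1010 --access Allow --protocol Tcp --destination-port-ranges 21 60001-60005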
