I followed the Quickstart docs (here) to deploy a k8s cluster in the Western Europe region. The cluster boots up fine, but I cannot connect to it with kubectl; it times out while trying to perform a TLS handshake:
Unable to connect to the server: net/http: TLS handshake timeout
There is currently a GitHub issue where others are reporting the same problem.
Following some advice on the thread, I attempted to perform an upgrade from 1.8.1 to 1.8.2, which failed:
bash-4.3# az aks upgrade --resource-group=k8s --name=phlo -k 1.8.2
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/n): y
/ Running ..
Deployment failed. Correlation ID: <redacted>. Operation failed with status: 200. Details: Resource state Failed
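Before retrying an upgrade, it can help to check which versions ARM thinks the cluster can move to. This is only a sketch, reusing the resource group and cluster name from the command above; adjust them to your own:

# List the Kubernetes versions this cluster can currently be upgraded to,
# along with the current control plane version.
az aks get-upgrades --resource-group k8s --name phlo --output table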
According to others on the GitHub thread, it seems to be a region-specific issue.
The solution to this one for me was to scale the nodes in my cluster from the Azure Kubernetes Service blade in the web console.
Workaround / Solution
An interesting workaround (it worked for me!) to try:
Log into the Azure Console — Kubernetes Service blade.
Scale your cluster up by 1 node.
Wait for scale to complete and attempt to connect (you should be able to).
Scale your cluster back down to the normal size to avoid cost increases.
Total time it took me ~2 mins.
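If you prefer the command line over the portal, the same scale-up/scale-down can be done with az aks scale, since it talks to Azure Resource Manager rather than to the (unreachable) Kubernetes API server. A sketch only; the resource group, cluster name, and node counts are placeholders for your own values:

# Scale the default node pool up by one node.
az aks scale --resource-group k8s --name phlo --node-count 2
# Verify that kubectl can reach the API server again.
kubectl cluster-info
# Scale back down to the original size to avoid extra cost.
az aks scale --resource-group k8s --name phlo --node-count 1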
More Background Info on the Issue
I added this to the full ticket description write-up that I posted over here (have a read if you want more info):
'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure AKS server?
I was able to get a working AKS setup by ignoring the Azure CLI's report of when the k8s cluster was ready and instead watching for the "Creating..." bar in the AKS overview section of the Azure Console to disappear.
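If you would rather poll from a terminal than watch the portal, the same signal is available from the ARM provisioning state. A sketch, assuming the same resource group and cluster name as in the earlier commands:

# Wait until the provisioning state leaves "Creating" and reports "Succeeded".
az aks show --resource-group k8s --name phlo --query provisioningState --output tsv
# Or poll it periodically:
watch -n 30 "az aks show --resource-group k8s --name phlo --query provisioningState --output tsv"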
There are some good comments here if you are still stuck: https://github.com/Azure/AKS/issues/112
For me, the issue disappeared after freeing some disk space on my Mac and then starting the proxy again with kubectl proxy.
Related
I have a bare-metal Kubernetes cluster running on separate LXC containers as nodes on Ubuntu 20.04. It has the Istio service mesh configured and approximately 20 application services running on it (Istio ServiceEntries are created for the external services that need to be reached). I use MetalLB for the gateway's external IP provisioning.
I have an issue with pods making requests outside the cluster (egress), specifically reaching external web services such as the Cloudflare API or the Sendgrid API to make REST API calls. DNS is working fine, as the hosts I try to reach are indeed reachable from the pods (containers). What happens is that only a pod's first request out to the internet succeeds; after that, a random read ECONNRESET error occurs when it tries to make REST API calls, and sometimes connect ETIMEDOUT as well, though less frequently than the first error. Making network requests from the nodes themselves to the internet works without any problems at all, and pods communicate with each other through Kubernetes Services fine, without any of these problems.
My guess is that something is misconfigured and that packets are not being delivered back to the pod properly, but I can't find any relevant help on the internet and I'm a bit lost on this one. I'd be very grateful for any help, and I will happily provide more details if needed.
Thank you all once again!
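For concreteness, this is roughly the shape of ServiceEntry the setup above describes for one of the external APIs; the host, name, and resolution mode are illustrative and need to match your actual configuration. The istio-proxy sidecar of the affected pod is usually the first place the resets show up.

# Hypothetical ServiceEntry for an external HTTPS API (all names are examples).
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: sendgrid-api
spec:
  hosts:
    - api.sendgrid.com
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: tls
      protocol: TLS
  resolution: DNS
EOF
# While reproducing the ECONNRESET, watch the sidecar of the affected pod
# (the pod name is a placeholder):
kubectl logs <pod-name> -c istio-proxy --tail=100
istioctl proxy-config cluster <pod-name> | grep sendgrid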
I have an application deployed in Kubernetes. I am using the Istio service mesh. One of my services needs to be restarted when a particular error occurs. Is this something that can be achieved using Istio?
I don't want to use a cronjob. Also, making the application restart itself seems like an anti-pattern.
The application is a Node.js app using Fastify.
Istio is a tool for managing the network connections between services; it won't restart your application for you. I was writing this answer when David Maze made a very good point in a comment:
Istio is totally unrelated to this. Another approach could be to use a Kubernetes liveness probe if the cluster can detect the pod is unreachable; but if you're going to add a liveness hook to your code, the Kubernetes documentation also endorses just crashing on unrecoverable failure.
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
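As a sketch only (not the questioner's actual manifest): a liveness probe for a Fastify service could look like the following, assuming the app exposes a health route such as /healthz on port 3000. The image, path, port, and timings are placeholders.

# Minimal Pod spec with an HTTP liveness probe; if /healthz keeps failing,
# the kubelet restarts the container. All names below are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: fastify-app
spec:
  containers:
    - name: app
      image: registry.example.com/fastify-app:latest
      ports:
        - containerPort: 3000
      livenessProbe:
        httpGet:
          path: /healthz
          port: 3000
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3
EOF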
See also:
health checks on the cloud - a GCP example
creating a custom readiness/liveness probe
customizing liveness probes
I set up my Azure Kubernetes Service deployment exactly as it's described in the official tutorial with static IP, Ingress Controller and Let's Encrypt SSL certs. I use:
Standard_B2s node size
1-node pool
Single pod
AKS Network type: Basic (Kubenet)
Let's Encrypt certs for prod
Response latency is sometimes very high. One browser can reach the service very quickly (opening Swagger, for instance), but when I then go to my mobile phone and try to open it from there, it takes forever and often times out. Once the page has been opened once, the next load is usually very fast and without any delays. Why does this happen?
I can guarantee it is not an issue with my service itself.
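One way to see where the slow requests spend their time (assuming you can hit the service's URL directly with curl) is to compare the timing breakdown from a fast client and a slow one; the URL below is a placeholder for your own endpoint.

# Break the request down into DNS lookup, TCP connect, TLS handshake and
# time-to-first-byte. Run it from both the fast and the slow client and compare.
curl -o /dev/null -s -w \
  'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  https://your-app.example.com/swagger/index.html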
I'm taking my first foray into Azure Service Fabric with a cluster hosted in Azure. I've successfully deployed my cluster via ARM template, which includes the cluster manager resource, VMs for hosting Service Fabric, a load balancer, an IP address and several storage accounts. I've configured the certificate for the management interface and have written and deployed an application to my cluster.
However, when I try to connect to my API via Postman (or even via a browser, e.g. Chrome), the connection invariably times out and never gets a response. I've double-checked all of my load balancer settings, and traffic should be getting through, since my load balancing rules use the same front-end and back-end port as my API in Service Fabric. Can anyone provide some tips on how to troubleshoot this situation and find out where exactly the connection problem lies?
To clarify, I've examined the documentation here, here and here.
Have you tried logging in to one of your service fabric nodes via remote desktop and calling your API directly from the VM? I have found that if I can confirm it's working directly on a node, the issue likely lies within the LB or potentially an NSG.
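For example, from an RDP session on one of the nodes you could first check whether the service answers locally, and only then look at the load balancer probes and NSG rules. The port, path, and resource names below are placeholders for your own.

# From the Service Fabric node itself: does the API respond on its own port?
curl -v http://localhost:8080/api/values
# If that works, inspect the load balancer probes and NSG rules from the Azure CLI:
az network lb show --resource-group <rg> --name <lb-name> --query probes --output table
az network nsg rule list --resource-group <rg> --nsg-name <nsg-name> --output table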
This is the error I get after running kubectl cluster-info:
Rifats-MacBook-Pro:~ rifaterdemsahin$ kubectl cluster-info
Kubernetes master is running at https://xxx-xxx-aks-yyyy.hcp.westeurope.azmk8s.io:443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Unable to connect to the server: net/http: TLS handshake timeout
There seems to be an issue on the Azure side. Please refer to this similar issue.
I suggest you try again; if it still does not work, you could open a support ticket or give feedback to Azure.
The solution to this one for me was to scale the nodes in my cluster from the Azure Kubernetes Service blade in the web console (NOT from your local terminal, since you can't connect with kubectl).
After scaling up by one node (and back down to save costs), I am back up and working again; the exact steps are in the Workaround / Solution section above.