AKS service response is slow after idle time - azure

I set up my Azure Kubernetes Service deployment exactly as described in the official tutorial, with a static IP, an ingress controller and Let's Encrypt SSL certs. I use:
Standard_B2s node size
1-node pool
Single pod
AKS Network type: Basic (Kubenet)
Let's Encrypt certs for prod
Response latency is sometimes very high. One browser can reach the service quickly (opening Swagger, for instance), but when I then try to open it from my mobile phone it takes forever and often times out. Once the page has been opened once, the next load is usually fast, with no delays. Why does this happen?
I can guarantee it's not an issue with my service.
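A quick way to confirm the cold-vs-warm pattern described above is to time two consecutive requests to the same endpoint. A minimal Python sketch; the `fetch` callable is a stand-in for the real HTTP GET (e.g. `urllib.request.urlopen` against your Swagger URL, which is not shown here):

```python
import time

def timed(fetch):
    """Time a single call to fetch() and return (elapsed_seconds, result)."""
    start = time.monotonic()
    result = fetch()
    return time.monotonic() - start, result

def cold_vs_warm(fetch):
    """Call fetch() twice in a row; a large first/second gap suggests a
    cold start (idle connection teardown, SNAT reprogramming, etc.)."""
    cold, _ = timed(fetch)
    warm, _ = timed(fetch)
    return cold, warm
```

In practice `fetch` would be something like `lambda: urllib.request.urlopen("https://<your-host>/swagger").read()`, run from both the fast and the slow client.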

Related

Unhealthy backend after scaling up App Service plan

I have an application gateway running with a web application in an App Service plan. The application gateway listens for and passes requests to the backend, which is the web app. There is a health probe implemented that works fine.
The web app was reachable fine until I scaled up the App Service plan. Suddenly the health probe timed out reaching the backend, and I got a 502 Bad Gateway error in the browser when trying to reach the web application. After a few hours the website was suddenly back and the backend was healthy again. I was under the impression that you could scale the App Service plan up and down without any noticeable effect on the website, but it seems the gateway was not playing along.
Did I configure something wrong or should this work like I assumed?
I tried to reproduce the same in my environment by creating an App Service running behind an Application Gateway, and I also got a 502 error.
The number of TCP connections allowed differs between App Service plan tiers. When scaling up or down, try to remain in the same tier so that the inbound IP address does not change; if you do change tiers, wait for some time before scaling back.
Try updating the default setting under Configuration -> General settings -> ARR affinity: Off. This assumes your application isn't stateful, or that session state is kept in a remote service such as a cache or database. Also try running your application with a minimum of 2-3 instances to protect against failures.
You can also make use of App Service diagnostics, which gives you the right information to troubleshoot more easily.
For Reference:
Get started with autoscale in Azure - Azure Monitor | Microsoft
Guide to Running Healthy Apps - Azure App Service
I got the same error in the Application Gateway as well. To avoid the issue:
In your virtual network -> Service endpoints -> add the Microsoft.Web endpoint in the default subnet.

Kubernetes Pod - read ECONNRESET when requesting external web services

I have a bare-metal Kubernetes cluster running on separate LXC containers as nodes, on Ubuntu 20.04. It has the Istio service mesh configured and approximately 20 application services running on it (Istio ServiceEntries are created so external services can be reached). I use MetalLB for the gateway's external IP provisioning.
I have an issue with pods making requests outside the cluster (egress), specifically reaching some external web services such as the Cloudflare API or the Sendgrid API to make REST API calls. DNS is working fine, as the hosts I try to reach do resolve from the pods (containers). Only the first request a pod makes to the internet succeeds; after that, a random read ECONNRESET error happens when it tries to make REST API calls, and sometimes even connect ETIMEDOUT, though less frequently than the first error. Making network requests from the nodes themselves to the internet seems to cause no problems at all. Pods communicate with each other through the Kubernetes services fine, without any of these problems.
I would guess something is not configured correctly and that the packets are not properly delivered back to the pod, but I can't find any relevant help on the internet and I'm a little bit lost on this one. I'm very grateful for any help, and I will happily provide more details if needed.
Thank you all once again!
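Until the root cause (often conntrack/NAT dropping idle or mismatched flows) is found, the symptom can be contained at the application layer by retrying on reset connections. A hedged sketch in Python; `do_request` stands in for the real REST call, and in a Node client the equivalent would be catching the `ECONNRESET`/`ETIMEDOUT` error codes:

```python
def retry_on_reset(do_request, attempts=3):
    """Retry a network call when the peer resets the connection.

    ConnectionResetError is Python's analogue of Node's ECONNRESET;
    TimeoutError covers the occasional ETIMEDOUT mentioned above.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return do_request()
        except (ConnectionResetError, TimeoutError) as exc:
            last_exc = exc
    raise last_exc
```

This is a workaround, not a fix: if every first request succeeds and later ones are reset, the routing/conntrack issue still needs diagnosing on the nodes.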

How can I diagnose a connection failure to my Load-balanced Service Fabric Cluster in Azure?

I'm taking my first foray into Azure Service Fabric using a cluster hosted in Azure. I've successfully deployed my cluster via ARM template, which includes the cluster manager resource, VMs for hosting Service Fabric, a Load Balancer, an IP address and several storage accounts. I've successfully configured the certificate for the management interface, and I've successfully written and deployed an application to my cluster. However, when I try to connect to my API via Postman (or even via a browser, e.g. Chrome), the connection invariably times out without a response. I've double-checked all of my Load Balancer settings, and traffic should be getting through, since my load-balancing rules use the same front-end and back-end port as my API in Service Fabric. Can anyone give me some tips for troubleshooting this situation and finding out where exactly the connection problem lies?
To clarify, I've examined the documentation here, here and here
Have you tried logging in to one of your Service Fabric nodes via Remote Desktop and calling your API directly from the VM? I have found that if I can confirm it's working directly on a node, the issue likely lies with the LB or potentially an NSG.
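To make that comparison systematic, a raw TCP connect check run once from the node itself and once from outside can bisect the problem: success on the node but failure through the public IP points at the Load Balancer rules/probes or an NSG. A small sketch; the hosts and ports are placeholders:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run the first check from the VM via Remote Desktop, the second from
# your workstation (placeholder endpoints):
# can_connect("localhost", 8080)       # node-local: is the service listening?
# can_connect("<public-lb-ip>", 8080)  # through the Load Balancer
```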

Does an Azure Web App care if its instances are healthy/unhealthy?

If I deploy a web app (formerly known as an Azure Website) to an App Hosting Plan in Azure with a couple of instances (scale = 2), will the load balancer in front of the instances care whether any of the instances is unhealthy?
I'm troubleshooting an issue that sometimes causes my site to return an HTTP 503 about 50% of the time. My thinking here is that one of my two instances has failed but the load balancer hasn't noticed.
If the load balancer does care, what does it look for? I can't find any way to specify a ping URL, for instance.
Note: this question has nothing to do with Traffic Manager.
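One way to test the "one dead instance out of two" hypothesis while troubleshooting is to sample the endpoint repeatedly and measure the failure ratio; with round-robin over two instances, one dead worker should show up as roughly 50% failures. A sketch; `get_status` is a stand-in for issuing the real request and reading the HTTP status code:

```python
def failure_ratio(get_status, samples=100, failing=frozenset({503})):
    """Call get_status() `samples` times; return the fraction that failed."""
    failures = sum(1 for _ in range(samples) if get_status() in failing)
    return failures / samples
```

Against a stub that alternates 200/503 this returns 0.5, which is the signature the question describes.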
Yes, Azure Web Apps monitors the health of the workers by making internal requests to them and verifying that they're healthy.
However, we don't check the status codes that the web app returns to user requests (like 500, etc.), since that could easily be an app-specific issue rather than a problem with the machine.
So the answer you're looking for is: we continuously test whether or not the instances (VMs) are healthy and take them down if they're not. However, those tests do not rely on error codes the customer's site returns.

Redundancy with self hosted ServiceStack 3.x service [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
We are running a self-hosted AppService with ServiceStack 3.x.
We would like to have an automatic failover mechanism on the clients if the service currently running as master fails.
Clients at the moment are strongly typed C# clients using the default SS JSON client, but we will add web-based clients (AngularJS) in the future.
Does anybody have an idea, how that could be done?
Server side redundancy & failover:
That's a very broad question. A self-hosted ServiceStack application is no different from any other web-facing resource, so you can treat it like a website.
Website Uptime Monitoring Services:
You can monitor it with regular website-monitoring tools. These tools can be as simple as an uptime-monitoring site that pings your web service at regular intervals to determine whether it is up and, if not, takes an action, such as triggering a restart of your server or simply sending you an email to say it's not working.
Cloud Service Providers:
If you are using a cloud provider such as Amazon EC2, they provide CloudWatch services that can be configured to monitor the health of your host machine and the service. In the event of failure, it can restart your instance or spin up another one. Other providers offer similar tools.
DNS Failover:
You can also consider DNS failover. Many DNS providers can monitor service uptime, and in the event of a failure their service will change the DNS record to point to another standby service, so the failover will be transparent to the client.
Load Balancers:
Another option is to put your service behind a load balancer and have multiple instances running your service. The likelihood of all the nodes behind the load balancer failing is usually low, unless there is something catastrophically wrong with your service design.
Watchdog Applications:
As you are using a self-hosted application, you may consider writing another application on your system that simply checks that your service application host is running and, if not, restarts it. This handles cases where an exception has caused your app to terminate unexpectedly. Of course this is not a long-term solution; you will need to fix the exception.
High Availability Proxies (HAProxy, NGINX etc):
If you run your ServiceStack application using Mono on a Linux platform, there are many high-availability solutions, including HAProxy and NGINX. If you run on Windows Server, it provides its own failover mechanisms.
Considerations:
The right solution will depend on your environment, your project budget, and how quickly you need the failover to take effect. The ultimate consideration is: where will the service fail over to?
Will you have another server running your service, simply on standby - just in case?
Will you use the cloud to start up another instance on demand?
Will you try and recover the existing application server?
Resources:
There are lots of articles out there about failover for websites; since your web service uses HTTP like a website, they also apply here. You should research high availability.
Amazon AWS has a lot of solutions to help with failover. Their Route 53 service is very good in this area, as are their load balancers.
Client side failover:
Client side failover is rarely practical. In your clients you can ultimately only ever test for connectivity.
Connectivity Checking:
When connectivity to your service fails you'll get an exception. Upon getting the exception, the only solution would be to change the target service URL, and retry the request. But there are a number of problems with this:
It can be as expensive as server side failover, as you have to keep the failover service(s) online all the time for the just-in-case moments. Some server side solutions would allow you to start up a failover service on demand, thus reducing cost significantly.
All clients must be aware of the URL(s) to fail over to. If you managed the failover at the DNS level (i.e. server side), clients wouldn't have to worry about this complexity.
Your client can only see connectivity failures; there may not be an issue with the server, it may be their connectivity. Imagine the client's Wi-Fi goes down for a few seconds while your primary service server is handling their request. During that time the client gets the connectivity exception and sends the request to the failover secondary server, at which point their Wi-Fi comes back online. Now you have clients using both the primary and secondary services, so their network connectivity issues become your data consistency problems.
If you are planning web-based clients, then you will have to set up CORS support on the server, and all clients will require compatible browsers so they can change the target service URL. CORS requests have the disadvantage of more overhead than regular requests, because the client has to send OPTIONS requests too.
Connectivity-error detection in clients is rarely fast. Sometimes it can take in excess of 30 seconds before a client times out a request as having failed.
If your service API is public, then you rely on the end users implementing the failover mechanism. You can't guarantee they will do so, or that they will do so correctly, or that they won't take advantage of knowing your other service URLs and send requests there instead. Besides, it looks very unprofessional.
You can't guarantee that the failover will work when needed. That's difficult to guarantee for any system; even big companies have issues with failover. Server-side failover solutions sometimes fail to work properly, but this is even more true of client-side solutions, because you can't test the failover solution ahead of time under all the different client-side environmental factors. Just because your implementation of failover in the client worked in your deployment, will it work in all deployments? The point of a failover solution, after all, is to minimise risk. The risk of server-side failover not working is far less than client-side, because it's a smaller, controllable environment which you can test.
Summary:
So while my considerations may not favour client-side failover, if you were going to do it, it's a case of catching connectivity exceptions and deciding how to handle them. You may want to wait a few seconds and retry your request to the primary server before immediately swapping to the secondary, just in case it was an intermittent error.
So:
Catch the connectivity exception
Retry the request (maybe after a small delay)
If it still fails, change the target host and retry
If that fails too, it's probably a client connectivity issue
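The steps above can be sketched directly. In the actual clients this would wrap the ServiceStack JSON client's call; here `send` is a generic stand-in that raises on connectivity failure, and the hosts and delay are assumptions:

```python
import time

def request_with_failover(send, primary, secondary, retry_delay=2.0):
    """Try the primary host, retry it once after a short delay, then fail
    over to the secondary; if that also fails, treat it as a client-side
    connectivity problem.

    send(host) performs the request and raises ConnectionError on a
    connectivity failure.
    """
    try:
        return send(primary)              # catch the connectivity exception
    except ConnectionError:
        pass
    time.sleep(retry_delay)               # retry after a small delay
    try:
        return send(primary)
    except ConnectionError:
        pass
    try:
        return send(secondary)            # change the target host and retry
    except ConnectionError as exc:        # probably the client's own link
        raise ConnectionError("both hosts unreachable; probably a client "
                              "connectivity issue") from exc
```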
