Outbound TCP connections spiking in Azure - azure

TCP connections are getting exhausted, and I am unable to figure out the root cause. How do I track down common causes of connection spikes?
This started after migrating the project from .NET Framework to .NET Core 2.0.
The application downloads blobs using WebClient, and that code is the same in both the Framework and Core projects.

The outbound TCP connections on the VM instance can be exhausted.
In App Service, limits are enforced for the maximum number of outbound connections that can be made for each VM instance. For more information, reference: Cross-VM numerical limits (https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#cross-vm-numerical-limits). You may scale up the App Service Plan as per your requirement.
These limits apply only to customers on Basic or higher plans; in other words, customers running on their own dedicated VMs. The limits exist to protect the entire VM, even if one particular site stays within the limits described above.
The limits are different depending on the size of VM configured.
Limit name    Description                               Small (A1)   Medium (A2)   Large (A3)
Connections   Number of connections across entire VM    1,920        3,968         8,064
Ensure that your application is not trying to access a local address. Connection attempts to local addresses (e.g. localhost, 127.0.0.1) and to the machine's own IP will fail, unless another process in the same sandbox has created a listening socket on the destination port.
Reference: http://www.freekpaans.nl/2015/08/starving-outgoing-connections-on-windows-azure-web-sites/ - it’s a 3rd party blog, be cautious with the steps.
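To see how many outbound connections the process is actually holding at a given moment, a small diagnostic sketch like the following can help. It uses the standard System.Net.NetworkInformation API; whether this call is permitted inside the App Service sandbox is an assumption you should verify on your plan, and it is straightforward to run on a plain VM.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

class ConnectionCounter
{
    static void Main()
    {
        // Snapshot of all active TCP connections visible to this machine/sandbox.
        var connections = IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();

        // Group by remote address to see which destinations dominate the spike.
        var byRemote = connections
            .GroupBy(c => c.RemoteEndPoint.Address.ToString())
            .OrderByDescending(g => g.Count());

        Console.WriteLine($"Total TCP connections: {connections.Length}");
        foreach (var group in byRemote.Take(10))
            Console.WriteLine($"{group.Key}: {group.Count()}");
    }
}
```

If one remote address dominates and the count climbs steadily, the usual culprit is creating a new WebClient/HttpClient per request instead of reusing one.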

Related

What's the benefit of setting up NLB for a web server in a VM host?

I have a web server (Web01) set up in a VM. Currently I am facing a performance issue on the web server: the bottleneck is too many requests, and the server does not have enough processing power to handle them. So I have 2 options to resolve this problem.
Increase CPU and Memory
Set up Web02 in a VM (on the same VM host as Web01) and build NLB.
I don't know which of the above 2 options is best. In particular, I'm unsure about option 2: if I set up 2 web servers on the same VM host, is the performance better than with option 1?
I can share some thoughts with you on the pros and cons of NLB, but I can't directly help you make a choice.
Network load balancing has several potential advantages. By distributing network traffic among multiple servers or virtual machines, processing is faster than if all traffic flows through a single server. If demand decreases, servers can be taken offline, and the feature will balance traffic among the remaining hosts. NLB provides fault tolerance at the network layer, ensuring that connections are not directed to servers that are down. Network Load Balancing also enables organizations to rapidly scale server applications by adding hosts and then distributing the application's traffic among the new hosts.
But it still has some drawbacks. WNLB cannot detect a service interruption; it only checks by IP address. If a particular server's service fails, WNLB cannot detect the failure and will still route requests to that server. It also cannot take the current CPU load and RAM utilization of each server into account when distributing the client load.

Is Azure Traffic Manager reliable for failover? What other problems should I be worried about?

I am planning to use Azure Traffic Manager to fail over my app from one Azure region to another.
I need some suggestions: is that the correct approach for failover? We have seen issues with Azure where most of the services in one region go down for a few hours. I understand that Azure Traffic Manager is not tied to a single region, but is it possible that Azure Traffic Manager itself goes down, or that the Traffic Manager endpoint is not reachable even though my backend web app is reachable?
If I am planning to use Azure Traffic Manager, what other problems should I be worried about?
I've been working with TM for some time now, so here are a few issues I haven't seen mentioned before:
Keep-Alive
If your service allows Keep-Alive, then your DNS entry will be ignored as long as the connection remains open. I've seen some exceptionally odd behavior result from this, including users being stuck on a fallback page since they kept using the connection, causing it to remain open indefinitely. If you have access to IIS Manager, you can force Keep-Alive to be false.
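If you cannot touch IIS, a client-side workaround is to ask for the connection to be closed after each response, so the next request resolves DNS again. A minimal sketch, assuming the callers use HttpClient; whether the extra connection churn is acceptable for your throughput is something to measure:

```csharp
using System.Net.Http;

// Shared client that sends "Connection: close" on every request, so the server
// does not hold the TCP connection open and the next request performs a fresh
// DNS lookup (and therefore follows Traffic Manager after a failover).
var client = new HttpClient();
client.DefaultRequestHeaders.ConnectionClose = true;
```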
Browser DNS Caching
Most browsers have their own DNS cache, and very few honor DNS Time To Live. In my experience Chrome is pretty responsive, with IE and Edge having significant delays if you need them to rollover quickly. I've heard that Opera is particularly bad.
Other DNS Caching
Even if you're not accessing your service through a browser, other components can have DNS caches, and some of them will allow you to manage the cache yourself. This can in theory even depend on ISP's DNS caching, though reports on the magnitude of this vary significantly.
Traffic Manager works at the DNS level, which itself is replicated. However, even then, you should still build redundancy into your solution.
Take a look at the Azure Architecture Center under "Make all things redundant" and you will see a recommendation for Traffic Manager:
consider adding another traffic management solution as a failback. If the Azure Traffic Manager service fails, change your CNAME records in DNS to point to the other traffic management service.
The Traffic Manager internal architecture is resilient to the failure of any single Azure region. So, even if a region fails, Traffic Manager should stay up. That applies to all Traffic Manager components: control plane, endpoint monitoring, and DNS name servers.
Since Traffic Manager works at the DNS level, it doesn't have an 'endpoint' that proxies your traffic--it uses DNS to direct clients to the appropriate endpoint, and clients then connect to those endpoints directly. Thus, an unreachable endpoint is an application problem, not a Traffic Manager problem.
That said, if the Traffic Manager DNS name servers are down, you have a serious problem. Your DNS resolution path will fail and your customers will be impacted. The only solution is to either accept the risk (small, but it can never be zero) or have a plan in place to use another DNS system, either in parallel or as a failover. This is not a limitation of Traffic Manager; you could say the same about any DNS-based traffic management system.
The earlier answer from DornaDigital is very good (other than the first point which suggests DNS caching will protect you through a name server outage--it won't). It covers some important points. In short, DNS-based failover works well for new sessions. Existing clients may have to refresh or even close their browser and reconnect.
I also agree with the details provided by DornaDigital.
There are considerations for front end applications as well. The browsers all have different thresholds for how long they maintain persistent connections. Chromium, for example, currently maintains a connection unless there is inactivity for 300 seconds.
In our web applications, we detect the failover by the presence of a certain number of failed requests to the endpoint. After requests begin failing, we pause requests for 301 seconds to allow the connection to reset. This allows the DNS change from Traffic Manager to be applied to subsequent requests. We pop up a snackbar to indicate to the user that we are having an issue and display a countdown until requests will resume, similar to Gmail when it has an issue connecting to its servers.
I hope that gives you one idea on how to build some redundancy into your web apps.
I disagree with Jonathan, as his understanding of the resiliency of the Traffic Manager service conflicts with Microsoft's own documentation on the subject.
When you provision Azure Traffic Manager, you select a region in which to deploy the service. I (correctly) inferred from this that if said region were to fail, the Traffic Manager service could also be impacted and, in turn, your application would not properly fail over to the secondary region.
According to Microsoft's Azure Application Architecture Guide, under "Make all things redundant", a customer should deploy Traffic Manager into more than one region:
Include redundancy for Traffic Manager. Traffic Manager is a possible failure point. Review the Traffic Manager SLA, and determine whether using Traffic Manager alone meets your business requirements for high availability. If not, consider adding another traffic management solution as a failback. If the Azure Traffic Manager service fails, change your CNAME records in DNS to point to the other traffic management service.
Azure Application Architecture Guide - Make all things redundant
My thought and intention is to not deploy Traffic Manager within the primary service region, but instead to deploy it into the secondary (failover region) and a tertiary (3rd) region as a backup.

Azure Website doesn't detect Traffic Manager change

I have an Azure website (website.mycompany.com) that uses a WCF service for some data. The WCF service sits behind an Azure Traffic Manager (service.mycompany.com) running in "priority mode", with 2 instances of the service for failover handling. With priority mode, the primary always serves up the data first, unless it's unavailable; if unavailable, the 2nd instance will reply, and so on down the line.
We've had a few instances recently where the primary endpoint for service.mycompany.com was offline. Partnerships that point to service.mycompany.com detected the switch and all was fine. Lately, however, our own site (website.mycompany.com) does NOT detect the Traffic Manager switch, and the website shows errors since the service fails to reply.
Our failover endpoint in these instances is up, and in the past the Azure website detected the switch; it's only recently that we've encountered this issue. Has anyone experienced similar issues? Are there perhaps any DNS settings we need to tweak in our Azure website to help it honor TTLs?
Has anyone experienced similar issues?
Do you mean the traffic manager can't switch to another endpoint immediately?
Traffic Manager works at the DNS level; here are the reasons why it can't switch immediately:
The duration of the cache is determined by the 'time-to-live' (TTL) property of each DNS record. Shorter values result in faster cache expiry and thus more round-trips to the Traffic Manager name servers. Longer values mean that it can take longer to direct traffic away from a failed endpoint.
The Traffic Manager endpoint monitor affects the response time. For more information about how Azure Traffic Manager works, please refer to the link.
The linked documentation includes a timeline giving a detailed description of the monitoring process.
We can also check the Traffic Manager profile using nslookup and ipconfig on Windows. For how to verify Traffic Manager settings, please refer to the link.
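As a quick check from code rather than nslookup, here is a minimal sketch using the standard System.Net.Dns API; the profile name below is a placeholder for your own *.trafficmanager.net name:

```csharp
using System;
using System.Net;

class TrafficManagerCheck
{
    static void Main()
    {
        // Resolve the Traffic Manager profile name and print what DNS currently returns.
        // The address this resolves to should change after a failover (subject to TTL).
        IPHostEntry entry = Dns.GetHostEntry("yourprofile.trafficmanager.net");

        Console.WriteLine($"Resolved host: {entry.HostName}");
        foreach (IPAddress address in entry.AddressList)
            Console.WriteLine(address);
    }
}
```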
By the way, because traffic manager works at the DNS level, it cannot influence existing connections to any endpoint. When it directs traffic between endpoints (either by changed profile settings, or during failover or failback), Traffic Manager directs new connections to available endpoints. However, other endpoints might continue to receive traffic via existing connections until those sessions are terminated. To enable traffic to drain from existing connections, applications should limit the session duration used with each endpoint.
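One way to limit how long existing connections are reused on the client side (so a DNS change from Traffic Manager is actually picked up) is to cap the pooled connection lifetime. A minimal sketch, assuming .NET Core 2.1 or later where SocketsHttpHandler is available; the two-minute value is only an illustration:

```csharp
using System;
using System.Net.Http;

var handler = new SocketsHttpHandler
{
    // Connections older than this are closed and re-established,
    // which forces a fresh DNS lookup and lets failover take effect.
    PooledConnectionLifetime = TimeSpan.FromMinutes(2)
};

var client = new HttpClient(handler);
```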
I'm going to refer you to my answer here because while the situation isn't exactly the same, it seems like it could have the same solution. To summarize, I find it likely that you have a connection left open to the down service that isn't being properly closed. This connection is independent of TTL, which only deals with DNS caching, and as such bypasses Traffic Manager completely.

C# Bandwidth Throttling w/Azure

I wrote a small utility that utilizes Azure blob storage to push some files across for a secondary backup (~100GB). Thus far it works really well, however since it is sitting in a colocation area, my bandwidth usage can hit 190mb/s+ which is a bill I'd rather not pay. Given this, I have two questions:
Outbound traffic on a server with multiple IPs uses the first IP configured as the "main" one. I know in C# I can get a list of network adapters and change properties, but is it possible to tell an app that its traffic needs to use a specific IP (instead of the default) for outgoing connections? We could then filter anything coming out of that IP, regardless of destination, and only this app would use that address.
If not, is it possible to configure an app to send all traffic on a separate adapter that would have a single IP, so we could filter outbound at our router level to throttle that traffic?
Alternatively (if we're attacking this from the wrong angle), is it possible to limit Azure transfers to a maximum bandwidth allotment in some capacity? That's all I'm really after, as any other traffic should be able to use the maximum it can (meaning QoS doesn't apply - there isn't contention here, just too much outgoing in general).
For your backup needs, have you already evaluated RA-GRS? It provides built-in data replication to a secondary location with read-only access to the data.
https://azure.microsoft.com/en-us/documentation/articles/storage-redundancy/
As far as I can tell there is no API that allows you to set a limit on the bandwidth consumed; however, you can enable storage monitoring so that you have a better idea of how many transactions were triggered.
https://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/
By the way, one thing that might address your cost concern is to set up a spending limit for your Azure subscription, but this depends on the type of your subscription.
https://azure.microsoft.com/en-us/pricing/spending-limits/
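Regarding the first question, about binding outbound traffic to a specific local IP: a minimal sketch for the classic HttpWebRequest/WebClient stack is below. The account URL and the local address 10.0.0.5 are placeholders, and whether your Azure Storage client actually goes through the ServicePoint infrastructure is an assumption worth verifying.

```csharp
using System;
using System.Net;

// Bind outgoing connections to the blob endpoint to a specific local IP,
// so the traffic can be identified and shaped at the router level.
ServicePoint sp = ServicePointManager.FindServicePoint(
    new Uri("https://myaccount.blob.core.windows.net"));

sp.BindIPEndPointDelegate = (servicePoint, remoteEndPoint, retryCount) =>
    new IPEndPoint(IPAddress.Parse("10.0.0.5"), 0); // port 0 lets the OS choose
```

With the source address fixed, you can then apply throttling rules to that address at the router, which addresses the second question as well.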

Is Azure limiting outgoing connections?

I'm running a VM in Azure on which I have a service that makes a lot of outgoing HTTP client calls. After a while (approx. 10 minutes), when the service has made around 5,000-10,000 calls, it suddenly starts getting Connection Refused in response to the requests.
When running the same service locally (tried in many environments and on many computers), it runs without any error. We are using the HttpClient class for the requests.
The requests are done in 3 tasks running concurrently.
Is there some limits on the amount of outgoing connections in Azure that I should be aware of?
There is a maximum connection limit per Azure subscription.
You should reuse connections instead of opening new ones.
Read more about it here: http://www.freekpaans.nl/2015/08/starving-outgoing-connections-on-windows-azure-web-sites/
We have hit similar issues in the past, and it looks like the VMs have an outbound connection limit of 1,024 to an external IP. Internal Azure IPs in the same data center won't have this limitation, since internal routing tables are able to handle those connections.
There is a lot of relevant information here:
https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#snatexhaust
Summarizing key points:
Try assigning a public IP to your VM if it doesn't have one. This is only viable if you are running a handful of VMs.
Try adding multiple IPs to your VNET/Load balancer if you are running behind one. Each external IP will multiply your connection limit.
Try optimizing your connection usage, i.e. keep connections alive for longer for efficient pipelining.
If you are using a Linux VM, execute the command below to check the limit on open files/sockets:
ulimit -n will give the value. The default is 1024.
You can permanently change this value by appending the following to your limits.conf file:
* soft nofile 65536
* hard nofile 65536
Beware of Azure DNS throttling.
DNS query traffic is throttled for each VM. Throttling shouldn't impact most applications. If request throttling is observed, ensure that client-side caching is enabled. For more information, see DNS client configuration.
Source: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances
I know it's late, but this may be helpful for others.
Yes, Azure has outbound connection limits per subscription.
Solution:
Do not use multiple HttpClient instances; use a single instance per application.
Reference link for connection limits is Here
Example:
A C# example of how to use a single instance is Here.
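As a rough illustration of the single-instance pattern (the class and method names here are made up for the example):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ApiCaller
{
    // One HttpClient for the whole application: its connections are pooled and
    // reused, instead of each call opening (and leaking) a new outbound socket.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> GetAsync(Uri uri)
    {
        // Reuses a pooled connection when one is available.
        return await Client.GetStringAsync(uri);
    }
}
```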
Here is the Azure page for service limits. It does not specify calls per time frame, but it does set the maximum network connections for TCP at 500K, and below that table there are settings under "Web Apps (Websites) Limits" that you may be reaching.
There is a limit of 500K TCP connections on a VM or web role (behind the scenes a web role sits on a VM as well). You can refer to the link below for Azure limits
It looks like your application is heavy on outbound requests. In such a scenario, you might want to decouple this piece and use Azure Functions, which gives you serverless architecture capability.
Without knowing Azure at all, I wonder if the problem is that your VM has a limit on the number of TCP sockets - all of those (closed) TCP connections in FIN-WAIT state might have exhausted some limit set for Azure that isn't set in other circumstances. This is pure speculation.
