I'm running a VM in Azure on which I have a service that makes a lot of outgoing HTTP client calls. After a while (approx. 10 minutes), when the service has made around 5000-10000 calls, it suddenly starts to get Connection Refused as the response to the requests.
When running the same service locally (tried in many environments and computers) it runs without any error. We are using the HttpClient class for the request.
The requests are done in 3 tasks running concurrently.
Are there limits on the number of outgoing connections in Azure that I should be aware of?
There is a maximum connection limit per Azure subscription.
You should reuse the connections instead of opening new ones.
Read more about it here: http://www.freekpaans.nl/2015/08/starving-outgoing-connections-on-windows-azure-web-sites/
We have hit similar issues in the past, and it looks like the VMs have an outbound connection limit of 1024 to an external IP. Internal Azure IPs in the same data center don't have this limitation, since internal routing tables are able to handle those connections.
There is a lot of relevant information here:
https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#snatexhaust
Summarizing key points:
Try assigning a public IP to your VM if it doesn't have one. This is only viable if you are running a handful of VMs.
Try adding multiple IPs to your VNET/Load balancer if you are running behind one. Each external IP will multiply your connection limit.
Try optimizing your connection usage, i.e. keep connections alive for longer so they can be reused and pipelined efficiently.
If you are using a Linux VM, run the command below to check the limit on open files/sockets:
ulimit -n
This prints the current value. The default is typically 1024.
You can permanently change this value by appending the following to your /etc/security/limits.conf file:
* soft nofile 65536
* hard nofile 65536
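The same limit can also be inspected from inside a running process. The following is a minimal sketch in Python, assuming a Linux or other Unix-like VM where the standard `resource` module is available; the function names are hypothetical and chosen for illustration only.

```python
import resource


def describe_nofile_limit():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile_limit(target):
    """Raise the soft limit toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards. An unprivileged process
    may raise its soft limit only up to the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only act when the soft limit is finite and actually lower than the target.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft/hard nofile limits:", describe_nofile_limit())
```

Note that a change made this way applies only to the current process and its children, whereas limits.conf applies at login.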
Beware of Azure DNS throttling.
DNS query traffic is throttled for each VM. Throttling shouldn't impact most applications. If request throttling is observed, ensure that client-side caching is enabled. For more information, see DNS client configuration.
Source: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances
I know it's late, but this may be helpful for others.
Yes, Azure has outbound connection limits per subscription.
Solution:
Do not use multiple HttpClient instances; use a single instance per application.
Reference link for connection limits is Here
Example:
How to use single instance example C# from Here
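The original C# example is not reproduced here. As a language-neutral illustration of why reuse matters, here is a rough Python sketch (not the HttpClient API) that counts how many TCP connections a local test server sees when the client opens a new connection per request versus reusing one. The names `CountingHandler` and `count_connections` are hypothetical, and the server runs on 127.0.0.1 purely for demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Test handler that counts accepted TCP connections."""
    protocol_version = "HTTP/1.1"  # allows keep-alive on the server side
    connection_count = 0
    _lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        with CountingHandler._lock:
            CountingHandler.connection_count += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(n_requests, reuse):
    """Issue n_requests GETs and return how many connections the server saw."""
    CountingHandler.connection_count = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(n_requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                conn.close()
        if reuse:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connection_count


if __name__ == "__main__":
    print("new connection per request:", count_connections(5, reuse=False))
    print("single reused connection:  ", count_connections(5, reuse=True))
```

Each new connection consumes an outbound port on the caller's side, which is the resource that tends to run out under the limits discussed above; a single long-lived client avoids that.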
Here is the Azure page for service limits. It does not specify a call per time frame but does set max network connections for TCP as 500K and below that table there are settings for "Web Apps (Websites) Limits" that you may be reaching.
There is a limit of 500K TCP connections on a VM or web role (behind the scenes a web role sits on a VM as well). You can refer to the link below for Azure limits
It looks like your application is heavy on outbound requests. In such a scenario, you might want to decouple this piece and use Azure Functions, which gives you a serverless architecture capability.
Without knowing Azure at all, I wonder if the problem is that your VM has a limit on the number of TCP sockets - all of those (closed) TCP connections in FIN-WAIT state might have exhausted some limit set for Azure that isn't set in other circumstances. This is pure speculation.
We are having difficulties choosing a load balancing solution (Load Balancer, Application Gateway, Traffic Manager, Front Door) for IIS websites on Azure VMs. The simple use case when there are 2 identical sites is covered well – just use Azure Load Balancer or Application Gateway. However, in cases when we would like to update websites and test those updates, we encounter limitation of load balancing solutions.
For example, if we would like to update IIS websites on VM1 and test those updates, the strategy would be:
Point a load balancer to VM2.
Update IIS website on VM1
Test the changes
If all tests are passed then point the load balancer to VM1 only, while we update VM2.
Point the load balancer to both VMs
We would like to know what is the best solution for directing traffic to only one VM. So far, we only see one option – removing a VM from backend address pool then returning it back and repeating the process for other VMs. Surely, there must be a better way to direct 100% of traffic to only one (or to specific VMs), right?
Update:
We ended up blocking the connection between the VMs and the Load Balancer by creating a Network Security Group rule with a Deny action on the Load Balancer service tag. Once we want that particular VM to be accessible again, we switch the NSG rule from Deny to Allow.
The downside of this approach is that it takes 1-3 minutes for the changes to take effect (see Continuous Delivery with Azure Load Balancer).
If anybody can think of a faster (or instantaneous) solution for this, please let me know.
Without any Azure specifics, the usual pattern is to point a load balancer to a /status endpoint of your process, and to design the endpoint behavior according to your needs, e.g.:
When a service is first deployed its status is 'pending'
When you deem it healthy, e.g. all tests pass, do a POST /status to update it
The service then returns status 'ok'
Meanwhile the load balancer polls the /status endpoint every minute and knows to mark down / exclude forwarding for any servers not in the 'ok' state.
Some load balancers / gateways may work best with HTTP status codes whereas others may be able to read response text from the status endpoint. Pretty much all of them will support this general behavior though - you should not need an expensive solution.
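The /status pattern described above can be sketched as follows. This is a minimal illustration using only Python's standard library, not a production design: the handler name, the helper functions, the choice of 503 for any non-'ok' state, and the lack of authentication on POST are all assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    status = "pending"  # state of a freshly deployed service

    def _reply(self, code, text):
        body = text.encode()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path != "/status":
            self._reply(404, "not found")
        elif StatusHandler.status == "ok":
            self._reply(200, "ok")  # the load balancer keeps this server in rotation
        else:
            self._reply(503, StatusHandler.status)  # the load balancer marks it down

    def do_POST(self):
        # Assumption: no authentication here; a real service would restrict this.
        if self.path != "/status":
            self._reply(404, "not found")
            return
        length = int(self.headers.get("Content-Length", 0))
        StatusHandler.status = self.rfile.read(length).decode().strip() or "pending"
        self._reply(200, StatusHandler.status)

    def log_message(self, *args):
        pass


def start_status_server(host="127.0.0.1", port=0):
    """Start the status server in a background thread and return it."""
    server = ThreadingHTTPServer((host, port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host, port, method="GET", body=None):
    """Do what a health probe (or the deploy script) would do against /status."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, "/status", body=body)
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()
```

A deploy script would POST "ok" once its tests pass, and POST any other value (for example "draining") before the next update to take the server out of rotation.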
I had exactly the same requirement in an Azure environment which I built a few years ago. Azure Front Door didn't exist, and I had looked into using the Azure API to automate the process of adding and removing backend servers the way you described. It worked sometimes, but I found the Azure API was unreliable (lots of 503s reconfiguring the load balancer) and very slow to divert traffic to/from servers as I added or removed them from my cluster.
The solution that follows probably won't be well received if you are looking for an answer which purely relies upon Azure resources, but this is what I devised:
I configured an Azure load balancer with the simplest possible HTTP and HTTPS round-robin load balancing of requests on my external IP to two small Azure VMs running Debian with HAProxy. I then configured each HAProxy VM with backends for the actual IIS servers. I configured the two HAProxy VMs in an availability set such that Microsoft should not ever reboot them simultaneously for maintenance.
HAProxy is an excellent and very robust load balancer, and it supports nearly every imaginable load balancing scenario, and crucially for your question, it also supports listening on a socket to control the status of the backends. I configured the following in the global section of my haproxy.cfg:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats socket ipv4@192.168.95.100:9001 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
In my example, 192.168.95.100 is the first HAProxy VM, and 192.168.95.101 is the second. On the second server, these lines would be identical except for its internal IP.
Let's say you have an HAProxy frontend and backend for your HTTPS traffic to two web servers, ws1pro and ws2pro, with the IPs 192.168.95.10 and 192.168.95.11 respectively. For simplicity's sake, I'll assume we don't need to worry about HTTP session state differences across the two servers (e.g. Out-of-Process session state), so we just divert HTTPS connections to one node or the other:
listen stats
    bind *:8080
    mode http
    stats enable
    stats refresh 10s
    stats show-desc Load Balancer
    stats show-legends
    stats uri /

frontend www_https
    bind *:443
    mode tcp
    option tcplog
    default_backend backend_https

backend backend_https
    mode tcp
    balance roundrobin
    server ws1pro 192.168.95.10:443 check inter 5s
    server ws2pro 192.168.95.11:443 check inter 5s
With the configuration above, since both HAProxy VMs are listening for admin commands on port 9001, and the Azure load balancer is sending the client's requests to either VM, we need to tell both servers to disable the same backend.
I used Socat to send the cluster control commands. You could do this from a Linux VM, but there is also a Windows version of Socat, and I used the Windows version in a set of really simple batch files. The cluster control commands would actually be the same in BASH.
stop_ws1pro.bat:
echo disable server backend_https/ws1pro | socat - TCP4:192.168.95.100:9001
echo disable server backend_https/ws1pro | socat - TCP4:192.168.95.101:9001
start_ws1pro.bat:
echo enable server backend_https/ws1pro | socat - TCP4:192.168.95.100:9001
echo enable server backend_https/ws1pro | socat - TCP4:192.168.95.101:9001
These admin commands execute almost instantly. Since the HAProxy configuration above enables the stats page, you should be able to watch the status change happen on the stats page as soon as it refreshes. The stats page will show the connections or sessions draining from the server you disabled over to the remaining enabled servers when you disable a backend, and then show them returning to the server once it is enabled again.
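If Socat is not available, the same commands can be sent with a few lines of scripting. The sketch below is a hedged Python equivalent of the batch files above: it assumes the HAProxy admin socket is reachable over TCP as configured earlier, and that HAProxy closes the connection after answering a single command (its non-interactive behavior). The function names are hypothetical.

```python
import socket


def send_haproxy_command(host, port, command, timeout=5.0):
    """Send one runtime command to an HAProxy admin socket and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # assumption: the server closes after one command
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def set_server_state(proxies, backend, server, enabled):
    """Enable or disable backend/server on every HAProxy node in `proxies`.

    `proxies` is a list of (host, port) tuples, one per HAProxy VM, because
    both nodes must be told about the change.
    """
    action = "enable" if enabled else "disable"
    command = f"{action} server {backend}/{server}"
    return {f"{host}:{port}": send_haproxy_command(host, port, command)
            for host, port in proxies}


if __name__ == "__main__":
    # Addresses taken from the example configuration above.
    nodes = [("192.168.95.100", 9001), ("192.168.95.101", 9001)]
    print(set_server_state(nodes, "backend_https", "ws1pro", enabled=False))
```

Returning the raw replies per node makes it easy to spot a node that rejected the command or could not be reached.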
I am trying to find a PowerShell command that confirms there are no open connections or traffic flowing to endpoint1, or that traffic is moving smoothly to endpoint2, after disabling endpoint1:
$e[0].EndpointStatus = "Disabled"
Set-AzureRmTrafficManagerEndpoint -TrafficManagerEndpoint $e
Is there a command to do this? I am not able to find anything on Google. Or should I use some wait command to wait for about a minute to flush out all open connections?
Basically, I am looking for a way to make sure all in-flight connections are drained from one endpoint before disabling it.
Traffic does not flow through your Traffic Manager instance. Therefore, the functionality you are asking for from Traffic Manager does not exist. Traffic Manager simply resolves DNS queries to an IP address of one of your endpoints using the routing method (priority, weighted, performance, etc) you configured it for.
After disabling an endpoint, you could still see traffic going to the disabled endpoint for a period of time determined by your Traffic Manager profile DNS TTL setting. For example, if you disable an endpoint at 3:01:00 and your DNS TTL setting is 90 seconds, then you could see traffic until 3:02:30 because that's how long it could take for any client's DNS cache to expire. One way to monitor this is through the Queries by Endpoint Returned metric described here. This should work in most cases. However, it's not 100%. Disabling an endpoint in Traffic Manager won't stop a client that knows the IP address of your endpoint from calling it. You can decide whether or not this scenario is likely for your application and clients. So, to be absolutely certain there are no active clients using the endpoint, you will need some monitoring in place at the endpoint.
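The TTL arithmetic in the example above can be written down as a small helper, for instance to decide how long a deployment script should wait before treating DNS-directed traffic as drained. This is only a sketch: the function names are hypothetical, the date is arbitrary, and it covers DNS caching only, not clients that already hold the endpoint's IP address.

```python
from datetime import datetime, timedelta


def latest_cached_resolution(disabled_at, dns_ttl_seconds):
    """Latest moment a client's cached DNS answer may still point at the endpoint."""
    return disabled_at + timedelta(seconds=dns_ttl_seconds)


def seconds_until_dns_drained(disabled_at, dns_ttl_seconds, now):
    """Seconds left to wait from `now`; 0.0 once the TTL window has passed."""
    remaining = latest_cached_resolution(disabled_at, dns_ttl_seconds) - now
    return max(0.0, remaining.total_seconds())


if __name__ == "__main__":
    disabled = datetime(2024, 1, 1, 3, 1, 0)  # arbitrary date, time from the example
    print(latest_cached_resolution(disabled, 90).time())
    print(seconds_until_dns_drained(disabled, 90, datetime(2024, 1, 1, 3, 2, 0)))
```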
Finally, if you gracefully stop your web app, virtual machine, or other service hosting the endpoint you want disabled, then any active requests to your application will complete before the service shuts down, assuming your application completes requests in a reasonable time (a few seconds).
Documentation on how to test and verify your Traffic Manager settings is available here.
TCP connections are getting exhausted, and I am unable to figure out the root cause. How can I diagnose these connection spikes?
This is observed after migrating the project from .NET Framework to .NET Core 2.0.
The application downloads blobs using WebClient, which is the same in both the Framework and Core projects.
The outbound TCP connections on the VM instance can be exhausted.
In App Service, limits are enforced for the maximum number of outbound connections that can be made for each VM instance. For more information, reference: Cross-VM numerical limits (https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#cross-vm-numerical-limits). You may scale up the App Service Plan as per your requirement.
These limits apply only to customers of Basic or higher plans; in other words, customers running on their own dedicated VMs. These limits are there to protect the entire VM even though one particular site may be within its limits described above.
The limits are different depending on the size of VM configured.
Limit name  | Description                            | Small (A1) | Medium (A2) | Large (A3)
Connections | Number of connections across entire VM | 1920       | 3968        | 8064
Ensure that your application is not trying to access a local address. Connection attempts to local addresses (e.g. localhost, 127.0.0.1) and the machine's own IP will fail, except if another process in the same sandbox has created a listening socket on the destination port.
Reference: http://www.freekpaans.nl/2015/08/starving-outgoing-connections-on-windows-azure-web-sites/ - it’s a 3rd party blog, be cautious with the steps.
I have an Azure website (website.mycompany.com) that uses a WCF service for some data. The WCF Service sits behind an Azure Traffic Manager (service.mycompany.com) running in "priority mode", with 2 instances of the service for failover handling. With priority mode, the primary always serves up the data first, unless it's unavailable. If unavailable, the 2nd instance will reply, and so on down the line.
We've had a few instances recently where the primary endpoint for service.mycompany.com was offline. For "partnerships" who point to service.mycompany.com, they detected the switch and all was fine. Lately however, our own site (website.mycompany.com) does NOT detect the traffic manager switch, and the website has errors since the service fails to reply.
Our failover endpoint in these instances is up, and in the past the Azure website detected the switch; it's only recently that we've encountered this issue. Has anyone experienced similar issues? Are there perhaps any DNS settings that we need to tweak in our Azure Website to help it honor TTLs?
Has anyone experienced similar issues?
Do you mean the traffic manager can't switch to another endpoint immediately?
Traffic manager works at the DNS level, here are the reasons why traffic manager can't switch immediately:
The duration of the cache is determined by the 'time-to-live' (TTL) property of each DNS record. Shorter values result in faster cache expiry and thus more round-trips to the Traffic Manager name servers. Longer values mean that it can take longer to direct traffic away from a failed endpoint.
The Traffic Manager endpoint monitoring settings affect the response time. For more information about how Azure Traffic Manager works, please refer to the link.
The following timeline is a detailed description of the monitoring process.
We can also check the Traffic Manager profile using nslookup and ipconfig on Windows. For how to verify Traffic Manager settings, please refer to the link.
By the way, because traffic manager works at the DNS level, it cannot influence existing connections to any endpoint. When it directs traffic between endpoints (either by changed profile settings, or during failover or failback), Traffic Manager directs new connections to available endpoints. However, other endpoints might continue to receive traffic via existing connections until those sessions are terminated. To enable traffic to drain from existing connections, applications should limit the session duration used with each endpoint.
I'm going to refer you to my answer here because while the situation isn't exactly the same, it seems like it could have the same solution. To summarize, I find it likely that you have a connection left open to the down service that isn't being properly closed. This connection is independent of TTL, which only deals with DNS caching, and as such bypasses Traffic Manager completely.
I wrote a small utility that utilizes Azure blob storage to push some files across for a secondary backup (~100GB). Thus far it works really well, however since it is sitting in a colocation area, my bandwidth usage can hit 190mb/s+ which is a bill I'd rather not pay. Given this, I have two questions:
Outbound traffic on a server with multiple IPs utilizes the first IP configured as the "main" one. I know in C# I can get a list of network adapters and change properties, but is it possible to tell an app that its traffic needs to utilize a specific IP (instead of the default) for outgoing connections? We could use this to filter anything coming out of that IP, regardless of destination, and only this app would use that address.
If not, is it possible to configure an app to send all traffic on a separate adapter that would have a single IP, so we could filter outbound at our router level to throttle that traffic?
Alternatively (if we're attacking this from the wrong angle), is it possible to limit Azure transfers to a maximum bandwidth allotment in some capacity? That's all I'm really after, as any other traffic should be able to use the maximum it can (meaning QoS doesn't apply - there isn't contention here, just too much outgoing in general).
For your backup needs, have you already evaluated RA-GRS? It provides built-in data replication to a secondary location with read-only access to the data.
https://azure.microsoft.com/en-us/documentation/articles/storage-redundancy/
As far as I can tell, there is no API that lets you set a limit on the bandwidth consumed; however, you can enable storage monitoring so that you have a better idea of how many transactions are triggered.
https://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/
By the way, one thing that might address your cost concern is setting a spending limit for your Azure subscription, but this depends on the type of your subscription.
https://azure.microsoft.com/en-us/pricing/spending-limits/