Outbound HTTP calls occasionally being blocked on Azure VM

EDIT: 3/16/17
Today we tried creating a new VM in the same private network and resource group (using the same network security group as well) and ran the test; it worked perfectly. It makes no sense.
Using PsPing to diagnose this (testing ports 443 and 8001), we can see the dropped traffic there as well, so it's not the application; a minimal equivalent probe is sketched below. The same tests on the other test Azure VM work flawlessly. So it seems it is this particular VM; I just don't understand why, since nothing has changed. We also see dropped traffic to other random sites over 443 (ping and HTTP are flawless).
We've tried rebooting and redeploying with no luck.
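For anyone who wants to reproduce the PsPing test without installing PsPing, a minimal timed TCP connect probe like this sketch does the same job (the hostname is a placeholder for the 3rd-party endpoint you are testing):

```python
import socket
import time
from datetime import datetime

# Placeholder endpoint; substitute the 3rd-party host you are testing.
HOST = "example.com"
PORTS = [443, 8001]
TIMEOUT_SECONDS = 5

def probe(host: str, port: int) -> None:
    """Attempt a single TCP connect and report the handshake latency."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=TIMEOUT_SECONDS):
            elapsed_ms = (time.monotonic() - start) * 1000
            print(f"{datetime.now():%H:%M:%S} {host}:{port} connected in {elapsed_ms:.1f} ms")
    except OSError as exc:
        print(f"{datetime.now():%H:%M:%S} {host}:{port} FAILED: {exc}")

# Probe each port once per cycle; let it run long enough to catch
# the intermittent drops (they appear every 5-10 minutes).
while True:
    for port in PORTS:
        probe(HOST, port)
    time.sleep(10)
```

Left running side by side on the affected VM and a healthy one, the timestamps on the failures make it easy to correlate drops with anything else happening on the network.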
Original:
Our Azure VM is experiencing random failures when creating connections to several 3rd-party systems (every 5-10 minutes or so an HTTP call will fail).
Since Monday morning, we have seen web service (HTTP) calls made from the Azure VM fail in a seemingly random fashion. We are getting error messages suggesting the endpoint host is simply not responding. We have engaged both 3rd parties, and it appears that these HTTP calls are not reaching their servers at all. Everything was working fine up until Monday, and no changes have been made to the system.
We think the Azure VM (or Azure Networking Limits) are causing the problem because:
1. We created and deployed the same "test" program on both the Azure VM and an on-prem test VM with the same specs, and the program works fine on our on-premises VM.
a. This program simply makes information-request calls, a single one to each 3rd-party system, and is run every minute. Thus, both servers (Azure VM and on-prem VM) are repeating identical calls on the same schedule (a minimal sketch of such a probe appears after this list).
b. On the test server, the success rate is 100%; we have seen no errors, even when I bumped it up to try every 10 seconds.
c. On the production server, we see frequent errors connecting to both systems.
2. Looking at the IIS logs from the 3rd parties, we see blank spots where the failed HTTP calls should appear. In any event, no suspicious activity shows up on their side, and their logs only record the successful calls.
3. We show only 10-20 TCP connections on the server, so we are nowhere near the Azure 500k TCP connection limit.
4. Pings and tests to sites across the internet from the Azure VM work, so basic network connectivity seems fine.
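For concreteness, here is a minimal sketch of what that test program could look like, assuming two HTTP information-request endpoints (the URLs are placeholders, not the real 3rd-party systems):

```python
import time
import urllib.request
from datetime import datetime

# Placeholder URLs; substitute the two 3rd-party information-request endpoints.
ENDPOINTS = [
    "https://thirdparty-one.example.com/api/info",
    "https://thirdparty-two.example.com/api/info",
]

def check(url: str) -> None:
    """Make a single information-request call and log success or failure."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            print(f"{datetime.now():%H:%M:%S} OK   {response.status} {url}")
    except Exception as exc:
        print(f"{datetime.now():%H:%M:%S} FAIL {url}: {exc}")

# One call to each system per cycle, repeated every minute,
# mirroring the schedule used on both the Azure and on-prem VMs.
while True:
    for url in ENDPOINTS:
        check(url)
    time.sleep(60)
```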
Are we hitting some other kind of limit on the Azure system that could be causing these random errors?
I noticed this person had a similar issue, but no resolution was found (Azure VM outbound HTTP is unreliable).

Related

How can I find the reason why one of the Azure Web App instances stops responding to HTTP requests?

We have a problem with an application hosted on an Azure App Service Plan (P3V2). Depending on the traffic in our system, the App Service Plan is scaled up to 5 instances. During heavy traffic, one of the application instances stops responding to requests, and after a few minutes it is restarted by the Auto Heal function. We use Application Insights to monitor the application; unfortunately, no logs are saved when there is a problem with this one instance, and we do not see anything helpful in the Event Log either.
Below is a screenshot from last week showing all the occasions when one of the instances stopped responding to requests.
We did not notice any jumps in the application's memory usage, nor any increase in processor time.
I would be grateful for any suggestions or tips.

Azure VPN Client "Status = VPN Platform did not trigger connection"

Several users have been randomly experiencing this issue when trying to connect using the Azure VPN Client for about a week now. Some have had the issue for a week straight; others have had it for a period of time before it resolved itself.
When trying to connect, the authentication succeeds, but the VPN fails to connect with the error "Status = VPN Platform did not trigger connection". The only similar issue I found when searching relates to the VPN client not having permission to run in the background, but this seems to be a different issue.
Things we've tried so far:
Resetting the gateway (twice)
Rebooting the machine
Reinstalling the VPN profile on the client
Regenerating the VPN profile
Reinstalling the VPN client on the machine
Toggling various network & app settings
The logs in the Azure portal show the user successfully authenticating, but nothing to show the VPN failing to connect. All the affected machines are running versions of Windows 10 and are up to date. The problem seems to be specific to the machine itself: using a different set of credentials or a different network makes no difference. The problem is occurring in multiple locations, and other machines at the same location are unaffected. The only pattern seems to be that older machines are more likely to be affected, but I'm not even totally sure that's anything other than coincidence. Five or so machines have experienced the issue, and three are currently affected, out of fifteen or so total.
Best guess at this point is that an update caused the issue. Any further troubleshooting suggestions are welcome, as I'm unsure of the cause and unable to reproduce the problem. Azure support say the gateway configuration looks fine but are unsure what's causing the problem; they've just had me run a packet capture on both ends, and I'm waiting to see what they say about the results, but in the meantime it's causing quite an issue for users without VPN access.
I had the same issue ("Status = VPN Platform did not trigger connection") and tried all the same steps as above. After getting nowhere with the basics, I opened a ticket with MS, and they suggested that I restart the RasMan (Remote Access Connection Manager) service. After the service restart, the Azure VPN Client connected without issue. I also set the startup type to Automatic (Delayed Start). Note that an admin has to make this change.
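For anyone who wants to script the fix across several machines, here is a hedged sketch that automates the restart and the startup-type change via sc.exe (RasMan is the real service name; the script itself is just an illustration and must be run from an elevated prompt):

```python
import subprocess
import time

def run(args, check=True):
    """Run a command, echoing it first."""
    print(">", " ".join(args))
    subprocess.run(args, check=check)

# Restart the Remote Access Connection Manager service.
# "sc stop" returns asynchronously, so give the service a moment to stop.
run(["sc", "stop", "RasMan"], check=False)  # may already be stopped
time.sleep(5)
run(["sc", "start", "RasMan"])

# Set the startup type to Automatic (Delayed Start); the space after
# "start=" is required by sc.exe's argument syntax.
run(["sc", "config", "RasMan", "start=", "delayed-auto"])
```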
Please check whether Background Apps is turned on in Settings; it should be. Let Azure VPN remain active, and turn off the rest of the apps you don't use frequently.

Azure calls working on local but not on production server - Firewall settings?

We have an Azure setup where we use Azure as a proxy for sending data to our apps via Azure Functions.
Evidently our local development Windows environments send the calls to Azure successfully; we verify this by logging into the Azure portal and watching the traffic hit the calls in the Azure Functions console. When we run our code on our local machines, we see the traffic and the calls getting made, BUT when we try the same calls from our production server environment (hosted onsite, Windows Server 2016), we can't see any traffic come through to our Azure calls.
I am trying to chase down whether it is the firewall on the production server, and whether any outbound firewall rules need to be opened up or added to talk to Azure, but my Google searches have not turned up anything about a local machine talking to Azure. Most of the articles that come up are about setting up a firewall on Azure, not local firewall rules for reaching Azure.
The application we are running is an onsite IIS hosted website with calls out to Azure.
Does anyone have pointers on where or what I should be looking at to see if there is any communication going from our production server to Azure? Which logs, rules, anything that could point us in a direction? I feel I have looked in most places, including the IIS logs and our application logs (we just write a log entry saying the call was sent).
But if there is a specific firewall setting on the production server that I need to add, I don't know what that would be, and if anyone does know, it would be very helpful.
UPDATE:
We have so far found that we can hit the functions that allow GET requests through a browser. The issue seems to be either IIS, a permission within IIS, or the application itself. We actually set the permissions on the application's folder on our server to "Everyone" just to see what would happen, and still have not had any luck. The calls we are making are actually POSTs to the Azure function, and we don't have Postman on the machine (a stdlib sketch for testing a POST is below).
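In the absence of Postman, a POST can be exercised straight from the production server with nothing but the Python standard library. In this sketch the function URL, key, and payload are placeholders, assuming an HTTP-triggered function secured with a function key:

```python
import json
import urllib.request

# Placeholders: substitute your function's URL, key, and payload.
FUNCTION_URL = "https://myapp.azurewebsites.net/api/MyFunction"
FUNCTION_KEY = "<your-function-key>"
payload = json.dumps({"test": "hello"}).encode("utf-8")

request = urllib.request.Request(
    FUNCTION_URL,
    data=payload,  # supplying a body makes urllib send a POST
    headers={
        "Content-Type": "application/json",
        "x-functions-key": FUNCTION_KEY,  # standard Functions key header
    },
)

try:
    with urllib.request.urlopen(request, timeout=30) as response:
        print("Status:", response.status)
        print(response.read().decode("utf-8"))
except Exception as exc:
    # If this fails while GET from a browser works, the problem is with
    # the request itself (method, key, body), not the outbound firewall.
    print("POST failed:", exc)
```

Running this directly on the production server separates "the server can't reach Azure" from "the application's POSTs are malformed or blocked".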
Assuming you're calling out to an Azure Function that is not running on an App Service Environment, or behind API Management or similar, then the only place you can restrict access is on the networking tab of the function's settings. If you don't have anything configured there, then the function is not where the issue is.
If traffic outbound from your on-prem server is being blocked, then you will need to talk to your IT team to get that opened up. You don't mention how you're calling your function, but if it is an HTTP trigger, then you would need port 443 open outbound; a quick test for that is sketched below.
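As a quick check before involving the IT team, a one-off TCP connect test along these lines (the hostname is a placeholder for your function app's host) will tell you whether outbound 443 is open from that server:

```python
import socket

# Placeholder: substitute your function app's hostname.
HOST = "myapp.azurewebsites.net"

try:
    # If this handshake succeeds, outbound 443 is open from this server.
    with socket.create_connection((HOST, 443), timeout=10):
        print(f"Outbound 443 to {HOST} is open.")
except OSError as exc:
    print(f"Outbound 443 to {HOST} is blocked or failing: {exc}")
```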

Does an Azure Web App care if its instances are healthy/unhealthy?

If I deploy a web app (formerly known as an Azure Website) to an App Hosting Plan in Azure with a couple of instances (scale = 2), will the load balancer in front of the instances care if any of the instances is unhealthy?
I'm troubleshooting an issue that sometimes causes my site to return an HTTP 503 roughly 50% of the time. My thinking here is that one of my two instances has failed but the load balancer hasn't noticed.
If the load balancer does care, what does it look for? I can't find any way to specify a ping URL, for instance.
Note: this question has nothing to do with Traffic Manager.
Yes, Azure Web Apps monitors the health of the workers by making internal requests to them and verifying that they're healthy.
However, we don't check the status codes that the web app returns to user requests (like 500, etc.), since those could easily be app-specific issues rather than a problem with the machine.
So the answer you're looking for is: we continuously test whether or not the instances (VMs) are healthy and take them down if they're not. However, those tests do not rely on the error codes the customer's site returns.

HTTP Request Timeout Windows Azure Deploy

I have an MVC 4 website using a WCF service. When I deploy to Windows Azure using the VS 2012 publish wizard, I get this error:
10:13:19 AM - The HTTP request to 'https://management.core.windows.net/42d4257b-5f38-400d-aac5-2e7acee9597d/services/hostedservices/myapp?embed-detail=true' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.
After cleaning the project and publishing a few times, the error goes away. What am I doing wrong?
Whenever you start the publish process from the VS machine, an SSL tunnel is established first; once the tunnel is created, the package is transferred from your machine to the Windows Azure portal. After the upload is completed, the result notifications are posted back to the Publish results window. That is how it works.
In your case, the time to build the SSL tunnel for the secure package transfer is longer than normal, which could be because of network latency between your machine and the Windows Azure Management Portal. For security reasons, the window allowed for creating the tunnel is small; if the connection is not created in time, a retry cycle starts the process again, and if that also fails you are greeted with the failure message. This could be caused by excessive traffic on either side or both. So this is mainly a networking issue rather than something specific to Windows Azure, since after some successive tries you could upload your package.
In such failures, you can run a network capture utility (e.g., Netmon or Wireshark) and compare the time taken during failed and successful attempts. This will help you understand the underlying delays.
Try updating your roles' diagnostics configuration, then update your storage credentials, because they may have expired.
