Azure App-Service Swap "Bounces" Between Source and Destination - azure

I'm seeing some interesting behavior on Azure App Service that I'm hoping somebody will be kind enough to comment on.
Reproduction steps (all Azure steps can be done in the portal):
Create a new Web App in App Service (Standard pricing level, single instance is fine), e.g. mysite
Create a new staging slot for that App, e.g. mysite-staging
Deploy a bare-bones ASP.NET app to mysite with a file /scripts/test.js that has the content //ONE
Deploy a bare-bones ASP.NET app to mysite-staging with a file /scripts/test.js that has the content //TWO
Swap the deployment slots
Immediately after the swap starts, navigate to mysite.azurewebsites.net/scripts/test.js and monitor the returned content during the swap operation (by continually doing a force-refresh in the browser)
What I would expect to see:
At some point during the swap, the content changes seamlessly/consistently/irreversibly from //ONE to //TWO
What I actually see:
During the swap operation, the content "flickers"/"bounces" between //ONE and //TWO. After the swap operation is complete, the behavior is stable and //TWO is consistently returned
The observed behavior suggests that there is no single point in time at which all traffic can be said to be going to the new version.
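The manual force-refresh can be approximated with a small polling script. This is only a sketch: `poll` and `count_flips` are hypothetical helper names, the URL is the example one from the steps above, and the cache-busting query string is an assumption about how to mimic a browser force-refresh.

```python
import time
import urllib.request


def count_flips(bodies):
    """Count how many times consecutive responses differ from each other."""
    return sum(1 for prev, cur in zip(bodies, bodies[1:]) if prev != cur)


def poll(url, attempts=60, delay=0.5):
    """Fetch `url` repeatedly and return the list of response bodies."""
    bodies = []
    for _ in range(attempts):
        # A unique query string plus a no-cache header stand in for a force-refresh.
        req = urllib.request.Request(
            f"{url}?t={time.time_ns()}", headers={"Cache-Control": "no-cache"}
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            bodies.append(resp.read().decode("utf-8", "replace").strip())
        time.sleep(delay)
    return bodies
```

Calling `count_flips(poll("https://mysite.azurewebsites.net/scripts/test.js"))` during a swap would return 1 for a clean, irreversible switch; anything higher corresponds to the "bouncing" described above.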
The reason this concerns me is the following scenario:
A user requests a page mysite.azurewebsites.net which, during this "bouncing" stage, responds with the "v2" version of the page with a link to a CDN-hosted script mycdn.com/scripts/test.js?v2 (the ?v2 is a new query string)
The browser requests the script from the CDN, which in turn requests the script from mysite.azurewebsites.net. This time, the "bouncing" causes the response to be the v1 version of the script.
Now we have a v1 version of the script cached in the CDN, which all users in that region will load with the v2 version of the page
My question: Is this "bouncing" behavior during a swap operation "by design"? If so, what is the recommended approach for solving the pathological case above?

The behavior you've described is currently by design. When we perform the swap, we update the mappings between hostnames and sites in our database, but our frontend instances cache those mappings and refresh them every 30 seconds. So the "bouncing" period may last up to 30 seconds.
I do not have a good recommendation for solving this case at the moment, but I will look into possible ways to address it.
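Given the 30-second refresh interval mentioned above, one thing a deployment script could do is detect when the bouncing window has passed. The sketch below is not a fix for the CDN scenario and not an Azure feature; `settled` is a hypothetical helper, and the 30-second default is taken from the answer above.

```python
def settled(samples, expected, window=30.0):
    """Return True once `expected` is the only body seen for `window` seconds.

    `samples` is a list of (timestamp_seconds, body) tuples in time order.
    """
    if not samples or samples[-1][1] != expected:
        return False
    # Walk backwards to find where the current unbroken run of `expected` starts.
    run_start = samples[-1][0]
    for ts, body in reversed(samples):
        if body != expected:
            break
        run_start = ts
    return samples[-1][0] - run_start >= window
```

A script could keep appending samples and postpone any step that depends on consistent responses until `settled(samples, "//TWO")` returns True.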

Related

How can I get the "real" Azure App Service slot instance name?

I'd like to know how I can get a unique identifier for the "slot instance" that an Azure App Service slot has loaded into it.
Note that I am not referring to the name of the slot.
For example:
My Azure App Service has two slots named "App" (Production) and "App-Staging" (Staging).
I then deploy version 1 of my project to the "App" slot.
I then deploy version 2 of my project to the "App-Staging" slot.
I then perform a Slot Swap operation from the Azure Portal:
Both instances of my app are running simultaneously (imagine they run in a container of some kind).
Both "containers" are immediately "detached" from their associated slot while still running.
Both "containers" are then immediately re-attached to the opposing slot.
i.e. version 1 stops receiving HTTP requests for app.azurewebsites.net and suddenly starts receiving requests for app-staging.azurewebsites.net.
and version 2 stops receiving HTTP requests for app-staging.azurewebsites.net and suddenly starts receiving requests for app.azurewebsites.net.
In order to investigate some issues I was having, I created a text file at D:\home\SlotName.txt. In the "App" slot I entered "SlotA" and in the "App-Staging" slot I entered "SlotB".
This SlotName.txt moves with the application instance, and allows my application to detect which filesystem or "container" instance it's living in - and this doesn't change when a slot-swap is performed.
I find this information essential when trying to uniquely identify deployments or when investigating logging continuity issues (as the staging slot won't be logging production data, for example).
However, my SlotName.txt file seems like a hack - but I can't see any information in the Environment Variables for my app's instance that reveals the same information.
Environment variables do reveal the slot name, e.g. "App" and "App-Staging" which is mutable - of course, but it doesn't uniquely identify the "container" or filesystem instance that the app is deployed into.
Here are the two Kudu Environment pages from the Production and Staging slots. Notice that the values are either identical (like the machine name), slot-specific, or refer to the deployed application code, and none of them refer to the filesystem / container instance they live in:
Is there any way to get this information without using my SlotName.txt trick?
The answer was hiding right under my nose, under different terminology.
What I was calling a "slot instance name" is actually referred to as a "Deployment Id" (I know this is an overloaded term, as it's also used in the context of Azure's (now legacy) "Cloud Services" PaaS).
This information is visible in the Kudu environment page and is also exposed as an environment variable: WEBSITE_DEPLOYMENT_ID.
The WEBSITE_DEPLOYMENT_ID value is of the form {SiteName}[__{Random}], with the __{Random} suffix omitted for the first deployment space.
If you look closely at the screenshot I posted, you'll notice the left-hand screenshot has the site-slot-name Site1__e928 whereas the right-hand screenshot is of the "first" slot-space and so its name is just Site1.
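As a rough sketch of how an app might read and split this value, assuming only the {SiteName}[__{Random}] form described above (the helper names `parse_deployment_id` and `current_deployment_id` are hypothetical, and the format is undocumented, so it may change):

```python
import os


def parse_deployment_id(value):
    """Split a deployment id of the form {SiteName}[__{Random}].

    Returns (site_name, suffix); suffix is None when there is no "__" part.
    """
    site_name, sep, suffix = value.rpartition("__")
    if not sep:
        return value, None
    return site_name, suffix


def current_deployment_id():
    """Read WEBSITE_DEPLOYMENT_ID, or None when the variable is not set."""
    return os.environ.get("WEBSITE_DEPLOYMENT_ID")
```

With the values from the screenshots, `parse_deployment_id("Site1__e928")` gives `("Site1", "e928")` and `parse_deployment_id("Site1")` gives `("Site1", None)`.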
It is unfortunate that this information is not documented by Microsoft publicly - at least so far as Google can see (searching for the term right now yields zero useful relevant results):
Mystery solved!
You have something called deployment slot setting values, as shown in the image below:
Add a key-value pair to each slot, with a different value in each.
Here, this setting sticks to the specific app, even if you swap.
This may be the trick you're looking for.

Deployment slot stops working after a few minutes?

I have an Azure App Service. I then created a deployment slot, which shows up as a web app called myapp/staging.
In Visual Studio, I deployed to the staging location.
It works for a couple of minutes, but then it looks like it was never deployed (see picture).
If any error occurs during a slot swap, it's logged in D:\home\LogFiles\eventlog.xml. It's also logged in the application-specific error log.
During custom warm-up, the HTTP requests are made internally (without going through the external URL). They can fail with certain URL rewrite rules in Web.config. Just review your rewrite rules.
As you're publishing it through VS, when you right click and select Publish Web on the left-hand side, you would find the settings tab. Select that. Then, expand the option under File Publish Options and check the Box for “Remove additional files at destination” – Review this option based on your requirement.
Also, just for additional information, an HTTP request to the application root is timed. The swap operation waits for 90 seconds for each HTTP request, and retries up to 5 times. If all retries are timed out, typically the swap operation is stopped.

Could not retrieve the CDN endpoints in subscription with ID

Searched Google and SO, no luck.
Just got this message in Azure for 3 CDN endpoints.
There seems to be no way to know what is going on without MS support. It is a test account and I do not recall setting this up. I have been through similarly obfuscated MS error messages before, only to discover that Azure had crashed.
What does it mean?
This isn't really a direct answer, but could help with the general problem of "what happens if the CDN goes down?".
There is a recent development called the "Progressive Web App".
Basically, unless served from localhost, everything has to be over HTTPS, but the script is cached as a local application in your browser.
When your app makes requests to the registered domain, these are intercepted by a callback you put in your serviceWorker.js, so you can cache even application data locally, and sync the local data occasionally with the server (or on receive events if you're using webSockets).
Since the Service Worker intercepts REST calls to the registered domain, this in theory makes it fairly easy to add to just about any framework.
https://developers.google.com/web/fundamentals/getting-started/codelabs/your-first-pwapp/
Sometimes there is a (global) problem with the CDN. It has happened before.
You can check the Azure CDN status on this page: https://azure.microsoft.com/en-us/status/
At the moment everything looks good. Do you still have problems?

IIS worker threads issue

My site is hosted on IIS. One feature of the site calls a WCF service and returns the result. The issue is that while the site is processing a call to the WCF service, other requests to the site freeze and do not return content quickly, even when that content is static. I set up two Chrome instances running different iMacros scripts: one calls the page that requests the WCF service, and the other requests a page with only static content. I can see that when the first page freezes on the WCF call, the static page freezes too, and when the first is released, so is the second.
Do I need to reconfigure something in my Web.config, or should I do something else to get static content returned immediately?
I think that there are two separate problems here:
Why does the page that uses the WCF service freeze
Why does the static content page freeze
On the page that calls the WCF service, a common problem is that the WCF client is not closed. By default there are 10 WCF connections with a timeout of 1 min. The first 10 calls go fine (say they execute in 2 secs); then the 11th call comes, there are no free WCF connections, and it must therefore wait 58 secs for a connection to time out and become available.
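The 58-second figure can be reproduced with a toy model. This is a simplified sketch of the arithmetic only, not of WCF's actual throttling; `wait_for_connection` is a hypothetical function, and the pool size of 10 and the 60-second timeout are the numbers quoted above.

```python
def wait_for_connection(arrival, held_since, hold_timeout=60.0, pool_size=10):
    """Seconds a call arriving at `arrival` waits for a free connection.

    `held_since` lists the times at which connections were opened and never
    closed; each one is only freed `hold_timeout` seconds after it was opened.
    """
    still_held = [t for t in held_since if t + hold_timeout > arrival]
    if len(still_held) < pool_size:
        return 0.0
    # The earliest expiry among the held connections frees the first slot.
    return min(t + hold_timeout for t in still_held) - arrival
```

With ten unclosed clients opened at t = 0 and an 11th call arriving at t = 2, `wait_for_connection(2.0, [0.0] * 10)` evaluates to 58.0. With only nine connections held, the same call does not wait at all.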
As for why your static page freezes: it could be that your client only allows one connection to the site, so the request for the static page is not sent until the request for the page with the WCF service is complete.
You should check the IIS logs to see how much time IIS reports each request as taking.
I would say that this is a threading issue. This MSDN KB article has some suggestions on how to tune your ASP.NET threading behavior:
http://support.microsoft.com/kb/821268
From article - ...you can tune the following parameters in your Machine.config file to best fit your situation:
maxWorkerThreads
minWorkerThreads
maxIoThreads
minFreeThreads
minLocalRequestFreeThreads
maxconnection
executionTimeout
To successfully resolve these problems, do the following:
Limit the number of ASP.NET requests that can execute at the same time to approximately 12 per CPU.
Permit Web service callbacks to freely use threads in the ThreadPool.
Select an appropriate value for the maxconnection parameter. Base your selection on the number of IP addresses and AppDomains that are used.
etc...
Consider this scenario: when you make a request to IIS, your app changes, deletes or creates some file outside of the App_Data folder. This is often a log file that was mistakenly put in the app's bin folder. The file system changes cause IIS to reload the AppDomain because it thinks the app has changed, hence the delay you experience. This may or may not apply to your issue, but it is a common mistake in ASP.NET apps.
Well, maybe there is no problem...
It may just be the browser's limit on simultaneous requests to the same domain.
Until the browser has finished the request to the first page (the WCF page), it won't send the request to the second page (the static one).
Try this:
Use different browsers for each page (for example chrome/firefox).
Or open the second page in chrome in incognito window (Ctrl + Shift + N).
Or try to access each page from a different computer.
You could try to use AppFabric and see what is wrong with your WCF services http://msdn.microsoft.com/en-us/windowsserver/ee695849

How to serve a static website from S3 or Azure Blob with http status 503?

I'm looking for a way to serve a "Maintenance Mode" website from Amazon S3 or Azure Blob storage while I'm updating my website to a new version. I'd like to just flip DNS over to point to maint.mydomain.com (which would be a static site & return 503 http status). Is this possible to do with either of these, or would I need to create a traditional website to host this?
I can get S3 to serve a website, but it always shows HTTP status 200. Any ideas?
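For comparison, if a small traditional web host does turn out to be necessary, returning 503 for every request takes very little code. The following is a minimal sketch using Python's standard library, not a production setup; the page content, the port, and the 10-minute Retry-After value are all assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE_HTML = b"<html><body><h1>Down for maintenance</h1></body></html>"


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with 503 and a static maintenance page."""

    def _send(self, include_body):
        self.send_response(503)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(MAINTENANCE_HTML)))
        self.send_header("Retry-After", "600")  # assumed value: retry in 10 minutes
        self.end_headers()
        if include_body:
            self.wfile.write(MAINTENANCE_HTML)

    def do_GET(self):
        self._send(include_body=True)

    def do_HEAD(self):
        self._send(include_body=False)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the sketch quiet


def make_server(port=8080):
    """Build (but do not start) a server bound to all interfaces on `port`."""
    return HTTPServer(("", port), MaintenanceHandler)
```

Running `make_server().serve_forever()` on a host that maint.mydomain.com points to would answer every path with the maintenance page and a 503 status.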
The way I ended up solving this is by creating an Azure deployment that just has app_offline.htm. When I need to have an outage, I just deploy that package to production, and have my next version in staging while I do the database migration. Then I do the VIP swap to the new version.
The bad part of this is that my previous version is no longer waiting in staging once I've flipped to the new version, but then again I did just change the DB schema, so maybe rollback is a little more involved in this scenario anyway.
It seems you can also make Amazon S3 return 404s for your website during maintenance by specifying an incorrect path to the index file and providing the correct path for the error page, which will then always be returned when you hit any URL at the endpoint (including the root).
You're approaching this wrong.
You should run multiple instances, a staging and a production. Both staging and production are "production" code, but staging is used to actually deploy your changes. Once your staging is up and running you flip the staging and production instances (in Azure this is called a VIP swap). This allows the user to experience an "instant" upgrade (in quotes as there is still some fractional downtime and you can get errors in cases where the user comes in at the exact moment of the switch).

Resources