Deploying Umbraco to SQL Azure

I've successfully deployed Umbraco 4.7 to a Windows Azure Website and SQL Azure, but I sometimes get errors like this one:
SQL helper exception in ExecuteScalar ---> System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) --->
It seems that Umbraco does not implement any retry logic (SQL Azure transient fault handling). Does anybody know of a painless way to add this on the Umbraco side?

Amhed, there is no easy way to do this. People using Umbraco on Azure have generally run into session state problems that cause connection issues, but as you stated, you are only seeing this about 1% of the time, which suggests you are hitting transient errors. You have probably already seen the discussion here:
http://our.umbraco.org/forum/core/general/27179-SQL-Azure-connectivity-issues?p=1
The net effect is that you have to take the Umbraco code base and rebuild it with a retry framework in it. That adds serious overhead to using Umbraco and to taking future releases. You would be best off lobbying the Umbraco core team to put a full retry framework in place and support it. (Let's not even talk about security issues ;))
This is probably not what you want to hear, but effectively it is roll-your-own-data-layer time.
Having said all that, I went and had a look ;) (this does interest me for something else).
As a starting point: from looking at the Umbraco source, they use Microsoft.ApplicationBlocks.Data as the base for making connections and executing SQL commands. In umbraco.DataLayer.SqlHelpers.SqlServer.SqlServerHelper you can see:
using SH = Microsoft.ApplicationBlocks.Data.SqlHelper;
So my guess is that you would need to replace this block. A quick (and dirty) search on the internet turns up the source here:
http://www.sharpdeveloper.net/source/SqlHelper-Source-Code-cs.html
You could also reflect out the class (decompile it with a tool such as Reflector) to make sure.
You could then rebuild it with a transient fault handling framework and potentially drop it in as a replacement DLL (famous last words).
But you could at least easily get
using (ReliableSqlConnection conn = new ReliableSqlConnection(connString, retryPolicy))
going in that type of class, along with more lovely stuff:
http://msdn.microsoft.com/en-us/library/hh680899(v=pandp.50).aspx
If I were going to do this, this is where I would start, and I would go in at that level.
I am not sure whether that would cover 100% of the connections, as I don't actively work in the Umbraco code base, so this is a best guess. But from looking at the source, this is what I would change out first, and it looks like your best starting point.
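For anyone wanting to try this, here is a minimal sketch of the kind of retry wrapper you would put around the data layer calls. It is not Umbraco's code and not the Enterprise Library API. `RetryPolicy` and `SqlAzureTransientErrors` are hypothetical names, and the error numbers are an assumed, partial list of ones commonly treated as transient on SQL Azure, so check the current Azure SQL documentation before relying on it. In a rebuilt SqlServerHelper you would wrap each call with something like `policy.Execute(() => SH.ExecuteScalar(connString, CommandType.Text, sql))`, using a predicate along the lines of `ex => ex is SqlException s && SqlAzureTransientErrors.IsTransient(s.Number)`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical retry helper: re-runs an operation with exponential backoff
// for as long as the supplied predicate classifies the failure as transient.
public sealed class RetryPolicy
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _baseDelay;
    private readonly Func<Exception, bool> _isTransient;

    public RetryPolicy(int maxAttempts, TimeSpan baseDelay, Func<Exception, bool> isTransient)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        _maxAttempts = maxAttempts;
        _baseDelay = baseDelay;
        _isTransient = isTransient;
    }

    public T Execute<T>(Func<T> operation)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // Only swallow the exception if it is transient and attempts remain;
            // otherwise it propagates to the caller unchanged.
            catch (Exception ex) when (attempt < _maxAttempts && _isTransient(ex))
            {
                // Wait 1x, 2x, 4x ... the base delay before the next attempt.
                Thread.Sleep(TimeSpan.FromMilliseconds(_baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1)));
            }
        }
    }
}

public static class SqlAzureTransientErrors
{
    // Assumed, partial list of SQL error numbers often treated as transient on SQL Azure.
    private static readonly HashSet<int> Numbers = new HashSet<int>
    {
        4060,   // cannot open database
        10053,  // transport-level error: connection aborted
        10054,  // transport-level error: connection forcibly closed by the remote host
        40197,  // the service encountered an error processing the request
        40501,  // the service is currently busy
        40613   // the database is currently unavailable
    };

    public static bool IsTransient(int errorNumber) => Numbers.Contains(errorNumber);
}
```

Because of the exception filter, a non-transient error such as a constraint violation is rethrown on the first attempt. The last transient failure is rethrown once the attempts run out.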
hths,
James

Related

Azure WebApps leaking handles "out of nothing"

I have 6 WebApps (ASP.NET, Windows) running on Azure, and they have been running for years. I do tweak them from time to time, but have made no major changes.
About a week ago, all of them started to leak handles, as shown in the image. That only covers the last 30 days, but the flat curve before that goes back "forever". While I did make some minor changes to some of the sites, there are at least 3 sites that I did not touch at all.
But still, major leakage started for all sites a week ago. Any ideas what would be causing this?
I should add that one of the sites has only a single .aspx page, and another site has no code at all. It exists only to run a WebJob containing the Let's Encrypt script, which hasn't changed for several months.
So basically, I'm looking for any pointers, but I doubt this has anything to do with my code, given that 2 of the sites contain none of my code and still show the same symptom.
Final information from the product team:
The Microsoft Azure Team has investigated the issue you experienced and which resulted in increased number of handles in your application. The excessive number of handles can potentially contribute to application slowness and crashes.
Upon investigation, engineers discovered that the recent upgrade of Azure App Service with improvements for monitoring of the platform resulted into a leak of registry key handles in application worker processes. The registry key handle in question is not properly closed by a module which is owned by platform and is injected into every Web App. This module ensures various basic functionalities and features of Azure App Service like correct processing HTTP headers, remote debugging (if enabled and applicable), correct response returning through load-balancers to clients and others. This module has been recently improved to include additional information passed around within the infrastructure (not leaving the boundary of Azure App Service, so this mentioned information is not visible to customers). This information includes versions of modules which processed every request so internal detection of issues can be easier and faster when caused by component version changes. The issue is caused by not closing a specific registry key handle while reading the version information from the machine’s registry.
As a workaround/mitigation in case customers see any issues (like an application increased latency), it is advised to restart a web app which resets all handles and instantly cleans up all leaks in memory.
Engineers prepared a fix which will be rolled out in the next regularly scheduled upgrade of the platform. There is also a parallel rollout of a temporary fix which should finish by 12/23. Any apps restarted after this temporary fix is rolled out shouldn’t observe the issue anymore as the restarted processes will automatically pick up a new version of the module in question.
We are continuously taking steps to improve the Azure Web App service and our processes to ensure such incidents do not occur in the future, and in this case it includes (but is not limited to):
• Fixing the registry key handle leak in the platform module
• Fix the gap in test coverage and monitoring to ensure that such regression will not happen again in the future and will be automatically detected before they are rolled out to customers
So it appears this is a problem with Azure. Here is the relevant part of the current response from Azure technical support:
==>
We had discussed with PG team directly and we had observed that, few other customers are also facing this issue and hence our product team is actively working on it to resolve this issue at the earliest possible. And there is a good chance, that the fixes should be available within few days unless something unexpected comes in and prevent us from completing the patch.
<==
Will add more info as it comes available.
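The symptom in this thread was a handle count that kept climbing until the app was restarted. If you want the app to flag that itself rather than spotting it on the portal chart, one option is to record `Process.HandleCount` (a real System.Diagnostics property) periodically and compare the early samples with the later ones. Here is a minimal sketch. The `HandleLeakCheck` class, the half-split heuristic and the threshold are assumptions for illustration, not an Azure feature.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for spotting a steadily growing handle count.
public static class HandleLeakCheck
{
    // Handle count of the current process (on Windows, the OS handle count).
    public static int Sample() => Process.GetCurrentProcess().HandleCount;

    // Suspect a leak when even the lowest sample in the second half of the series
    // is at least minGrowth above the highest sample in the first half.
    // Comparing halves instead of requiring strict monotonic growth tolerates normal noise.
    public static bool LooksLikeLeak(IReadOnlyList<int> samples, int minGrowth)
    {
        if (samples.Count < 4) return false;
        int half = samples.Count / 2;
        int firstMax = int.MinValue;
        int secondMin = int.MaxValue;
        for (int i = 0; i < half; i++) firstMax = Math.Max(firstMax, samples[i]);
        for (int i = half; i < samples.Count; i++) secondMin = Math.Min(secondMin, samples[i]);
        return secondMin - firstMax >= minGrowth;
    }
}
```

A restart resets the count, as the product team describes, so samples taken across a restart would need to be discarded.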

Diagnosing ASP.NET Azure WebApp issue

For about a month, one of our web applications hosted as a WebApp on Azure has been having some kind of problem, and I cannot find the root cause.
This WebApp is hosted on Azure on a 2 x B2 App Service Plan. On the same App Service Plan there is another WebApp that is currently working without any issue.
This WebApp is an ASP.NET WebApi application that exposes a set of REST APIs.
Effect: for no apparent reason (at least as far as I know so far), the ThreadCount metric starts climbing, sometimes very slowly, sometimes within a few minutes. When this happens, no requests seem to be served and the service is effectively dead.
Solution: a simple restart of the application (and this means a restart of the AppPool) causes an immediate, obvious drop in ThreadCount, and everything works as usual again.
Other observations: there is no "periodicity" to this event. It has happened in the evening, in the morning and in the afternoon. Evening seems to be a preferred timeframe, but I wouldn't say there is any correlation.
What I measured through Azure Monitoring Metric:
- Request Count seems to oscillate normally. There is no peak that could cause the increase in ThreadCount.
- CPU and Memory seem normal; nothing strange.
- Response Time is normal, like the other metrics.
- Connections (which should be related to sockets) oscillate normally, so I'd rule out anything related to DB connections.
What may I do in order to understand what's going on?
After a lot of research, this turned out to be caused by incorrect use of Dependency Injection (with Ninject) in an application that wasn't designed for it.
In order to diagnose, I discovered a very helpful feature in Azure. You can reach it by entering into the app that is having the problem, click on "Diagnose and solve problems" then click on "Diagnostic tools" and then select "Collect .NET profiler report". In that panel, after configuring the storage for the diagnostic files, you can select "Add thread report".
From those reports you can easily see what's going wrong.
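The portal thread report is the easiest route. If you also want something in the app's own logs to line up against the ThreadCount metric, you could snapshot the process thread count and thread-pool usage with the standard System.Diagnostics and System.Threading APIs. Here is a minimal sketch. `ThreadSnapshot` is a hypothetical name, and treating `Process.Threads.Count` as comparable to the portal's ThreadCount metric is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical point-in-time view of thread usage, meant to be logged periodically.
public sealed class ThreadSnapshot
{
    public int ProcessThreads { get; private set; }
    public int BusyWorkerThreads { get; private set; }
    public int BusyIoThreads { get; private set; }
    public int MaxWorkerThreads { get; private set; }

    public static ThreadSnapshot Take()
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);
        return new ThreadSnapshot
        {
            // All OS threads in the process, pool threads included.
            ProcessThreads = Process.GetCurrentProcess().Threads.Count,
            // Pool threads currently in use = configured maximum minus available.
            BusyWorkerThreads = maxWorker - freeWorker,
            BusyIoThreads = maxIo - freeIo,
            MaxWorkerThreads = maxWorker
        };
    }

    public override string ToString() =>
        $"process threads={ProcessThreads}, busy workers={BusyWorkerThreads}/{MaxWorkerThreads}, busy IO={BusyIoThreads}";
}
```

If you log `ThreadSnapshot.Take()` every minute or so, you can check whether the climb coincides with thread-pool worker threads being tied up.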
Hope this helps.

Random 503 errors in Azure Mobile Services

At certain times during the week while I'm testing my Mobile Services app I get a 503 error (Service Unavailable). It happens whether I try to call the app from localhost or live on my Azure Website. It hangs around for 10-15 minutes and then goes away on its own. It doesn't seem to be caused by anything in particular that I am doing (i.e. I have not updated any code). The 503 error occurs when I'm trying to call one of my custom APIs in my Mobile Services account. A few of the requests make it through (strangely enough) but the majority return a 503 error.
I've seen that someone had a very similar problem here (Why does Azure give me an intermittent Error 503. The service is unavailable?) without an acceptable resolution.
I am using the free version of Mobile Services, but I should be nowhere near the limits of what the free version can handle; I am the sole user of the app right now.
It will soon be time to make the service live and I'm shuddering at the thought of support calls that will come in during one of these funky states the service gets into. Any help in debugging the problem would be greatly appreciated.
EDIT:
I've narrowed this down to a database problem. I have one main query (sproc) that I use to feed data to the UI. I noticed that when I get the 503 errors the query takes about 13 seconds (when run in SSMS). When things are running "normally", the query takes less than a second.
This doesn't solve my problem though, in fact it makes it more perplexing because I am using the Business Edition of Windows Azure SQL Database and there shouldn't be a 13 second fluctuation in execution time!
This problem seems to happen randomly. Is there some kind of caching in SQL Server that could explain this? Maybe my query really does take 13 seconds to execute and the caching superficially speeds it up.
Could you try moving your database/server to one of the new "editions"? They have resource governance to provide predictable performance, whereas Web/Business suffer from a noisy-neighbor problem. Since your issue is intermittent, that sounds like it may be the cause.
Here's a link to a page describing the editions. https://msdn.microsoft.com/en-us/library/azure/dn741340.aspx

Azure ServiceBus Queue Timing Out

We're encountering a strange issue with one of our queues (in production, no less). When I try to put a message onto the queue, it throws an exception that simply states:
A timeout has occurred during the operation
The messages do seem to be making it onto the queue, as evidenced by the fact that I can see the queue length increasing in the management portal. However, the client application is not receiving any messages.
The management portal shows that there have been several failed requests, and also several internal server exceptions; though unfortunately I don't see any way to get more details about those failed requests and errors.
I'm somewhat at a loss as to what may have caused this, how to get more information about what's wrong, and how to move ahead in troubleshooting this. Any help would be greatly appreciated.
Edit: for completeness' sake, I should mention that I did not make any changes to the clients that I'm aware of; this issue just started happening all of a sudden.
Edit #2: I woke up this morning and things have magically returned to normal. I'm still not sure what happened, so I'd like to change the tone of the question to solicit suggestions on how this kind of thing can be mitigated and/or troubleshooted (troubleshot? troubleshat? :)) better.
I have experienced this scenario too. When I created a new Service Bus namespace and pointed my app to it, it worked. This suggests there might be a hardware failure on the node where your Service Bus namespace resides.
Be sure to use transient failure handling, for example http://www.nuget.org/packages/EnterpriseLibrary.WindowsAzure.TransientFaultHandling/
But you may also need a "second-level retry" for errors that are not transient. This you have to code yourself.
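A "second-level retry" can be as simple as an outer loop with a much longer delay wrapped around the usual quick transient retries. Here is a minimal sketch, assuming a synchronous send operation. `TwoLevelRetry` and all of its parameters are hypothetical and not part of the Service Bus or Transient Fault Handling APIs.

```csharp
using System;
using System.Threading;

// Hypothetical two-level retry: quick first-level retries for transient errors,
// wrapped in an outer loop that waits much longer before trying the whole thing again.
public static class TwoLevelRetry
{
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int firstLevelAttempts, TimeSpan firstLevelDelay,
        int secondLevelRounds, TimeSpan secondLevelDelay)
    {
        for (int round = 1; ; round++)
        {
            try
            {
                return FirstLevel(operation, isTransient, firstLevelAttempts, firstLevelDelay);
            }
            // Quick retries are exhausted (or the error was not transient):
            // wait out a longer period, then start a new round. The last failure propagates.
            catch (Exception) when (round < secondLevelRounds)
            {
                Thread.Sleep(secondLevelDelay);
            }
        }
    }

    private static T FirstLevel<T>(Func<T> operation, Func<Exception, bool> isTransient, int attempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < attempts && isTransient(ex))
            {
                Thread.Sleep(delay);
            }
        }
    }
}
```

For example, 3 quick attempts per round and 2 rounds means at most 6 calls before the exception finally reaches the caller. The second-level delay is the knob to tune for outages that last longer than a few seconds.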
To be more fault tolerant, you can also use the new paired namespaces feature. Here is a good resource: http://msdn.microsoft.com/en-us/library/dn292562.aspx
Hth
//Peter

How does azure websites with EF migrations ensure integrity when updating

The scenario is simple: EF Code First migrations, multiple Azure website instances, a decent-size DB of around 100 GB (assuming Azure SQL), and lots of active concurrent users, say 20k for the heck of it.
Goal: push out update, with active users, keep integrity while upgrading.
I've sifted through all the docs I can find. However, the core details seem to be missing, or I'm blatantly overlooking them. When Azure receives an update request via FTP/git/TFS, how does it process the update? What does it do with active users? For example, does it freeze incoming requests to all instances, let requests already in progress finish, upgrade/replace each instance, let EF migrations run, and then let traffic flow again? If it upgrades/refreshes all instances simultaneously, how does it ensure EF migrations run only once? If it refreshes instances live in a rolling upgrade (one at a time, with no freeze on inbound traffic), how could it ensure integrity, since instances still in the older state could potentially break?
The main question: what is the actual process after it receives the update request? What are the recommendations for updating a live website?
To put it simply, it doesn't.
EF Migrations and Azure deployment are two very different beasts. Azure deployment gives you a number of options, including update and staging slots. You've probably seen
"Deploy a web app in Azure App Service"; for other readers, this is a good starting point.
In general, the Azure deployment model is concerned with the active connections to the IIS/web site stack. An update ensures uninterrupted user access by taking the instance being deployed out of the load balancer pool and redirecting traffic to the other instances. It then cycles through the instances, updating them one by one.
This means that at any point in time, during an update deployment there will be multiple versions of your code running at the same time.
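To make that concrete, here is a small simulation of the one-at-a-time update described above. It only models which code versions are live after each step. `RollingUpdate` is a hypothetical name, not an Azure API. With three instances, both versions are serving traffic after each of the first two steps, and only the last step leaves a single version running.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a rolling update: instances are upgraded one at a time,
// and after each step we record the distinct code versions still serving traffic.
public static class RollingUpdate
{
    public static List<string[]> Simulate(int instanceCount, string oldVersion, string newVersion)
    {
        var instances = Enumerable.Repeat(oldVersion, instanceCount).ToArray();
        var liveVersionsPerStep = new List<string[]>();
        for (int i = 0; i < instanceCount; i++)
        {
            // Instance i leaves the load balancer pool, is upgraded and rejoins.
            instances[i] = newVersion;
            liveVersionsPerStep.Add(instances.Distinct().OrderBy(v => v).ToArray());
        }
        return liveVersionsPerStep;
    }
}
```

Every step except the last has old and new code talking to the same database, which is exactly where a schema migration bites.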
If your EF model has not changed between code versions, then Azure deployment works like a charm; users won't even know it is happening. But if you need to apply a migration as part of the deployment, BEWARE!
In general, EF will only load the model if the code and DB versions match. It is very hard to use EF Migrations and support multiple code versions of the model at the same time.
EF Migrations are largely controlled by the Database Initializer.
See Upgrade the database using migrations for details.
As a developer you get to choose how and when the database will be upgraded, but be aware that if you are using Migrations together with update deployments:
1. New code will not easily run against the old data schema.
2. If the old code/app restarts, many default initialization strategies will attempt to roll the schema back; if this happens, refer to point 1. ;)
3. If you get around the EF model loading against the wrong version of the schema, you will experience exceptions and general failures when the code tries to use schema elements that are not there.
The simplest way to manage an EF migration on a live site is to take all instances of the site down for deployments that include an EF migration.
- You can use a maintenance page or a redirect, that's up to you.
If you are going to this trouble, it is probably best to manually apply the DB update, then if it fails you can easily abort the deployment, because it hasn't started yet!
Otherwise, deploy the update and the first instance to spin up will run the migration, if the initializer has been configured to do so...
If you absolutely must have continuous deployment of both site code/content and model updates, then EF Migrations might not be the best tool to start with, as you will find it very restrictive out of the box (OOTB) for this scenario.
I was watching a "Fundamentals" course on Pluralsight and this was touched upon.
If you have 3 instances, Azure will take one offline, upgrade it, and restart it when it is ready. At that point, the other 2 instances are taken offline and your upgraded instance starts, thus running your schema changes.
When those 2 come back, the EF migrations will already have run, so your site is fully back up.
In theory, then, it all sounds like it should work, although depending on how many EF migrations need to run, requests may be delayed.
However, the comment from the author was that in this scenario (i.e. making schema changes) you should consider if your website can run in this situation. The suggestion being that you either need to make your code work with both old and new schemas, or show a "maintenance system down page".
The summary seems to be that depending on what you are actually upgrading, this will impact and affect your choices and method of deployment.
Generally speaking, if you want to support active upgrades, you need to support multiple versions of your application simultaneously. This is really the only way to reliably stay active while you migrate/upgrade. Also consider feature switches to scale up your conversion in a controlled manner.
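Here is a minimal sketch of the "support both versions" idea with a feature switch. It assumes a hypothetical `DisplayName` column added by a migration next to an existing `UserName` column. The code reads the new column only when the switch is on and otherwise falls back to the old column, so the same code runs against both the old and the new schema. `FeatureSwitches` and `UserNameReader` are illustrative names, not part of EF, and a row is modelled as a plain dictionary rather than an EF entity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory feature switches; in practice these would come from config.
public sealed class FeatureSwitches
{
    private readonly HashSet<string> _enabled = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    public void Enable(string name) => _enabled.Add(name);
    public bool IsEnabled(string name) => _enabled.Contains(name);
}

public static class UserNameReader
{
    // A row is modelled as column name -> value; the new column may be absent.
    public static string GetName(IReadOnlyDictionary<string, object> row, FeatureSwitches switches)
    {
        if (switches.IsEnabled("UseDisplayName")
            && row.TryGetValue("DisplayName", out object display)
            && display is string s && s.Length > 0)
        {
            return s;
        }
        // Fall back to the column that exists in both the old and the new schema.
        return (string)row["UserName"];
    }
}
```

One way to use it is to deploy this code with the switch off, apply the migration, and turn the switch on only once every instance is running the new code.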
