Random 503 errors from Azure

I'm not sure whether I should post this here or on Server Fault, but since this morning we have been getting random 503 errors from Azure (Web Apps).
They come from random places across the world, and I get them myself from time to time.
In our "Support Observe" view I do see a lot of errors:
I do not see that many errors in our event logs, though. I do, however, see some entries that could be related, such as:
6136
w3wp
Role environment . FAILED TO INITIALIZE. hr: -2147024891
and some from W3SVC-WP that are really cryptic, like:
*1
5
50000780*
I've found some other posts about these kinds of errors here, and they seem to point towards an issue with Azure in some cases and not in others.
I'm on the East US datacenter. Is anyone else having issues, or can anyone help me figure out what this is? Doesn't the fact that it is occurring randomly across the world point towards an Azure issue?
I should add that I do not do any load balancing, so it cannot be that one of the instances is down or something like that. I have also restarted the app and redeployed the code.
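One thing that can be read directly from the event log entry above is the `hr` value. As a minimal sketch, the signed number -2147024891 can be converted to the usual hexadecimal HRESULT form and split into its standard fields. The function name `decode_hresult` is made up for this illustration; the bit layout is the standard HRESULT layout.

```python
def decode_hresult(value: int) -> dict:
    """Split a signed 32-bit HRESULT into its standard fields."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # severity bit: 1 means failure
        "facility": (unsigned >> 16) & 0x7FF,  # 7 is FACILITY_WIN32
        "code": unsigned & 0xFFFF,             # for FACILITY_WIN32, a Win32 error code
    }


info = decode_hresult(-2147024891)
print(info["hex"], info["facility"], info["code"])
```

For this value the result is 0x80070005, that is facility 7 (Win32) with error code 5, which is the Win32 "access denied" error. That only says the role environment failed to initialize because of an access problem; it does not by itself show whether the cause is in the app or in the platform.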

Related

How to debug Azure swapping process (sometimes bringing site down)

We have a pretty large project that is running on Azure. For some reason swap times recently became really slow: at least 10 minutes.
Sometimes during the swap the site becomes extremely slow, to the point that it doesn't respond for minutes.
Other times the swap just doesn't work, for one reason or another.
We are using initializationPage to warm up the most important pages, but it doesn't seem to help.
Question
Is it possible to see what's going on during the swap? I'm trying to debug why it's so slow. Is there any log that shows what it is stuck on, and why?
We can't deploy emergency fixes without risking the whole site, and sometimes the whole site does go down.
Any help debugging swapping problems would be greatly appreciated.
Update
I found the following in 'Activity log' on the Azure Portal, but I still can't find any details or any hint what is going on exactly.
So: The resource operation completed with terminal provisioning state 'Failed'.
Where can I find details? It really annoys me that I have to buy Azure Developer support when I'm already spending hundreds of euros per month on something that seems broken, or at least very uninformative about what is going wrong.
So: The resource operation completed with terminal provisioning state 'Failed'.
Where can I find details?
Microsoft has a few things that may help you.
You can view the operations for a deployment through the Azure portal.
You may be most interested in viewing the operations when you have
received an error during deployment so this article focuses on viewing
operations that have failed. The portal provides an interface that
enables you to easily find the errors and determine potential fixes.
The article "View deployment operations with Azure Resource Manager" comes directly from Microsoft and has several steps to follow.
I hope this helps.

503 error on azure cloud service

We use Azure and have had problems with our Cloud Service for the last two days.
We get 503 errors on the site. It looks like one of the web roles reboots sometimes, but in the dashboard all of them appear to be working fine.
Application Insights and the logs don't show any problems. CPU, memory, exception rate: all OK.
But I found one interesting thing. The average SQL query time grew to 5 seconds. I checked the queries on the database itself and they ran normally, which means the 5 seconds is not execution time but connection time.
That seems far too long for a round trip inside a data center.
Does anyone have any ideas on how I can find the cause of this problem?
When your app's worker process fails repeatedly in a short time (for example, because of unhandled exceptions), IIS stops the application pool and you get a 503 error.
For more details, search for "IIS Rapid-Fail Protection".
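The mechanism can be sketched as a failure counter over a time window. The following is a simplified model, not IIS's actual implementation: the class name `RapidFailTracker` is hypothetical, the values 5 failures and 5 minutes are the IIS defaults as far as I know, and the exact boundary behaviour (stopping on the fifth failure, sliding rather than fixed window) is an assumption.

```python
from collections import deque


class RapidFailTracker:
    """Simplified model of an application pool's rapid-fail counter."""

    def __init__(self, max_failures: int = 5, interval_seconds: float = 300):
        self.max_failures = max_failures
        self.interval = interval_seconds
        self.failures = deque()  # timestamps of recent worker-process failures
        self.stopped = False     # once True, requests would get 503

    def record_failure(self, timestamp: float) -> bool:
        """Record one failure; return True once the pool counts as stopped."""
        self.failures.append(timestamp)
        # Drop failures that have fallen out of the window.
        while self.failures and timestamp - self.failures[0] > self.interval:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.stopped = True
        return self.stopped


pool = RapidFailTracker()
for t in (0, 10, 20, 30, 40):  # five crashes within 40 seconds
    pool.record_failure(t)
print(pool.stopped)
```

In this model, five crashes spread 100 seconds apart would not stop the pool, because older failures leave the window before the limit is reached. That is why a burst of crashes produces 503s while the same number of crashes spread over a day does not.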

Random 503 errors in Azure Mobile Services

At certain times during the week while I'm testing my Mobile Services app I get a 503 error (Service Unavailable). It happens whether I try to call the app from localhost or live on my Azure Website. It hangs around for 10-15 minutes and then goes away on its own. It doesn't seem to be caused by anything in particular that I am doing (i.e. I have not updated any code). The 503 error occurs when I'm trying to call one of my custom APIs in my Mobile Services account. A few of the requests make it through (strangely enough) but the majority return a 503 error.
I've seen that someone had a very similar problem here (Why does Azure give me an intermittent Error 503. The service is unavailable?) without an acceptable resolution.
I am using the free version of Mobile Services, but I should be nowhere near the limits of what the free version can handle; I am the sole user of the app right now.
It will soon be time to make the service live and I'm shuddering at the thought of support calls that will come in during one of these funky states the service gets into. Any help in debugging the problem would be greatly appreciated.
EDIT:
I've narrowed this down to a database problem. I have one main query (sproc) that I use to feed data to the UI. I noticed that when I get the 503 errors the query takes about 13 seconds (when run in SSMS). When things are running "normally", the query takes less than a second.
This doesn't solve my problem though, in fact it makes it more perplexing because I am using the Business Edition of Windows Azure SQL Database and there shouldn't be a 13 second fluctuation in execution time!
This problem seems to happen randomly. Is there some kind of caching in SQL Server that could explain this? Maybe my query really does take 13 seconds to execute and the caching superficially speeds it up.
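One way to separate the caching explanation from a sustained slowdown is to time the same query several times in a row. A minimal sketch follows; `run_query` is a placeholder for whatever executes the stored procedure, and the threshold `factor=5.0` is an arbitrary assumption. If only the first run is slow, a cold cache or plan compilation is a plausible cause; if every run is slow, the slowdown is coming from somewhere else.

```python
import statistics
import time


def time_runs(run_query, runs: int = 5) -> list:
    """Call run_query repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def looks_like_cold_cache(durations, factor: float = 5.0) -> bool:
    """True if the first run is much slower than the median of the later runs."""
    first, rest = durations[0], durations[1:]
    return first > factor * statistics.median(rest)


# Hypothetical timings: slow first run, fast afterwards.
print(looks_like_cold_cache([13.0, 0.8, 0.9, 0.7]))
# Hypothetical timings: every run is slow.
print(looks_like_cold_cache([13.0, 12.5, 13.2]))
```

Running this once during a 503 episode and once during normal operation would show whether the 13 seconds is a first-call effect or lasts for the whole episode.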
Could you try transitioning your database/server to one of the "editions"? They have resource governance to promote predictable performance. Web/Business suffer from a noisy neighbor problem. It sounds like that may be your issue, considering it is intermittent.
Here's a link to a page describing the editions. https://msdn.microsoft.com/en-us/library/azure/dn741340.aspx

Getting occasional 503 errors on Azure website

I'm getting occasional 503 errors on our site. It usually happens after not visiting the site for a while. The whole page might return 503 or just some resources like css or js files.
It seems to go away after you've surfed the site for a bit and hit all of our servers.
Elmah doesn't show any errors.
I've gone into the logs on each of our servers (three medium web roles on azure) and I can't find any problems.
Our deployment has been up since December without a code change, and we've been having this problem for about a week.
One thing to note is that when this happens the site doesn't shut down. I would think that would happen if IIS was crashing and restarting (even with three servers).
Does anyone know how to diagnose or fix this problem?
While this could be code related, I'll assume you've already explored this route as much as possible via logs (and since you haven't deployed new code). Having said that:
Do your issues align with the Compute service degradation events shown in the Azure Dashboard over the past several days? Look at Historical View and you'll see a few issues around Compute. Depending on your data center, maybe this is related?
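Checking that alignment can be done by hand, but if you have timestamps for the 503s it is also easy to script. A minimal sketch, in which the function name and all timestamps are made up for illustration (the incident windows would have to be copied manually from the dashboard's history):

```python
from datetime import datetime


def errors_during_incidents(error_times, incidents):
    """Return the error timestamps that fall inside any (start, end) window."""
    return [
        t for t in error_times
        if any(start <= t <= end for start, end in incidents)
    ]


# Hypothetical data: two 503 timestamps and one reported incident window.
errors = [datetime(2014, 3, 1, 10, 5), datetime(2014, 3, 2, 18, 30)]
incidents = [(datetime(2014, 3, 1, 9, 0), datetime(2014, 3, 1, 12, 0))]

matched = errors_during_incidents(errors, incidents)
print(f"{len(matched)} of {len(errors)} errors fall inside an incident window")
```

If most errors fall outside the reported windows, the platform incidents are less likely to be the explanation.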

Deploying Windows Azure, DNS name not working

I have two free subscriptions for Windows Azure, and because I exceeded the limit on the first one, Microsoft closed it down. So I tried to deploy my application from the other subscription and changed a few settings. The deployment seems to take a lot longer, and the DNS name of the deployed application (in the production area) does not seem to work. (I've been waiting for about 15 minutes; in the other subscription the link started to work almost immediately.) Also, my web role seems to be stuck in the Busy state for a very long time.
The application always worked fine, and now I'm getting all this trouble just by switching subscriptions? I'm getting really frustrated with this, especially because it all worked perfectly before. Now I have to 'waste' my time getting everything to work again and I can't start on anything new. I don't think this is normal, but I can't seem to find the solution either.
edit:
After more than half an hour the DNS name finally started working, but this still does not fix the problem with the extremely slow deployment and the Busy state of the web role.
Please study the discussion below to understand why the time to deploy an application can vary between 10 and 30 minutes:
Is there a way to reduce time between Azure deployment start and role OnStart() code being invoked?
The details there should help answer your statement ".. this still does not fix the problem with the extreme slow deploying and the busy state of the webrole..".
To add to that: while your application is in the deployment phase it goes through several states, and in some cases the time spent in one state can be longer than expected. During this time you will see a status such as "Busy", "Initializing", or "Starting", and these states indicate which stage of the deployment you are in. I hope this helps you understand the time taken during deployment.
