Why do my Azure sites keep going down?

I am testing Microsoft's Azure cloud platform and am hosting two very low-requirement websites. One is a simple single form MVC website that simply accepts some input from the user, performs some calculations, and spits out an output. The second is similar, but it performs a simple query against an Azure SQL Server instance.
Both websites go down constantly. What appears to be happening is that if I don't hit the website for a while (maybe an hour or so), the GET request simply fails. Nothing is ever returned from the server. If I wait a minute or two and try again, the website works perfectly.
Anyone know what's happening or how I can fix it? I obviously cannot host websites on this platform if the reliability is this low...

Windows Azure Websites uses a concept of cold (inactive) and warm (active) sites: as long as a website has active connections it stays warm (active), and after some default idle period with no active connections it goes into cold (inactive) mode. Once a new connection is made to the site, it wakes from cold back to warm, and depending on what content the site has to render, the startup process can take a few seconds to complete. The concept of warm and cold sites is described here.
Technically, the first GET request to a cold website may fail or time out before startup completes; however, that request does make the site active, and subsequent requests will succeed.
There was an SO discussion in which the index page had to connect to the database to get its data, and because the database connection took a long time, the overall startup time was longer than expected. So there can be several reasons why the cold-to-warm transition takes longer than it should, and you can contact the Windows Azure Websites team to check why that is the case.
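Until you can move to a tier with a built-in keep-alive, the usual workaround is a scheduled job that issues a real HTTP GET against the site so the idle timer never expires. A minimal sketch in C#; the URL and the 10-minute interval are placeholder assumptions, not values Azure prescribes:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal keep-alive pinger. Run it from any scheduler (an Azure WebJob,
// Task Scheduler, cron, etc.). The URL below is a placeholder.
class KeepAlive
{
    static async Task Main()
    {
        using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
        while (true)
        {
            try
            {
                // A full HTTP GET (not an ICMP ping) is what resets the idle timer.
                var response = await client.GetAsync("https://yoursite.azurewebsites.net/");
                Console.WriteLine($"{DateTime.UtcNow:u} -> {(int)response.StatusCode}");
            }
            catch (Exception ex)
            {
                // A failed request usually still wakes the site;
                // the next request should then succeed.
                Console.WriteLine($"{DateTime.UtcNow:u} -> {ex.Message}");
            }
            await Task.Delay(TimeSpan.FromMinutes(10));
        }
    }
}
```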

Related

How to fine-tune NodeJS server deployed to Azure WebApp for massive load

I deployed a Node.js server to an Azure WebApp, and it worked fine. But I see that sometimes the response time is very slow. Also, I see that somewhere above 500 requests/second the server starts to fail handling requests, while using only 15% CPU. I checked, and the server returns a 500 error because the pipe is busy (according to the Win32 error code). That's why I was wondering if there is something I can change in the IISNode config to improve the server's request capacity.
I already enabled the Always On feature, and I also added a check in Pingdom to keep the site alive. I also changed nodeProcessCountPerApplication to 0 so that it uses all the available processes (one node process per CPU core).
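For reference, the iisnode settings mentioned above live in web.config; a minimal sketch of the relevant fragment (the server.js handler path is a placeholder for the app's actual entry point):

```xml
<configuration>
  <system.webServer>
    <handlers>
      <!-- Route requests to the node entry point; server.js is a placeholder name. -->
      <add name="iisnode" path="server.js" verb="*" modules="iisnode" />
    </handlers>
    <!-- 0 = spawn one node.exe per available CPU core. -->
    <iisnode nodeProcessCountPerApplication="0" />
  </system.webServer>
</configuration>
```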
Thank you,
Omer
One thing you can do is enable Always On. Without it, when your site hasn't been visited for 20 minutes it gets unloaded. The next time someone makes a request to your site, Azure Web Apps has to warm it up (set it up again), and this process takes a few seconds.
Note that Always On is only available for sites in the Basic, Standard, or Premium SKUs.
Also, check out this page for tips on debugging Node.js apps in Azure Web Apps: https://azure.microsoft.com/en-us/documentation/articles/web-sites-nodejs-debug/

Azure WebSite Always On

I have a WebAPI application running on Azure WebSites. It is running in Basic mode and I have the option to make it "Always On". There seems to be conflicting information online about what this means exactly. I know the effect, but the "how" matters a lot here. In particular, does something automatically hit an endpoint in my application periodically? If so, can I control the endpoint it hits?
As I mentioned, it is a Web API application, and the default route does non-trivial work: it produces a notable amount of outbound traffic, and it also places items onto a work queue that will eventually be processed. I want the application always on (no cold start times), but I don't want some service making requests of my application.
As soon as your Azure Website is marked as Always On, your site root will be hit within a few seconds. We also make sure your site is up and running on all the workers (if you have configured the auto-scale option or the like). After that, if the worker process crashes, Always On makes sure that it comes back up.
You cannot control the endpoint that it hits.
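Since the probe always hits the root and cannot be redirected, one workaround (a sketch under assumptions, not something Always On itself prescribes) is to keep the root route a cheap no-op and move the queue-producing work to a different route, so the probe never triggers real work. This assumes Web API 2 attribute routing (config.MapHttpAttributeRoutes()); controller and route names are hypothetical:

```csharp
using System.Web.Http;

// Hypothetical Web API controller: mapped to the site root so the
// Always On probe gets a cheap 200 instead of triggering real work.
public class HealthController : ApiController
{
    [HttpGet]
    [Route("")] // responds at the application root
    public IHttpActionResult Get()
    {
        // No database calls, no queue writes: just confirm the process is alive.
        return Ok("alive");
    }
}

// The heavy endpoint moves off the root, e.g. /api/process,
// so only real clients invoke it.
public class ProcessController : ApiController
{
    [HttpPost]
    [Route("api/process")]
    public IHttpActionResult Post()
    {
        // ... enqueue work, produce outbound traffic, etc.
        return Ok();
    }
}
```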

Intermittent Microsoft Azure Web Site access failure

I have a number of small MVC apps deployed as Microsoft Windows Azure websites. This has been working for several months.
Yesterday I rolled out a new one, and the deployment was unremarkable, everything worked fine. But a couple of hours later, access to the site was unavailable. The symptoms were that when the browser tried to navigate to the URL for that site, it would try to load for several minutes and then just give up with a completely blank page.
I attempted to stop and restart the site, and it worked once, but the symptoms came back several minutes later. Then I tried to stop and restart, and it didn't work.
I deployed the identical app to three additional URLs. Again, immediately on deployment they all work fine; however, they fail at some point in the future, and they do not all fail at once. Sometimes restarting the site fixes the problem, and sometimes it does not.
IMPORTANT: If I wait for some period of time, the site may start to work again on its own.
However, deploying four versions of the app so that our users can go to a backup one if the primary one is not working is not optimal.
Any words of wisdom as to how I might go about debugging this?
ADDITIONAL INFO NOV 25, 2013:
When sites are failing, the IIS logs show either 500 or 502 errors. Our own MVC code is never hit, not even Application_Start.
You can start by checking the logs and remote debugging
http://www.drdobbs.com/windows/azure-sdk-22-supports-visual-studio-2013/240163499
Are the apps working locally?
Might not be the same problem, but from time to time our Azure instances will get the blue question mark of death as a status.
The reason, we found out, is that Microsoft does upgrades on instances from time to time. If you have just one instance in a cloud service/role, then during those maintenance windows it will be down.
I have confirmed this with their support.
The only way around this that I know of is to run two instances. Then Microsoft guarantees ~99.95% availability.
Of course I also confirmed with them that this means twice the cost. =/
If that's not the issue I would enable RDP and get onto the machine to see what the problem is. Microsoft has these tools to help debug problems: http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx
First, you should always run multiple instances of your web role with more than 1 upgrade domain. This is configurable in the service definition (CSDEF). Without this, you don't get an SLA from Microsoft, so you can't really complain that the VMs go down.
Second, to figure out what might be going on with these boxes, you should have both logs (my preference is to roll my own with page blobs or table storage), AND you should always have RDP access to a pre-production environment (production as well if you're not too fussed about security). Once on the box, look through the event viewer for errors.
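If you do roll your own logging with table storage as suggested above, a minimal sketch using the classic Microsoft.WindowsAzure.Storage SDK (the table name and connection-string variable are placeholders):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// One row per log entry; partitioned by day so queries by date stay cheap.
public class LogEntry : TableEntity
{
    public LogEntry() { } // required for deserialization
    public LogEntry(string message)
    {
        PartitionKey = DateTime.UtcNow.ToString("yyyy-MM-dd");
        // Inverted ticks so the newest entries sort first within a partition.
        RowKey = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("d19");
        Message = message;
    }
    public string Message { get; set; }
}

public static class Log
{
    public static void Write(string message)
    {
        // Connection-string environment variable name is a placeholder.
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"));
        var table = account.CreateCloudTableClient().GetTableReference("applogs");
        table.CreateIfNotExists();
        table.Execute(TableOperation.Insert(new LogEntry(message)));
    }
}
```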
Third, when an outage occurs check out the azure service dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) for outages.
Lastly, contact Microsoft support. It may take a few hours, but they are pretty good.
Given that it is happening repeatedly and for extended periods of time (more than 5 minutes), I would bet there's something wrong with your hosted service. Again, RDP in and poke around. Good luck.
To debug your sites, try enabling diagnostic logs:
http://www.windowsazure.com/en-us/develop/net/common-tasks/diagnostics-logging-and-instrumentation/
Another nice way to look around your site is using the debug console:
https://github.com/projectkudu/kudu/wiki/Kudu-console

Web Site Availability in Windows Azure [duplicate]

Possible Duplicate:
App pool timeout for azure web sites
I am working on an asp.net mvc 4 app that is hosted in Windows Azure. This app will not have a lot of traffic as people will intermittently (once an hour) use it. I wanted to try using Windows Azure.
My app is currently set to use the FREE web site mode. I noticed that after 30 minutes, the site takes a long time (>5 seconds) to load. After that initial load, it's fast. Then, if no one uses it for another 30 minutes, it takes >5 seconds to load again.
I then tried upping the web site mode to a SHARED instance. I experienced the same problem there.
I then tried upping the web site mode to a RESERVED instance. The problem then goes away.
While I'd like to use Windows Azure, paying $50+ a month for a RESERVED instance is pretty expensive for a site that few have used up to this point. However, I can't have the initial lag; that will just deter the few users I have. You could say you get what you pay for. At the same time, I have a hard time believing others are experiencing this problem and not complaining. There has to be something I'm missing.
I figure the problem has to do with the application pool recycling. However, I can't seem to find a way around this. Is anyone familiar with this issue? Is there a way to fix it on a FREE or SHARED instance?
Thank you!
This is expected behavior based on how Windows Azure Web Sites work. The app pool they live in is spun up "on demand" and then hangs around for a time period.
For a detailed look (and a shameless plug), you can check out my article on this: http://www.simple-talk.com/dotnet/.net-framework/windows-azure-websites-%e2%80%93-a-new-hosting-model-for-windows-azure/
In summary:
Web Sites are hosted in a process on a farm of machines running IIS. If a site is idle for some time, the process is torn down automatically. Also, if the machine is under a lot of pressure from the other sites on it, the idle timeout may come down quite a bit (even as low as five minutes). When the next call comes in, you'll see the process spun up again (likely on a completely different server). This is because you are in a shared environment (and is similar to how Heroku works). Once you move to Reserved, you are the ONLY person on that virtual machine, and if you suffer from noisy-neighbor issues in processing, it's because of your own stuff.
There are ways to keep your site "up", such as having a job that pings the url frequently; however, given that the idle timeout is somewhat fluid it may not solve every case. You can check out a recent post by Sandrino on how to use Azure Mobile Services as a job scheduler: http://fabriccontroller.net/blog/posts/job-scheduling-in-windows-azure/ . There are also 3rd party services available that can do the ping for you automatically.
To be honest, the web sites are a great feature for quick development and test, or even relatively low traffic sites as you are talking about. If you need a high level of uptime and better performance then you'll want to look at Reserved, or another option if the cost isn't in line with expectations.
This isn't an Azure problem. It is a "feature" of any web site hosted in IIS. The default timeout for app pools is 20 minutes. Read about app pool timeouts here - http://technet.microsoft.com/en-us/library/cc771956(v=ws.10).aspx - one method is to create a keep-alive page and ping it every 10 minutes or so.
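The keep-alive page itself can be trivial; a sketch of a minimal MVC action (the controller name and route are hypothetical) that the ping job can target:

```csharp
using System.Web.Mvc;

// Hypothetical keep-alive endpoint, e.g. /keepalive/ping.
// Point an external monitor or scheduled job at this URL every ~10 minutes.
public class KeepAliveController : Controller
{
    [HttpGet]
    public ActionResult Ping()
    {
        // Deliberately does no real work; the request itself resets the idle timer.
        return Content("OK", "text/plain");
    }
}
```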

DotNetNuke on Windows Azure Websites performance

I am evaluating the Windows Azure WebSites Preview (WAWS, I think; I'm not sure with all these changing names and acronyms that Microsoft loves to mutate) with DotNetNuke (DNN), which I have also been using for years on a "non-cloud" V-Server. Installation was a breeze. I have only tried the free shared instance, and I have tested with 1 and with 3 active instances with similar results.
First-hit performance was always a problem with my previous DNN installations: when a website was idle for a while (15 minutes or so), the process would stop, and the next unlucky visitor would wait at least around 20 seconds. With some IIS tweaking it was possible to minimize this problem, but I had the best results with a monitoring service that requested a page from DNN every five minutes and kept the process up.
While surfing, the DNN pages usually perform well on WAWS, but I immediately noticed that the "first hit" problem is an issue with DNN on WAWS, so I configured a monitoring service for the page. That did not help, and the monitoring service would always report that the site was down. It is almost as if WAWS was trying to avoid keeping the site up, since it detected that only a monitoring service was requesting the page.
Also, when navigating the DNN pages and then pausing for just a minute or two, I often get an "Internet Explorer could not load this page" error with no specific error code.
Do others have experience with the DNN performance on WAWS or maybe know why the "first hit" is such a problem?
I suspect that Microsoft is actively trying to avoid the keep-alive tricks that many ASP.Net devs use. WAWS, like many shared hosting platforms, relies on only having a certain number of active websites on the server at any one time in order to achieve higher server densities and keep the cost of hosting under control. This is one of the reasons that they can offer this service for free.
I think what you want to look into is "keep alive."
What you are experiencing is the ASP.NET process for your application getting killed due to inactivity. When the process isn't in memory and the site is accessed, IIS has to spin it back up, which is the 10-20 second lag you get upon accessing your site as the process starts again and/or just-in-time compiles.
You can schedule a 3rd-party monitoring service to check your site every 10 minutes via an HTTP request; that will keep your site up. A plain ICMP ping will not keep it up.

Resources