Website on Azure Virtual Machine stops responding every day - azure

I have a website (orders.cpidealers.com) running on an Azure Virtual Machine currently configured to Basic, A2 (2 cores, 3.5 GB memory) monitoring 3 endpoints.
Every morning since Tuesday, June 24:
The website has been unavailable (the browser just spins; I don't even get a 401 or any other error).
I can't RDP into the virtual machine.
The endpoint status shows a warning triangle (although when I click the link next to it, some endpoints say Not Available while others give a time; I'm not sure how to interpret the endpoint status box).
To resolve the problem, I log in to Azure and restart the Virtual Machine. So far, everything then works fine for the remainder of the day, until I arrive at work the next morning at 7:30 (Mountain Time) and find it unavailable again.
Any suggestions on how to troubleshoot this?

Well, it seems to me like your app somehow manages to hang IIS by wasting resources. I can't tell you more without any data. You should enable performance counter monitoring and see what is going on.
http://azure.microsoft.com/en-us/documentation/articles/cloud-services-dotnet-use-performance-counters/
http://www.codeproject.com/Articles/303686/Windows-Azure-Diagnostics-Performance-Counters-In
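Alongside performance counters, it may help to pin down when the site actually stops answering. The following is only a minimal Python sketch under stated assumptions, not something taken from the linked articles: it requests a URL at a fixed interval and appends one timestamped line per attempt to a log file. The function names, the log file name (probe.log), and the 60-second interval are all hypothetical choices for illustration.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=10.0):
    """Request url once; return (status_or_None, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, time.monotonic() - start, ""
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 401 or 500.
        return exc.code, time.monotonic() - start, str(exc.reason)
    except OSError as exc:
        # Covers URLError, timeouts, and refused connections: no HTTP answer.
        return None, time.monotonic() - start, str(exc)


def format_line(url, result):
    """Turn one probe result into a single log line with a UTC timestamp."""
    status, elapsed, error = result
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    state = "NO_RESPONSE" if status is None else str(status)
    return f"{stamp} {url} {state} {elapsed:.2f}s {error}".rstrip()


def watch(url, log_path="probe.log", interval=60.0):
    """Probe forever, appending one line per attempt (hypothetical log path)."""
    while True:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(format_line(url, probe(url)) + "\n")
        time.sleep(interval)
```

Run from a machine outside the VM, for example with watch("http://orders.cpidealers.com/"), the first NO_RESPONSE line would give an approximate time of the hang to compare against the counters and the event log.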

It looks like the system was hanging as Rouen mentioned. From that, we found this article which seems to have resolved the problem: IIS: Web Application hangs periodically needs system reboot
Here is everything my developer did:
I changed a few other things on the server. I set the SQL Server to never auto-close, which should help performance in the morning, and set gupdate to manual (we did that together). Then I found this article, which seems to match our problem exactly, so I set the Credentials Manager to automatic and restarted.
IIS: Web Application hangs periodically needs system reboot

Related

Stuttering SignalR When Using IIS

I am using ASP .NET SignalR to create a realtime game in the browser. It seems to run fine, other than occasional stutters on the display output.
I first noticed this when I deployed my application to a virtual PC in the cloud. To rule out bandwidth issues, I then deployed it to another virtual PC running locally.
Even when running the application from a virtual PC locally, I still get the stutters.
This issue does not happen when I run the application on local host using IIS express.
This makes me think that the issue has something to do with IIS, since both the cloud virtual PC and local virtual have this in common and local host does not.
After Googling I found lots of people advising to check certain settings in IIS.
These websocket settings were listed as possible culprits:
I don’t see any issues there though.
Another suggestion was to check the queue length in the advanced settings of the App Pool. Currently it is 1000. I believe that this is enough, since according to Perfmon the number of requests received peaks at about 100 per second and the number sent at about 80.
Has anyone else got any ideas?
I am using SignalR for ASP .NET 2.4.1 and IIS 10
Thanks
It looks like I blamed IIS wrongly. After further investigation, I found that the issue is most likely due to high CPU usage.
I ran Perfmon, and looked at the '%Processor Time' counters, and noticed that the application is frequently at 100%.
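To put a number on "frequently at 100%", the counter samples can be summarized offline. This is a rough Python sketch that assumes the samples were exported to CSV with one header row, a timestamp in the first column, and the counter value in the second; that layout, the 95% threshold, and the function name are assumptions for illustration, not details from the post.

```python
import csv
import io


def busy_fraction(csv_text, threshold=95.0):
    """Return the share of samples whose counter value is >= threshold."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    values = []
    for row in rows[1:]:  # skip the header row
        if len(row) < 2:
            continue
        try:
            values.append(float(row[1]))
        except ValueError:
            continue  # blank or non-numeric sample
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)


# Made-up sample data, only to show the expected input shape.
sample = '''"time","% Processor Time"
"07:00:00","12.5"
"07:01:00","100.0"
"07:02:00","99.1"
"07:03:00","40.0"
'''
print(busy_fraction(sample))  # prints 0.5
```

A high fraction over a window that matches the stutters would support the CPU explanation; a low one would suggest looking elsewhere.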

Intermittent Microsoft Azure Web Site access failure

I have a number of small MVC apps deployed as Microsoft Windows Azure websites. This has been working for several months.
Yesterday I rolled out a new one, and the deployment was unremarkable, everything worked fine. But a couple of hours later, access to the site was unavailable. The symptoms were that when the browser tried to navigate to the URL for that site, it would try to load for several minutes and then just give up with a completely blank page.
I attempted to stop and restart the site, and it worked once, but the symptoms came back several minutes later. Then I tried to stop and restart, and it didn't work.
I deployed the identical app to three additional URLs. Again, immediately on deployment, they all work fine, however, they fail at some interval in the future. They seem to not all fail at once. Sometimes restarting the site will fix the problem, and sometimes not.
IMPORTANT: If I wait for some period of time, the site may start to work again on its own.
However, deploying four versions of the app so that our users can go to a backup one if the primary one is not working is not optimal.
Any words of wisdom as to how I might go about debugging this?
ADDITIONAL INFO NOV 25, 2013:
When sites are failing, the IIS logs show either 500 or 502 Internal Service Errors. Our own MVC code is never hit, not even app_start.
You can start by checking the logs and remote debugging
http://www.drdobbs.com/windows/azure-sdk-22-supports-visual-studio-2013/240163499
Are the apps working locally?
Might not be the same problem, but from time to time our Azure instances will get the blue question mark of death as a status.
The reason we found out was that Microsoft will do upgrades on instances from time to time. If you have just one instance in a cloud service/role, then from time to time they will do maintenance and during that time it will be dead.
I have confirmed this with their support.
The only way to get around this that I know of is to create two instances. Then Microsoft guarantees ~99% availability.
Of course I also confirmed with them that this means twice the cost. =/
If that's not the issue I would enable RDP and get onto the machine to see what the problem is. Microsoft has these tools to help debug problems: http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx
First, you should always run multiple instances of your web role with more than 1 upgrade domain. This is configurable in the service definition (CSDEF). Without this, you don't get an SLA from Microsoft, so you can't really complain that the VMs go down.
Second, to figure out what might be going on with these boxes, you should have both logs (my preference is to roll my own with page blobs or table storage), AND you should always have RDP access to a pre-production environment (production as well if you're not too fussed about security). Once on the box, look through the event viewer for errors.
Third, when an outage occurs check out the azure service dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) for outages.
Lastly, contact Microsoft support. It may take a few hours, but they are pretty good.
Given that it is happening repeatedly and for extended periods of time (more than 5 minutes), I would bet there's something wrong with your hosted service. Again, RDP in and poke around. Good luck.
To debug your sites try to enable diagnostic logs:
http://www.windowsazure.com/en-us/develop/net/common-tasks/diagnostics-logging-and-instrumentation/
Another nice way to look around your site is using the debug console:
https://github.com/projectkudu/kudu/wiki/Kudu-console

Windows Azure: Unexpected & unclean virtual-machine shutdown

Using a large instance of a virtual machine on Windows Azure. The instance runs Microsoft SQL 2012 with light usage, on Windows Server 2012 + all up to date. No user is logged in at time of failures.
However, several times a day (anywhere from zero to three, apparently at random), the VM halts and shuts down. It does not come back online until someone logs back into the Management Portal and starts the VM again. No memory dump is created, so I am guessing the host halts the running VM, rather than something within the guest OS causing the halt. The subscription has billable funds. Other VMs in the subscription are also affected.
Only event logs generated:
Kernel-Power logged:
The system has rebooted without cleanly shutting down first. This
error could be caused if the system stopped responding, crashed, or
lost power unexpectedly.
Kernel-Boot logged:
The last shutdown's success status was false. The last boot's success
status was true.
How can this be resolved? There is no way to initiate a support request within Azure.
The first thing I would do is install some monitoring software like New Relic or Foglight and check whether you are running out of memory or a process is pushing the CPU into a spin.
This will give you some visibility of the activity on the box over time and give you some evidence should you need it to open a support request.
Azure now has paid support only
http://www.windowsazure.com/en-us/support/plans/
We use the Developer plan for exactly this type of situation, where you are a bit lost trying to figure out what is going on. At $30, the cost is worth it compared to the monthly cost of running a SQL Server 2012 VM. Microsoft support is generally very good; they will have more diagnostic information and will be able to tell you whether this is caused by an Azure failure or something else.
Getting diagnostics going would be the first port of call, though; then you can see what is going on, gather some evidence, and track down the problem.

Azure server not responding but dashboard reporting everything is running?

Very suddenly, without any changes or recent access, my Azure virtual server is no longer available for RDP or web. I have logged into the Azure control panel and everything appears to be running without issue, but it is not working.
I have checked the end points and they are present for both RDP and Web, totally weird.
I have 2 virtual servers and the other one is working fine and responding.
Anyone ever experience this? Just when my client wants to view his website as well...
http://cn-web-02.cloudapp.net is the URL
TIA
As I just answered for this question, Virtual Machines are in Preview and not in Production yet. There are several reasons why your Virtual Machines became unavailable (see other answer). Given that this is the second reported incident here today, it's a good guess it's related to the underlying Host OS being updated, which would take your Virtual Machine offline for a short period of time.
I tried your URL and it's available again. Just remember about this being in Preview, especially since you mention having a client that wants to view his website. If you put a production website in Virtual Machines, then you'll have to absorb the risk of not having an SLA.
Having said that: You can mitigate downtime risk by running two Virtual Machines, listening on a load-balanced input endpoint. Be sure to have both Virtual Machines in the same Availability Set. Doing that ensures that the Windows Azure fabric controller will not take both Virtual Machines offline at the same time when doing things like Host OS updates. If this were in Production, you'd then have a very high availability scenario. Even in Preview, you'll improve availability by taking advantage of Availability Sets. Note: You'll need to use some type of shared session cache, since visitors will now be sent to either one of your Virtual Machines.
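As a back-of-the-envelope illustration of why a second Virtual Machine helps, the sketch below computes the chance that at least one of several instances is up. It assumes the instances fail independently, which is a simplification, and the 0.99 figure is a made-up example rather than any Azure SLA number.

```python
def combined_availability(single, instances):
    """Probability that at least one of `instances` independent VMs is up."""
    return 1.0 - (1.0 - single) ** instances


# Illustrative only: 0.99 is a hypothetical per-VM availability.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

Under these assumptions, two instances at 0.99 each give roughly 0.9999 combined. Failures that hit both machines at once (such as a region-wide outage) break the independence assumption, so real figures would be lower.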
I had the same experience! We had 2 instances and all of them were re-imaged without any notification. I know this because we had made some local changes via RDP.
A reboot or reimage may help. You may want to try that.
Turns out it was an outage from Microsoft...for over 22 hours but everything is back up and running. This is the 2nd time in 6 months this has happened for long stretches...makes me a little nervous to say the least.
Thanks for the input everyone and for anyone that's interested MS have a good site that tracks the service levels on Azure. Windows Azure Service Dashboard

Web Site Availability in Windows Azure [duplicate]

Possible Duplicate:
App pool timeout for azure web sites
I am working on an asp.net mvc 4 app that is hosted in Windows Azure. This app will not have a lot of traffic as people will intermittently (once an hour) use it. I wanted to try using Windows Azure.
My app is currently set to use the FREE web site mode. I noticed that after 30 minutes, the site takes a long time (> 5 seconds) to load. After that initial load, it's fast. Then, if someone doesn't use it for another 30 minutes, it takes > 5 seconds to load again.
I then tried upping the web site mode to a SHARED instance. I experienced the same problem there.
I then tried upping the web site mode to a RESERVED instance. The problem then goes away.
While I'd like to use Windows Azure, paying $50+ a month for a RESERVED instance is pretty expensive for a site that few have used up to this point. However, I can't have the initial lag. That will just deter the few users I have. You could say you get what you pay for. At the same time, I have a hard time believing others are experiencing this problem and not complaining. There has to be something I'm missing.
I figure the problem has to do with the application pool resetting. However, I can't seem to figure out a way around this. Is anyone familiar with this issue? Is there a way to fix it on a FREE or SHARED instance?
Thank you!
This is expected behavior based on how Windows Azure Web Sites work. The app pool they live in is spun up "on demand" and then hangs around for a time period.
For a detailed explanation (and a shameless plug), you can check out my article on this: http://www.simple-talk.com/dotnet/.net-framework/windows-azure-websites-%e2%80%93-a-new-hosting-model-for-windows-azure/
In summary:
Web Sites are hosted in a process on a farm of machines running IIS. If a site is idle for some time, the process is torn down automatically. Also, if the box is under a lot of pressure from the other sites on it, the idle timeout may come down quite a bit (even as low as five minutes). When the next call comes in, you'll see the process spun up again (likely on a completely different server). This is because you are in a shared environment (and is similar to how Heroku works). Once you move to reserved, you are the ONLY person on that virtual machine, and if you suffer from noisy neighbor issues in processing, it's because of your own stuff.
There are ways to keep your site "up", such as having a job that pings the url frequently; however, given that the idle timeout is somewhat fluid it may not solve every case. You can check out a recent post by Sandrino on how to use Azure Mobile Services as a job scheduler: http://fabriccontroller.net/blog/posts/job-scheduling-in-windows-azure/ . There are also 3rd party services available that can do the ping for you automatically.
To be honest, the web sites are a great feature for quick development and test, or even relatively low traffic sites as you are talking about. If you need a high level of uptime and better performance then you'll want to look at Reserved, or another option if the cost isn't in line with expectations.
This isn't an Azure problem. It is a "feature" of any web site hosted in IIS. The default time-out for app pools is 20 minutes. Read about App Pool timeouts here - http://technet.microsoft.com/en-us/library/cc771956(v=ws.10).aspx - one method is to create a keep alive page and ping the page every 10 minutes or so.
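The keep-alive idea mentioned in the answers above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the 600-second default mirrors the "every 10 minutes or so" suggestion, and, since the idle timeout on shared instances is described as fluid, a fixed interval may not prevent every cold start.

```python
import time
import urllib.request


def ping(url, timeout=10.0):
    """Return True if the URL answered successfully, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # HTTP error statuses, timeouts, and connection failures land here.
        return False


def keep_alive(url, interval_seconds=600, rounds=None, sleep=time.sleep):
    """Ping url every interval_seconds; rounds=None means run forever.

    Returns the number of failed pings when rounds is finite.
    """
    done = 0
    failures = 0
    while rounds is None or done < rounds:
        if not ping(url):
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return failures
```

The script has to run somewhere other than the site itself (a scheduled task on another machine, for instance), because a process inside the idle app pool is torn down along with it.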

Resources