Azure Virtual Machines stuck in Starting mode

This morning I found 5 of my Azure Virtual machines to be stuck in Starting mode.
All other VMs are working ok.
I managed to stop the VMs using the Azure command shell and then start them again, but they are still stuck in Starting mode with no end in sight.
It has now been over 5 1/2 hours and they are still stuck in Starting mode.
I have contacted Microsoft support but they are taking hours to respond :(((
The Azure Status page doesn't show anything is wrong in my region.
Is anybody else experiencing this problem?

We've had the same issue and it's linked to a big issue Azure is having this morning.
The trick we used in order to get the instance running again is:
1. stop the VMs via PowerShell
2. change the size of the VM and back (preferably from A to D, as this is different hardware)
3. start the VM
We have also had people complaining about RDP not working; in those cases a reboot fixed the problem.
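The stop, resize, resize back, start sequence above can be sketched in Python as follows. This only illustrates the ordering and is not an Azure API: `recover_stuck_vm` and the `run` callback are hypothetical names, and the caller is assumed to supply whatever actually performs each action (for example a wrapper around the PowerShell cmdlets). The VM name and sizes in the dry run are placeholders.

```python
from typing import Callable, List, Tuple

# Hypothetical callback type: run(action, vm_name, size) performs one step.
# Nothing here talks to Azure; the caller supplies the real implementation.
Runner = Callable[[str, str, str], None]


def recover_stuck_vm(vm_name: str, original_size: str,
                     temporary_size: str, run: Runner) -> List[Tuple[str, str]]:
    """Run the stop / resize / resize-back / start sequence in order.

    Returns the list of (action, size) steps that were executed.
    """
    steps = [
        ("stop", ""),                # 1. stop the VM
        ("resize", temporary_size),  # 2a. move to a different size (different hardware)
        ("resize", original_size),   # 2b. move back to the original size
        ("start", ""),               # 3. start the VM
    ]
    executed = []
    for action, size in steps:
        run(action, vm_name, size)   # an exception here aborts the remaining steps
        executed.append((action, size))
    return executed


if __name__ == "__main__":
    # Dry run that only prints each step; "my-vm", "A2" and "D2" are placeholders.
    recover_stuck_vm("my-vm", "A2", "D2",
                     lambda action, name, size: print(action, name, size))
```

Passing the action runner in as a callback keeps the ordering logic separate from the tooling, so the same sequence can be dry-run first and then pointed at the real commands.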

There are currently some problems with Azure, including the VM service. The status page does not reflect all of the problems, either. Keep in mind that this page only shows incidents affecting most of a service's customers; it does not reflect minor outages affecting single customers. You should keep an eye on the Azure blog, which may publish a statement about the current problems.

What works for me is a redeploy of the Virtual Machine within the Azure Portal whenever it gets stuck at "Starting...". Although it takes half an hour to redeploy, it solves the issue. More details here.

I experienced the same problem, and what fixed it for me was increasing the Virtual Machine's disk size. You could also increase the overall VM size, but for me the disk size was enough; the VM was probably applying updates and the disk ran out of space.

Related

Microsoft Azure VM metrics suddenly stop

Suddenly I stopped receiving CPU and the other standard metrics on 2 of my VMs, and therefore the scale function wasn't operating. Did something change so that I need to enable something for this to work? I went to the portal and turned on Diagnostics, including Basic metrics.
I've seen this happen multiple times as well; the availability of the metrics is just not very reliable. They usually come back after a while.

Website on Azure Virtual Machine stops responding every day

I have a website (orders.cpidealers.com) running on an Azure Virtual Machine currently configured to Basic, A2 (2 cores, 3.5 GB memory) monitoring 3 endpoints.
Every morning since Tuesday, June 24:
The website has been unavailable (the browser just spins; I don't even get a 401 or any other error).
I can't RDP into the virtual machine.
The endpoint status shows a warning triangle (although when I click on the link next to it, some say Not Available while others give a time; I'm not sure how to interpret the endpoint status box).
To resolve the problem, I log in to Azure and restart the Virtual Machine. So far, everything then works fine for the remainder of the day, until I arrive at work the next morning at 7:30 (Mountain Time).
Any suggestions on how to troubleshoot this?
Well, it seems to me like your app somehow manages to hang IIS by wasting resources. I can't tell you more without any data. You should enable some performance counter monitoring and see what is going on.
http://azure.microsoft.com/en-us/documentation/articles/cloud-services-dotnet-use-performance-counters/
http://www.codeproject.com/Articles/303686/Windows-Azure-Diagnostics-Performance-Counters-In
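Alongside performance counters, it can help to know when the site actually stops responding. The sketch below is not performance-counter monitoring; it is a simple external probe, written in Python with only the standard library, that reports the HTTP status and response time for one request. The function name `probe`, the default timeout, and the `opener` parameter (there so the function can be exercised without network access) are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Request `url` once and return (status, elapsed_seconds).

    status is the HTTP status code, or None when no HTTP response
    arrived at all (timeout, refused connection, DNS failure).
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        status = exc.code
    except OSError:
        # URLError and timeouts are OSError subclasses: no response received.
        status = None
    elapsed = time.monotonic() - started
    return status, elapsed


# Example use (placeholder URL): call probe() every minute from a machine
# outside Azure and log the result with a timestamp, e.g.
#   status, elapsed = probe("http://example.com/")
```

A run of `None` results starting at a particular time of night would narrow down when the hang begins, which can then be lined up against the server's event log or counters.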
It looks like the system was hanging, as Rouen mentioned. From that, we found this article, which seems to have resolved the problem: IIS: Web Application hangs periodically needs system reboot
Here is everything my developer did:
I changed a few other things on the server. I set SQL Server to never auto-close, which should help performance in the morning, and set gupdate to manual (we did that together). Then I found this article, which seems to be an exact match for our problem, so I set the Credentials Manager to automatic and restarted.
IIS: Web Application hangs periodically needs system reboot

Intermittent Microsoft Azure Web Site access failure

I have a number of small MVC apps deployed as Microsoft Windows Azure websites. This has been working for several months.
Yesterday I rolled out a new one, and the deployment was unremarkable, everything worked fine. But a couple of hours later, access to the site was unavailable. The symptoms were that when the browser tried to navigate to the URL for that site, it would try to load for several minutes and then just give up with a completely blank page.
I attempted to stop and restart the site, and it worked once, but the symptoms came back several minutes later. Then I tried to stop and restart, and it didn't work.
I deployed the identical app to three additional URLs. Again, immediately after deployment they all work fine; however, they each fail at some later point. They do not all seem to fail at once. Sometimes restarting the site fixes the problem, and sometimes it does not.
IMPORTANT: If I wait for some period of time, the site may start to work again on its own.
However, deploying four versions of the app so that our users can go to a backup one if the primary one is not working is not optimal.
Any words of wisdom as to how I might go about debugging this?
ADDITIONAL INFO NOV 25, 2013:
When sites are failing, the IIS logs show either 500 (Internal Server Error) or 502 (Bad Gateway) responses. Our own MVC code is never hit, not even app_start.
You can start by checking the logs and remote debugging:
http://www.drdobbs.com/windows/azure-sdk-22-supports-visual-studio-2013/240163499
Are the apps working locally?
Might not be the same problem, but from time to time our Azure instances will get the blue question mark of death as a status.
The reason, we found out, is that Microsoft upgrades instances from time to time. If you have just one instance in a cloud service/role, it will be dead while that maintenance is being done.
I have confirmed this with their support.
The only way to get around this that I know of is to create two instances. Then Microsoft guarantees ~99% availability.
Of course I also confirmed with them that this means twice the cost. =/
If that's not the issue I would enable RDP and get onto the machine to see what the problem is. Microsoft has these tools to help debug problems: http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx
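To put an availability percentage such as the ~99% quoted above into perspective, it can be converted into permitted downtime. The small Python helper below is only that arithmetic; the function name and the choice of a 30-day month in the usage note are assumptions for illustration, and the percentage itself should be taken from the actual SLA text.

```python
def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Return how many hours of downtime fit within an availability target.

    availability_percent: target such as 99.0 (meaning 99%).
    period_hours: length of the measurement period in hours.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (100.0 - availability_percent) / 100.0
```

For a 30-day month (720 hours), a 99% target leaves room for roughly 7.2 hours of downtime, so an outage lasting several hours can still fall inside such a figure.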
First, you should always run multiple instances of your web role with more than 1 upgrade domain. This is configurable in the service definition (CSDEF). Without this, you don't get an SLA from Microsoft, so you can't really complain that the VMs go down.
Second, to figure out what might be going on with these boxes, you should have both logs (my preference is to roll my own with page blobs or table storage), AND you should always have RDP access to a pre-production environment (production as well if you're not too fussed about security). Once on the box, look through the event viewer for errors.
Third, when an outage occurs, check the Azure service dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) for reported outages.
Lastly, contact Microsoft support. It may take a few hours, but they are pretty good.
Given that it is happening repeatedly and for extended periods of time (more than 5 minutes), I would bet there's something wrong with your hosted service. Again, RDP in and poke around. Good luck.
To debug your sites try to enable diagnostic logs:
http://www.windowsazure.com/en-us/develop/net/common-tasks/diagnostics-logging-and-instrumentation/
Another nice way to look around your site is using the debug console:
https://github.com/projectkudu/kudu/wiki/Kudu-console

Azure VM - the operation cannot be performed because the virtual machine is faulted

Our VM has been running fine for a while but today it was in a stopped state and every attempt to restart it results in the error "The operation cannot be performed because the virtual machine is faulted."
We have resized the VM a few times, as per previous posts on this same subject, but it still fails to start. This is a production site, so we are in a bit of a bind without being able to start the VM. The last attempt to resize the VM has now left it in a perpetual state of Starting, and we cannot change configuration settings or update it in any way.
If we delete the VM, can we just create it again and attach the original disk without any data loss?
Any ideas on how to figure out why it won't start?
Resizing the VM typically fixes this problem too. E.g. size from A1 to A2 then (once it's done) back to A1 again.
There is a problem in the Southeast Asia datacenters.
Check it here: http://www.windowsazure.com/en-us/support/service-dashboard/

Azure server not responding but dashboard reporting everything is running?

Very suddenly, without any changes or recent access, my Azure virtual server is no longer available for RDP or web. I have logged into the Azure control panel and everything appears to be running without issue, but it is not working.
I have checked the end points and they are present for both RDP and Web, totally weird.
I have 2 virtual servers and the other one is working fine and responding.
Anyone ever experience this? Just when my client wants to view his website as well...
http://cn-web-02.cloudapp.net is the URL
TIA
As I just answered for this question, Virtual Machines are in Preview and not in Production yet. There are several reasons why your Virtual Machines became unavailable (see other answer). Given that this is the second reported incident here today, it's a good guess it's related to the underlying Host OS being updated, which would take your Virtual Machine offline for a short period of time.
I tried your URL and it's available again. Just remember about this being in Preview, especially since you mention having a client that wants to view his website. If you put a production website in Virtual Machines, then you'll have to absorb the risk of not having an SLA.
Having said that: You can mitigate downtime risk by running two Virtual Machines, listening on a load-balanced input endpoint. Be sure to have both Virtual Machines in the same Availability Set. Doing that ensures that the Windows Azure fabric controller will not take both Virtual Machines offline at the same time when doing things like Host OS updates. If this were in Production, you'd then have a very high availability scenario. Even in Preview, you'll improve availability by taking advantage of Availability Sets. Note: You'll need to use some type of shared session cache, since visitors will now be sent to either one of your Virtual Machines.
I had the same experience! We had 2 instances and all of them were re-imaged without any notification. I know this because we had made some local changes via RDP.
A reboot or reimage may help, so it is worth a try.
Turns out it was an outage from Microsoft...for over 22 hours but everything is back up and running. This is the 2nd time in 6 months this has happened for long stretches...makes me a little nervous to say the least.
Thanks for the input, everyone. For anyone that's interested, MS have a good site that tracks the service levels on Azure: Windows Azure Service Dashboard
