Attaching a new disk to Azure IAAS VM caused a reboot? - azure

I recently attached a new empty data disk to an Azure IAAS VM running Windows Server 2012 Datacenter. As soon as the disk was added, Windows rebooted. This surprised me greatly, as I didn't expect adding a data disk to cause a reboot. I looked in the event log and didn't see any errors; the event log indicated that NT AUTHORITY\SYSTEM initiated the reboot.
I attached another disk after it came back up, and it behaved as expected: the disk was added without a reboot.
Does anyone know why, or under what circumstances, an operation like that would make the system trigger a reboot?
Thank you!

This isn't expected behaviour, and isn't anything I've seen before.
I've added numerous data disks to numerous IaaS VMs on Azure, in both Windows 2008 R2 and Windows 2012 R2, and I've never seen a VM reboot automatically as a result of that action.
Is there any chance you were doing anything else at the same time which may have caused the reboot, such as a silent software installation? Alternatively, you might have been particularly unfortunate if it coincided with a scheduled Azure maintenance window (although you would normally see something in the Event Log if that were the case).

Related

What can cause an Azure Cloud Service's Disk Write/Read to spike unexpectedly?

We have a simple worker that picks up messages from a queue and runs a few queries. We don't ever write to the disk ourselves, but we do have diagnostics turned on in the role settings.
Once in a while the disk write/read spikes and the worker becomes unresponsive. What is the role trying to write to the disk? On the surface it doesn't appear to be a crash dump, because those tables and blobs are still empty. Are our diagnostics configured improperly?
Here's an example of a spike we saw recently. It was writing for over an hour!
Try enabling remote desktop support in the role configuration in the Azure portal.
Once the problem resurfaces, log in via RDP and start Resource Monitor. The Disk tab should be able to pinpoint disk IO usage by process and by file.
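If watching Resource Monitor by hand is not practical, the same per-process view can be approximated by comparing two snapshots of cumulative I/O counters. The following is only a minimal sketch: the snapshot shape, the function name `rank_disk_io`, and the process names are all made up for illustration. On a real machine the counters could come from something like psutil's `Process.io_counters()`, which is an assumption here and not part of the original advice.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: {process_name: (read_bytes, write_bytes)},
# where both values are cumulative counters since the process started.
Snapshot = Dict[str, Tuple[int, int]]


def rank_disk_io(before: Snapshot, after: Snapshot, top: int = 5) -> List[Tuple[str, int, int]]:
    """Return (name, bytes_read, bytes_written) for the busiest processes
    between two snapshots, busiest first."""
    deltas = []
    for name, (read_after, write_after) in after.items():
        # A process missing from the first snapshot is treated as starting at zero.
        read_before, write_before = before.get(name, (0, 0))
        # Clamp at zero in case a process restarted and its counters reset.
        read_delta = max(read_after - read_before, 0)
        write_delta = max(write_after - write_before, 0)
        deltas.append((name, read_delta, write_delta))
    # Sort by total bytes moved in the interval.
    deltas.sort(key=lambda d: d[1] + d[2], reverse=True)
    return deltas[:top]


# Made-up numbers and process names, purely for illustration.
before = {"worker.exe": (1_000, 2_000), "diagnostics-agent.exe": (500, 10_000)}
after = {"worker.exe": (1_500, 2_500), "diagnostics-agent.exe": (600, 900_000)}

for name, read_bytes, written_bytes in rank_disk_io(before, after):
    print(f"{name}: read {read_bytes} B, wrote {written_bytes} B")
```

Taking snapshots a minute or so apart while the spike is in progress should make the offending process stand out, although it will not show which files are being touched the way Resource Monitor does.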
Enabling storage logs should tell you exactly what those reads and writes on the disk are.
So, this is a very open ended question and is very hard to predict. Your Cloud Services are ultimately Windows machines and what's happening on Windows can (usually) only be monitored by something inside Windows.
It is very possible that a Windows Update related task was running. Those may cause spikes in disk R/W.
We typically advise users who use CloudMonix and want to know what causes CPU/memory/other issues to install the CloudMonix agent on their machines, as it captures running processes along with their memory and CPU utilization and can show the process that caused a spike. Usually spikes in disk R/W are correlated with spikes in CPU usage.
Note, if the spike was caused by your own code, you'll need to use a profiler such as RedGate's ANTS Performance Profiler or JetBrains dotTrace to determine the ultimate root cause.
HTH

Azure Virtual Machines stuck in Starting mode

This morning I found 5 of my Azure Virtual machines to be stuck in Starting mode.
All other VMs are working ok.
I managed to stop the VMs using the Azure command shell and then start them again but they are still stuck in starting mode with no end in sight.
It has now been over 5 1/2 hours and still stuck in starting mode.
I have contacted Microsoft support but they are taking hours to respond :(((
The Azure Status page doesn't show anything is wrong in my region.
Is anybody else experiencing this problem?
We've had the same issue and it's linked to a big issue Azure is having this morning.
The trick we used in order to get the instance running again is:
1. stop the VMs via PowerShell
2. change the size of the VM and then change it back (preferably from A to D, as this is different hardware)
3. start the VMs
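The stop, resize, resize back, start sequence above can be sketched as a list of commands. This is a hedged illustration only: the answer used PowerShell, whereas this sketch swaps in the present-day Azure CLI (`az vm deallocate`, `az vm resize`, `az vm start`) and assumes a Resource Manager VM. The function name, resource group, VM name, and size names are hypothetical. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List


def resize_cycle_commands(resource_group: str, vm_name: str,
                          temp_size: str, original_size: str) -> List[List[str]]:
    """Build the az CLI invocations for: stop, resize, resize back, start."""
    target = ["--resource-group", resource_group, "--name", vm_name]
    return [
        # 1. Stop (deallocate) the VM.
        ["az", "vm", "deallocate", *target],
        # 2. Change the size, then change it back.
        ["az", "vm", "resize", *target, "--size", temp_size],
        ["az", "vm", "resize", *target, "--size", original_size],
        # 3. Start the VM again.
        ["az", "vm", "start", *target],
    ]


# Hypothetical names and sizes, for illustration only.
for command in resize_cycle_commands("my-resource-group", "my-stuck-vm",
                                     "Standard_D2_v2", "Standard_A2"):
    print(shlex.join(command))
```

Printing the commands first makes it easy to review them before running anything against a production VM.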
We also have people complaining about RDP not working where reboots fixed the problem.
There are currently some problems with Azure, including the VM service. Also, the status page does not reflect all of the problems. Keep in mind that this page only shows impacts affecting most of the service's customers. It does not reflect minor outages affecting single customers. You should keep an eye on the Azure blog, which may publish a statement about the current problems.
What works for me is a redeploy of the Virtual Machine within the Azure Portal whenever it gets stuck at "Starting...". Although it takes half an hour to redeploy, it solves the issue. More details here.
I experienced the same problem, and what fixed it for me was increasing the Virtual Machine's disk size. You can also increase the whole VM size, but for me the disk size fixed it; probably the VM was updating and the disk ran out of space.

How do I cause Azure to move my instance if I decide the VM is faulty?

Suppose I have the following situation. One of my Azure role instances happens to be started on a VM that runs inside a faulty server but Azure wiring processes don't see any problems. I somehow deduce this fact - for example I see an "impossible" call stack - one that can't happen in my program under any normal conditions.
So I'd like Azure to move my instance to another VM and have the underlying hardware checked and repaired.
How can I do that except contacting support?
A few comments:
You can have this done, sort of, by calling support. The support team won't move your VM to a new server just because you ask, but they will work with you to determine whether the physical server really is bad, and if so, take it out of service.
RequestRecycle will only shut down the host process (ie. WaIISHost) and related processes and then restart them. It won't reboot the VM, clean boot, or redeploy.
You can try a 'Reimage' from the portal or PowerShell if you suspect you might have a corrupt Windows installation. A Reimage will recreate the Windows partition from scratch.
In order to force a new VM to be on a new server you would have to do an in-place upgrade and modify the size of the VM (ie. go from Small to Medium). This will cause new VMs to be created on new servers. You can then do another in-place upgrade to revert back to the original size.
That being said, I strongly agree with Brian's comment that it is very unlikely that bad hardware is causing an 'impossible' callstack. I would recommend opening a support incident so you can find the actual root cause instead of just fixing the most visible symptom.
I don't think you can move a VM. But you could create a new staging deployment, swap it into production, and then destroy the old one. You can't actually guarantee that the VMs are on different physical machines, but it seems reasonably likely. The larger the VMs are, the more likely that they're on separate servers.
That said, it seems really unlikely that your problems are due to a hardware fault rather than some subtle bug.

Windows Azure: Unexpected & unclean virtual-machine shutdown

Using a large instance of a virtual machine on Windows Azure. The instance runs Microsoft SQL 2012 with light usage, on Windows Server 2012 + all up to date. No user is logged in at time of failures.
However, several (between none and three) times a day (appears random), the VM halts and shuts down. It does not come back online until someone logs back into the Management Portal and starts the VM again. There is no memory dump created. So I am guessing the host halts the running VM, rather than some configuration instance within the guest OS causes the halt. The subscription has billable funds. Other VMs in the subscription are also affected.
Only event logs generated:
Kernel-Power logged:
The system has rebooted without cleanly shutting down first. This
error could be caused if the system stopped responding, crashed, or
lost power unexpectedly.
Kernel-Boot logged:
The last shutdown's success status was false. The last boot's success
status was true.
How can this be resolved? There is no way to initiate a support request within Azure.
The first thing I would do is install some monitoring software such as New Relic or Foglight and see whether you are running out of memory or a process is pushing the CPU into a spin.
This will give you some visibility of the activity on the box over time and give you some evidence should you need it to open a support request.
Azure now has paid support only:
http://www.windowsazure.com/en-us/support/plans/
We use the Developer plan for exactly this type of situation, where you are a bit lost trying to figure out what is going on. The cost of $30 per month, compared to the cost of running a SQL Server 2012 VM, makes it worth having. Microsoft support is generally very good; they will have more diagnostic information and will be able to tell you whether this is caused by an Azure failure or something else.
Getting diagnostics going would still be the first port of call, though. Then you can see what is going on, gather some evidence, and use it to track down the problem.
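As one way of pulling that evidence together, the unclean-shutdown events from the question can be tallied per day. This is a minimal sketch under stated assumptions: the events have been exported to CSV with `TimeCreated`, `ProviderName`, and `Id` columns, the timestamps are in ISO 8601 format, and the sample rows below are invented for illustration. Event ID 41 from Microsoft-Windows-Kernel-Power is the "rebooted without cleanly shutting down first" message quoted in the question.

```python
import csv
import io
from collections import Counter
from datetime import datetime

KERNEL_POWER = "Microsoft-Windows-Kernel-Power"
UNCLEAN_SHUTDOWN_ID = 41  # "The system has rebooted without cleanly shutting down first."


def unclean_shutdowns_per_day(rows):
    """Count Kernel-Power 41 events per calendar day.

    `rows` is any iterable of dicts with TimeCreated, ProviderName and Id keys
    (an assumed export layout, not something Azure produces for you).
    """
    counts = Counter()
    for row in rows:
        if row["ProviderName"] == KERNEL_POWER and int(row["Id"]) == UNCLEAN_SHUTDOWN_ID:
            day = datetime.fromisoformat(row["TimeCreated"]).date()
            counts[day.isoformat()] += 1
    # Return the days in chronological order.
    return dict(sorted(counts.items()))


# Invented sample rows, for illustration only.
sample = """TimeCreated,ProviderName,Id
2013-05-01T03:12:44,Microsoft-Windows-Kernel-Power,41
2013-05-01T03:13:02,Microsoft-Windows-Kernel-Boot,20
2013-05-01T17:40:10,Microsoft-Windows-Kernel-Power,41
2013-05-02T09:05:31,Microsoft-Windows-Kernel-Power,41
"""

print(unclean_shutdowns_per_day(csv.DictReader(io.StringIO(sample))))
```

A per-day count like this, alongside the memory and CPU history from the monitoring tool, gives support something concrete to correlate with host-side logs.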

Azure VM shutdown

I have 2 VMs on Azure.
One suddenly stopped responding.
It was down for around 30 minutes. When I browsed to the Azure portal, I saw it was in the Starting state, and then it was up and running again.
How can I tell why my VM was shut down?
EDIT: I'm assuming you're talking about Virtual Machines (IaaS), and not Cloud Services (PaaS).
Virtual Machines can, and will, restart, for several reasons. For example:
Hardware failure, where your Virtual Machine will then be restarted on another server.
Host OS refresh. This is the operating system running the physical server.
Some type of OS crash
Also keep in mind: Virtual Machines are in Preview with no SLA today. So there wouldn't be any information readily available to you for determining why your Virtual Machine became unavailable.
If it was unavailable for 30 minutes, then this hints at something akin to a host OS update or your virtual machine being moved. If it was down for, say, 5 minutes, then I'd guess it was an OS crash.
UPDATE I just looked at the Azure Dashboard which is showing degraded Compute with Virtual Machines (see RSS feed with problem description). Perhaps this is the root cause of your particular outage...
There are several things that might cause this to happen. Your VM may have crashed because of a bug in the code running on it. The second possible reason, I think, is that the number of VMs you created is not enough for the incoming traffic; a VM could restart if the incoming traffic is more than your VMs can handle.

Resources