I have two VMs on Azure.
One suddenly stopped responding.
It was down for around 30 minutes, until I happened to open the Azure portal: the VM was in the Starting state, and then it was up and running again.
How can I tell why my VM was shut down?
EDIT: I'm assuming you're talking about Virtual Machines (IaaS), and not Cloud Services (PaaS).
Virtual Machines can, and will, restart, for several reasons. For example:
Hardware failure, where your Virtual Machine will then be restarted on another server.
Host OS refresh. The host OS is the operating system running on the physical server.
Some type of OS crash.
Also keep in mind: Virtual Machines are in Preview with no SLA today. So there wouldn't be any information readily available to you for determining why your Virtual Machine became unavailable.
If it was unavailable for 30 minutes, then this hints at something akin to a host OS update or your virtual machine being moved. If it was down for, say, 5 minutes, then I'd guess it was an OS crash.
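The rule of thumb above can be written down as a small helper. This is only a sketch: the 10-minute cut-off is an assumed value chosen to sit between the answer's 5- and 30-minute examples, and the timestamps would have to come from your own monitoring, not from Azure.

```python
from datetime import datetime, timedelta

# Assumed cut-off between "short" and "long" outages; not an Azure figure.
CRASH_MAX = timedelta(minutes=10)


def outage_duration(last_seen: datetime, back_up: datetime) -> timedelta:
    """Time between the last successful check and the first one after recovery."""
    if back_up < last_seen:
        raise ValueError("back_up must not be earlier than last_seen")
    return back_up - last_seen


def likely_cause(duration: timedelta) -> str:
    """Rough guess at the cause of an outage, based only on its length."""
    if duration <= CRASH_MAX:
        return "guest OS crash / reboot"
    return "host OS update or VM moved to other hardware"


if __name__ == "__main__":
    down = datetime(2024, 1, 1, 10, 0)   # hypothetical last successful check
    up = datetime(2024, 1, 1, 10, 30)    # hypothetical first check after recovery
    d = outage_duration(down, up)
    print(d, likely_cause(d))
```

Treat the result as a hint for where to look, not a diagnosis.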
UPDATE: I just looked at the Azure Dashboard, which is showing degraded Compute for Virtual Machines (see the RSS feed with the problem description). Perhaps this is the root cause of your particular outage...
There are several things that might cause this. First, your VM may have crashed because of bad code or a faulty deployment. Second, the number of VMs you created may not be enough for the incoming traffic: if incoming traffic exceeds what your VMs can handle, this could cause your VM to restart.
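If traffic is the suspect, a rough capacity estimate shows whether you have enough VMs. The peak and per-VM figures are hypothetical and would have to come from your own load testing; the single spare instance is an assumption so that losing one VM does not immediately overload the rest.

```python
import math


def instances_needed(peak_requests_per_sec: float,
                     per_vm_capacity: float,
                     spare: int = 1) -> int:
    """VMs needed to absorb peak traffic, rounded up, plus spare instances."""
    if per_vm_capacity <= 0:
        raise ValueError("per_vm_capacity must be positive")
    return math.ceil(peak_requests_per_sec / per_vm_capacity) + spare


if __name__ == "__main__":
    # Hypothetical: 450 req/s at peak, each VM handles about 200 req/s.
    print(instances_needed(450, 200))
```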
Related
Let's imagine I have created an Azure virtual machine, initially a small one. I have installed SQL Server and created databases, and I also host a website on the virtual machine using IIS.
I can see that the performance of the small one is not up to the mark, so I want to upgrade to a larger, more powerful machine. I know I can do this from the Azure portal.
My question: I have already fully configured this machine, with databases and websites running on the small VM. Will I lose all my data and hosted websites if I change the size of the Virtual Machine (VM) from Small to Large in the Azure portal? I am worried that this upgrade may cause me to lose data and the website.
You will not lose your (entire) data when you scale.
Why do I say "entire"? Because your data is on the system drive (C:), which by default (unless you have turned this off) has Read/Write host caching enabled. The write cache can cause some data corruption when the VM is not shut down gracefully, or while its size is being changed. This is the only issue you have to worry about.
Changing VM size is a common task that many people do almost daily, especially when using IaaS as a dev/test environment.
It is also a recommended corrective action to take if you are having issues with booting up the VM.
So, go ahead and change the size. As a precaution, you can stop IIS before resizing to avoid data loss. This only makes sense if your application has some logic that writes files to the local (C:) drive.
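If your application does write files to the local drive, it can shrink the window for lost writes by flushing them explicitly. A minimal sketch using Python's standard library (write_durably is a hypothetical helper name): it flushes Python's buffer, calls os.fsync, then atomically swaps the file into place. This only forces data out of the application's and guest OS's buffers; whether it also gets past the host-side Read/Write cache depends on the disk's caching setting, which this code does not control.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to a temp file, force it to disk, then atomically replace path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the OS
            os.fsync(f.fileno())  # ask the OS to write its buffers to the disk
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The rename step means a crash mid-write leaves the previous version of the file intact rather than a half-written one.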
I have a product which uses CPU ID, network MAC, and disk volume serial numbers for validation. Basically when my product is first installed these values are recorded and then when the app is loaded up, these current values are compared against the old ones.
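The validation scheme described here can be sketched as follows. Reading the CPU ID and volume serial number is platform-specific, so those values are simply passed in as strings; only the MAC lookup uses a real standard-library call, uuid.getnode(), which may fall back to a random number if no MAC address can be read. Fingerprint and changed_components are illustrative names, not the product's actual code.

```python
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Fingerprint:
    cpu_id: str
    mac: str
    volume_serial: str


def current_mac() -> str:
    """MAC address as 12 hex digits (may be random if none can be read)."""
    return f"{uuid.getnode():012x}"


def changed_components(stored: Fingerprint, current: Fingerprint) -> list[str]:
    """Names of the components whose values differ from the recorded ones."""
    old, new = asdict(stored), asdict(current)
    return [name for name in old if old[name] != new[name]]


if __name__ == "__main__":
    # Hypothetical values recorded at install time vs. read at startup.
    stored = Fingerprint("cpu-A", "00155d000001", "1A2B-3C4D")
    now = Fingerprint("cpu-A", current_mac(), "1A2B-3C4D")
    print(changed_components(stored, now))
```

Reporting which component changed, instead of a single pass/fail, would have left some evidence behind even after the VM was deleted.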
Something very mysterious happened recently. Inside of an Azure VM that had not been restarted in weeks, my app failed to load because some of these values were different. Unfortunately the person who caught the error deleted the VM before it was brought to my attention.
My question is, when an Azure VM is running, what hardware resources may change? Is that even possible?
Thanks!
Answering this requires a short rundown of how Azure works.
In each data centre there are thousands of individual machines. Each machine runs a hypervisor, which allows a number of operating systems to share the same underlying hardware.
When you start a role, Azure looks for available resources (disk space, CPU, RAM, etc.) and boots up a copy of the appropriate OS VM on those resources. I understand from your question that this is a VM role, so this VM is the one you uploaded or created.
As long as your VM is running, the underlying virtual resources provided by the hypervisor are not likely to change. (The caveat is that the Windows Server 2012 hypervisor can move virtual machines around over the network even while they are running. Whether Azure takes advantage of this, I don't know.)
Now, Azure keeps charging you even when your role has stopped, because it considers your role "deployed". So, in theory, those underlying resources still "belong" to your role.
This is not guaranteed, though. Azure could decide to boot up your VM on a different set of virtualized hardware for any number of reasons, hardware failure being at the top of the list and insufficient capacity second.
It is even possible (though unlikely) for your resources to be provided by different hardware nodes.
An additional consideration is that, under Azure policy, disaster recovery (or another major event) may involve moving your roles to run in a separate data centre entirely.
My point is that the underlying hardware is virtual and treating it otherwise is most unwise. Roles are at the mercy of the Azure Management Routines, and we can't predict in advance what decisions they may make.
So the answer to your question is that ALL of the underlying resources may change. And it is very, very possible.
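Since any of these resources may change, one option (an assumption on my part, not something the answer prescribes) is to accept a machine when at least k of the n recorded components still match, and fall back to re-activation otherwise. Because ALL of them can change, this reduces false failures but cannot eliminate them. The threshold of 2 is a hypothetical default.

```python
from typing import Mapping


def fingerprint_matches(stored: Mapping[str, str],
                        current: Mapping[str, str],
                        min_matching: int = 2) -> bool:
    """Accept if at least min_matching recorded components are unchanged."""
    shared = stored.keys() & current.keys()
    matching = sum(1 for key in shared if stored[key] == current[key])
    return matching >= min_matching


if __name__ == "__main__":
    stored = {"cpu_id": "A", "mac": "m1", "volume_serial": "v1"}
    moved = {"cpu_id": "A", "mac": "m2", "volume_serial": "v1"}  # new NIC only
    print(fingerprint_matches(stored, moved))
```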
I am using a large instance of a virtual machine on Windows Azure. The instance runs Microsoft SQL Server 2012 with light usage, on Windows Server 2012 with all updates applied. No user is logged in at the time of the failures.
However, several times a day (anywhere from zero to three times, apparently at random), the VM halts and shuts down. It does not come back online until someone logs into the Management Portal and starts the VM again. No memory dump is created, so I am guessing the host halts the running VM, rather than something in the guest OS's configuration causing the halt. The subscription has billable funds. Other VMs in the subscription are also affected.
The only event log entries generated:
Kernel-Power logged:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Kernel-Boot logged:
The last shutdown's success status was false. The last boot's success status was true.
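To see how often the unclean restarts happen, one could export the System log and count the Kernel-Power entries per day. A minimal sketch, assuming the log has been exported to CSV with columns named Time (ISO 8601), Provider and Message; those column names are hypothetical, so adjust them to whatever your export actually produces. It matches on the message text quoted above rather than on event IDs, and counts only the Kernel-Power entry so each restart is counted once.

```python
import csv
import io
from collections import Counter
from datetime import datetime

KERNEL_POWER = "Microsoft-Windows-Kernel-Power"
UNCLEAN_TEXT = "rebooted without cleanly shutting down"


def unclean_restarts_per_day(csv_text: str) -> Counter:
    """Count Kernel-Power 'unclean shutdown' entries per calendar day."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Provider"] == KERNEL_POWER and UNCLEAN_TEXT in row["Message"]:
            counts[datetime.fromisoformat(row["Time"]).date()] += 1
    return counts
```

Lining these dates up with the Windows Azure Service Dashboard, or with another affected VM's log, can show whether the halts are host-side.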
How can this be resolved? There is no way to initiate a support request within Azure.
The first thing I would do is install some monitoring software, such as New Relic or Foglight, and check whether you are running out of memory or a process is pushing the CPU into a spin.
This will give you some visibility of the activity on the box over time and give you some evidence should you need it to open a support request.
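Whatever tool collects the data, the useful part is spotting patterns in it over time. This sketches only that analysis step, under assumptions: the Sample fields, the 95% CPU limit, the three-sample window, and the 256 MB free-memory floor are all hypothetical values, and collecting the samples is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    timestamp: float       # seconds since monitoring started
    cpu_percent: float     # 0-100
    free_memory_mb: float


def find_alerts(samples: Iterable[Sample],
                cpu_limit: float = 95.0,
                cpu_window: int = 3,
                min_free_mb: float = 256.0) -> list[str]:
    """Flag sustained high CPU and low free memory in a series of samples."""
    alerts: list[str] = []
    high_run = 0  # consecutive samples at or above cpu_limit
    for s in samples:
        high_run = high_run + 1 if s.cpu_percent >= cpu_limit else 0
        if high_run == cpu_window:  # report each spin once, when it starts
            alerts.append(f"t={s.timestamp:g}: CPU >= {cpu_limit:g}% for {cpu_window} samples")
        if s.free_memory_mb < min_free_mb:
            alerts.append(f"t={s.timestamp:g}: free memory {s.free_memory_mb:g} MB")
    return alerts
```

A timestamped list like this is the kind of evidence that makes a support request easier to act on.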
Azure now only offers paid support:
http://www.windowsazure.com/en-us/support/plans/
We use the Developer plan for exactly this type of situation, where you are a bit lost trying to figure out what is going on. At $30 a month, compared with the cost of running a SQL Server 2012 VM for a month, it is worth having. Microsoft's support staff are generally very good; they have more diagnostic information and will be able to tell you whether the problem is an Azure failure or something else.
Getting diagnostics going, though, would be the first port of call: then you can see what is going on, gather some evidence, and track down the problem.
Very suddenly, without any changes or recent access, my Azure virtual server is no longer available over RDP or the web. I have logged into the Azure control panel and everything appears to be running without issue, but it is not working.
I have checked the endpoints, and they are present for both RDP and web. Totally weird.
I have 2 virtual servers and the other one is working fine and responding.
Anyone ever experience this? Just when my client wants to view his website as well...
http://cn-web-02.cloudapp.net is the URL
TIA
As I just answered for this question, Virtual Machines are in Preview and not in Production yet. There are several reasons why your Virtual Machines became unavailable (see other answer). Given that this is the second reported incident here today, it's a good guess it's related to the underlying Host OS being updated, which would take your Virtual Machine offline for a short period of time.
I tried your URL and it's available again. Just remember about this being in Preview, especially since you mention having a client that wants to view his website. If you put a production website in Virtual Machines, then you'll have to absorb the risk of not having an SLA.
Having said that: You can mitigate downtime risk by running two Virtual Machines, listening on a load-balanced input endpoint. Be sure to have both Virtual Machines in the same Availability Set. Doing that ensures that the Windows Azure fabric controller will not take both Virtual Machines offline at the same time when doing things like Host OS updates. If this were in Production, you'd then have a very high availability scenario. Even in Preview, you'll improve availability by taking advantage of Availability Sets. Note: You'll need to use some type of shared session cache, since visitors will now be sent to either one of your Virtual Machines.
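Why the shared session cache matters can be shown with a toy model. Everything here is hypothetical: SessionStore stands in for whatever shared cache you use, and round-robin routing is chosen only for illustration; it does not model how the Azure load balancer actually distributes traffic.

```python
import itertools
from typing import Dict, List, Tuple


class SessionStore:
    """Stand-in for a shared cache that every VM can reach."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> dict:
        return self._data.setdefault(session_id, {})


class WebVm:
    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store  # shared, not local to this VM

    def handle(self, session_id: str) -> Tuple[str, int]:
        session = self.store.get(session_id)
        session["hits"] = session.get("hits", 0) + 1
        return self.name, session["hits"]


class RoundRobinBalancer:
    def __init__(self, vms: List[WebVm]) -> None:
        self._cycle = itertools.cycle(vms)

    def route(self, session_id: str) -> Tuple[str, int]:
        return next(self._cycle).handle(session_id)
```

Because both VMs read the same store, a visitor's session survives being bounced between them; with per-VM in-memory sessions, the hit count would reset each time the other VM answered.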
I had the same experience! We had 2 instances, and all of them were re-imaged without any notification. I only noticed because we had made some local changes via RDP.
A reboot or reimage may help, so you could try that.
Turns out it was an outage on Microsoft's side, lasting over 22 hours, but everything is back up and running. This is the second time in 6 months this has happened for a long stretch, which makes me a little nervous, to say the least.
Thanks for the input, everyone. For anyone who's interested, MS has a good site that tracks Azure service levels: the Windows Azure Service Dashboard.
We know that Azure Compute is PaaS so the Operating System (Windows Server 2008 R2) has to be patched and upgraded automatically.
I just wanted to know: will there be any downtime during the patching or a Compute upgrade?
If you only have a single instance of a particular VM role, then yes - you'll have a short bit of downtime, as you need to be rebooted. Likewise, if the host OS is patched, you'll have a bit of downtime.
If you run two or more instances, then the SLA kicks in, because your instances are separated into different containers/network branches/etc. These are fault domains. So even if a network segment, router, or entire rack were to go offline, you'd have another instance somewhere else.
During OS updates, your instances are divided into upgrade domains, so that they're not all upgraded at once. This leaves your service in an always-available state, as long as you have two or more instances of your roles. For background processes that aren't customer-facing (say, in a worker role that simply reads from queues and processes queue items asynchronously), you can probably get away with a single instance of that role, provided you can handle the work load and that it would be ok to have occasional processing delays.
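The effect of upgrade domains can be sketched with a toy model: instances are spread round-robin across upgrade domains, and domains are taken offline one at a time. The spreading rule and the domain count are assumptions, not Azure's documented algorithm. With two instances something is always serving; with one instance there is a step where nothing is.

```python
from typing import Iterator, List, Sequence, Tuple


def assign_upgrade_domains(instances: Sequence[str], domain_count: int) -> List[List[str]]:
    """Spread instances across upgrade domains round-robin (assumed rule)."""
    if domain_count < 1:
        raise ValueError("domain_count must be at least 1")
    domains: List[List[str]] = [[] for _ in range(domain_count)]
    for i, instance in enumerate(instances):
        domains[i % domain_count].append(instance)
    return domains


def rolling_upgrade(instances: Sequence[str], domain_count: int) -> Iterator[Tuple[int, List[str]]]:
    """Take each upgrade domain offline in turn; yield the instances still serving."""
    for d, offline in enumerate(assign_upgrade_domains(instances, domain_count)):
        serving = [inst for inst in instances if inst not in offline]
        yield d, serving


if __name__ == "__main__":
    for d, serving in rolling_upgrade(["web_0", "web_1"], domain_count=2):
        print(f"upgrading domain {d}: serving = {serving}")
```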
See this recent TechNet blog post for more details around fault domains and upgrade domains.