I have a huge problem with a production VM I host on Azure (size, in case it matters: Standard DS14, 16 cores, 112 GB memory).
The entire production server has gone offline, stopped reporting to New Relic, and I can't SSH into it either (it times out). I tried restarting the machine from the portal, but got the message:
The operation '7922f9ed80af7b2c9a9ec5d13f510393' failed: 'The operation cannot be performed because the virtual machine is faulted.'.
What can I do? This is quite a bind!
p.s. https://azure.microsoft.com/en-us/status/ shows all is well
This error occurs when the host your VM runs on has a fault.
Restarting the VM or creating a new one could solve this issue. Resizing the VM moves it to a different host server, so if a problem with the host is affecting the VM, resizing will get it off that host.
You could also check whether your VMs are in an availability set. An availability set is a group of virtual machines deployed across fault domains and update domains. It makes sure that your application is not affected by single points of failure, like the network switch or the power unit of a rack of servers.
We have a service with low SLA requirements, so we host it on a single VM; there is no need for multiple VMs in an availability set or across availability zones.
What happens if there is a zone or fault domain failure?
Will Azure automatically reallocate the VM to an operational zone or host (fault domain), or do we have to actively restart or redeploy the VM to get it reallocated?
According to the documentation, in the event of unexpected downtime, Azure migrates your VM to a healthy physical machine in the same datacenter:
When detected, the Azure platform automatically migrates (heals) your
virtual machine to a healthy physical machine in the same datacenter.
During the healing procedure, virtual machines experience downtime
(reboot) and in some cases loss of the temporary drive. The attached
OS and data disks are always preserved.
However, if you are using a single VM, it's recommended to use a Standard SSD, which comes with a higher SLA:
A single instance virtual machine with a Standard SSD will have an SLA
of at least 99.5%, while a single instance virtual machine with a
Standard HDD will have an SLA of at least 95%. See SLA for Virtual
Machines.
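To put those two SLA figures in perspective, here is a rough sketch that converts an SLA percentage into the maximum downtime it tolerates. The 30-day month is my own assumption for illustration; the official SLA terms define the actual measurement period.

```python
def allowed_downtime_hours(sla_percent: float, period_hours: float = 30 * 24) -> float:
    """Maximum downtime (in hours) that still meets the SLA over the period.

    The default period of 30 days is an illustrative assumption.
    """
    return period_hours * (100.0 - sla_percent) / 100.0


# SLA figures quoted above for a single-instance VM.
for label, sla in [("Standard SSD", 99.5), ("Standard HDD", 95.0)]:
    hours = allowed_downtime_hours(sla)
    print(f"{label}: {sla}% SLA tolerates up to {hours:.1f} h of downtime per 30 days")
```

Under that assumption, 99.5% works out to about 3.6 hours per month and 95% to about 36 hours, which is why the disk type matters for a single VM.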
I have several Linux VMs in MS Azure within the same security group, and I can access all of them over SSH except one. For that one, I need to restart the VM 5 to 10 times before I can access it via SSH.
Does anyone have an idea what's wrong with this VM?
If the problem seems specific to this VM alone, you might want to check the VM's Resource Health first. Ensure that the VM reports as being healthy. If you have boot diagnostics enabled, verify the VM is not reporting boot errors in the logs.
If that looks clean, you might consider redeploying the VM. This redeploys a VM to another node within Azure, which may correct any underlying networking issues.
Do note that after this operation completes, ephemeral disk data is lost and dynamic IP addresses associated with the virtual machine are updated.
Additional troubleshooting guidance can be found here.
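Before and after a redeploy, it may help to quantify how often the SSH port is actually reachable rather than relying on manual login attempts. The following is only a minimal sketch: it checks TCP reachability of port 22, not whether sshd accepts a login, and the address in the usage comment is a hypothetical placeholder.

```python
import socket
import time


def port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


def probe(host: str, port: int = 22, attempts: int = 10,
          pause: float = 1.0, timeout: float = 5.0) -> int:
    """Try to connect `attempts` times and return how many attempts succeeded."""
    successes = 0
    for i in range(attempts):
        if port_open(host, port, timeout):
            successes += 1
        if i < attempts - 1:
            time.sleep(pause)
    return successes


# Example (hypothetical address):
# print(probe("203.0.113.10"), "of 10 attempts reached port 22")
```

If every attempt fails right after a restart but succeeds later, the problem may be slow boot or a late-starting sshd rather than the network path, which the boot diagnostics should help confirm.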
I have a VM that runs IIS and SQL Server for an enterprise application used by around 100 users.
Right now I have just this one VM, but I would like to add some availability. Zero downtime is not critical, but if the server fails for some reason, I want to be able to wake up a secondary instance and reroute traffic to it.
I guess this is done with availability sets, but as I understand it, I need at least two VMs in the availability set plus a load balancer so that traffic is distributed round-robin across the VMs. I guess that means paying for two instances with the same specs.
What I would like, if it is possible, is the same setup but with one of the VMs stopped so that I am not charged for it; if the main VM fails, I could start it, maybe manually, so the application works again. If this is possible, how is the disk made available so that the other VM always has the latest data?
If it's not possible, can the second VM in the availability set have the lowest specs my app can run on, so that if the main VM fails at least critical users can still access the app (performance may not be great, but the app will work), and traffic is redirected back to the main VM once it is functional again?
You can achieve this by having two VMs with premium disks only and keeping one as a cold backup. A single VM qualifies for an SLA if it uses only premium disks; the SLA would be 99.9%, as far as I remember.
With availability sets, you need at least two running VMs.
Following the latest Azure maintenance, we can no longer connect via Remote Desktop to one of our VMs to fix a potential issue on its IIS server. Everything had been working fine for over a year.
The Agent status is now set to "Not Ready" when looking at the properties of the VM in the Azure portal.
We obviously tried restarting the machine, but it had no effect. We cannot redeploy the machine to another node because the VM agent seems to be down.
The outbound NSG rules do not block outbound connections to the internet, so the machine should be able to write to its Azure storage.
This user seems to have a similar issue on a VM scale set: Azure VM scale sets not accessible and cannot restart
Any idea how to resolve this issue?
I am trying to migrate my applications from a GoDaddy virtual machine to an Azure virtual machine.
I want to have one VM as a database server and another as a web server, but that setup is extremely slow. When I installed the DB on the same server as the web application, it was fast.
So the question: how can I improve performance between the two virtual machines? Both are located in "East US". Is there some way to place both on the same box?
What can you suggest?
Found the issue. It's not Azure, it's a problem with the MySQL database.
I had to set the parameter
skip-name-resolve
in the my.ini file, and now it's as fast as when both were located on the same server.
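For anyone wanting to verify the setting on their own server, here is a small sketch that checks whether the option is present in an option file. It assumes the option lives under the [mysqld] section and that the file is simple enough for Python's configparser to read; a real my.ini may contain directives it does not understand. The sample content and the function name are made up for illustration.

```python
import configparser


def has_skip_name_resolve(ini_text: str) -> bool:
    """Return True if skip-name-resolve appears under [mysqld] in the given text.

    Only presence is checked; a value such as "=OFF" is not interpreted.
    """
    # allow_no_value lets configparser accept bare option names without "=".
    parser = configparser.ConfigParser(allow_no_value=True, strict=False)
    parser.read_string(ini_text)
    if not parser.has_section("mysqld"):
        return False
    # MySQL accepts both dashes and underscores in option names.
    return any(
        parser.has_option("mysqld", name)
        for name in ("skip-name-resolve", "skip_name_resolve")
    )


# Illustrative file content, not taken from a real server.
sample = """
[mysqld]
port=3306
skip-name-resolve
"""
print(has_skip_name_resolve(sample))  # prints True
```

With this option set, MySQL skips DNS lookups for connecting clients, so grants have to reference IP addresses (or localhost) instead of host names.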
Make sure the VMs are in the same affinity group so that the Fabric Controller places them close together, i.e. in East US.
Machines in the same Cloud Service and affinity group should have lightning-fast direct network traffic. Can you run some ping tests between them in the same affinity group and share the results with us, please?
e.g. RDP in and run ping from the command line.
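If ICMP happens to be filtered between the machines, timing a TCP handshake to the database port gives a comparable number. This is only a sketch: the address in the usage comment is a hypothetical placeholder, port 3306 assumes MySQL's default, and the timing covers connection setup only, not query execution.

```python
import socket
import statistics
import time
from typing import List


def connect_times_ms(host: str, port: int, samples: int = 5,
                     timeout: float = 3.0) -> List[float]:
    """Time `samples` TCP handshakes to host:port; durations are in milliseconds.

    Raises OSError if a connection attempt fails or times out.
    """
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # The connection is closed immediately; only setup time is measured.
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations: List[float]) -> str:
    """Format the minimum, median and maximum of the measured durations."""
    return (f"min {min(durations):.2f} ms, "
            f"median {statistics.median(durations):.2f} ms, "
            f"max {max(durations):.2f} ms")


# Example, run from the web server against the database VM (hypothetical address):
# print(summarize(connect_times_ms("10.0.0.5", 3306)))
```

If the handshake itself is fast but the application is still slow, the delay is more likely inside the database layer than on the network.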
As stated above, the beefier the machines, the better the bandwidth and network throughput, so scaling up the SQL machine is advisable.
*More on Affinity Groups here: http://convective.wordpress.com/2012/06/10/affinity-groups-in-windows-azure/
Cheers
An affinity group is the way to go. Configuring both VMs in the same affinity group will co-locate them.