Azure availability sets: fault domains and update domains

Q. I have 2 servers, so I'll have 2 FDs (FD0, FD1) and 2 UDs (UD0, UD1). What if UD0 is down and, at the same time, FD1 goes down for some reason? What will happen?

If I correlate the actual question with the diagram in Ashok's answer, there are two scenarios here:
1) An update domain is down only while an update is in progress (planned or unplanned). So if FD1 goes down, no update will happen in UD0, because there is no other server to take the load. UD0 will have to wait until FD1 comes back online before it is updated.
2) If an update is in progress in UD1 or UD2, UD0 will definitely be running and handling the traffic. If FD0 goes down at that time, your app will be down. To overcome this scenario you should have 3 FDs.

Very simple: both of your servers would be out.
This is not even specific to Azure. Even if you have 2 machines hosted in two locations by 2 different providers, if the first is down for maintenance and the second one crashes, you'll end up with everything down. So fault domains and update domains will not protect you from a full outage in such an event.
This is how FDs and UDs are useful in the case of two machines:
Having each machine in its own FD and its own UD allows you to avoid a full outage in the event of an unexpected outage in one FD, and to avoid a full outage in the event of an update.
Having both machines in the same FD but in different UDs allows you to avoid a full outage during update operations, but does not prevent a full outage in the event of an unexpected FD outage.
Having both machines in the same UD but in different FDs (yes, it's possible) allows you to avoid a full outage in the event of an unexpected outage in one FD, but you'll have a full outage for each update operation.
Having both machines in the same FD and in the same UD would not protect you from anything: you'll have a full outage for both unexpected FD outages and update operations.
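The four layouts above can be checked with a small simulation. This is only a sketch of the reasoning in this answer, not anything Azure exposes: the VM class, the surviving_vms function and the integer domain numbers are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    # Hypothetical model of a VM's placement, for illustration only.
    name: str
    fault_domain: int
    update_domain: int


def surviving_vms(vms, failed_fd=None, updating_ud=None):
    """Return the names of VMs that stay up when one fault domain has
    failed and/or one update domain is being rebooted."""
    return [
        vm.name
        for vm in vms
        if vm.fault_domain != failed_fd and vm.update_domain != updating_ud
    ]


# The four two-machine layouts discussed above.
own_fd_own_ud = [VM("a", 0, 0), VM("b", 1, 1)]
same_fd_diff_ud = [VM("a", 0, 0), VM("b", 0, 1)]
diff_fd_same_ud = [VM("a", 0, 0), VM("b", 1, 0)]
same_fd_same_ud = [VM("a", 0, 0), VM("b", 0, 0)]
```

For example, surviving_vms(own_fd_own_ud, failed_fd=1) leaves "a" running, while surviving_vms(own_fd_own_ud, failed_fd=1, updating_ud=0) returns an empty list, which is the full outage from the original question.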

For all Virtual Machines that have two or more instances deployed in the same Availability Set, Microsoft guarantees you will have Virtual Machine Connectivity to at least one instance at least 99.95% of the time.
For any Single Instance Virtual Machine using premium storage for all Operating System Disks and Data Disks, Microsoft guarantees you will have Virtual Machine Connectivity of at least 99.9%.
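To get a feel for what those two percentages mean, here is a rough back-of-the-envelope calculation. It assumes a 30-day month, which is my simplification; the SLA document itself defines how uptime is actually measured. The function name is made up for this example.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime per period that still meet the given SLA.

    Assumes a period of `days` full days (30 by default, an assumption).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


availability_set = allowed_downtime_minutes(99.95)  # two or more instances
single_instance = allowed_downtime_minutes(99.9)    # single VM, premium storage
```

Under that assumption, 99.95% allows roughly 21.6 minutes of downtime per month and 99.9% allows roughly 43.2 minutes.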
Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (Resource Manager deployments can then be increased to provide up to 20 update domains) to indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on. The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
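The wrap-around placement described above (sixth VM with the first, seventh with the second, and so on) behaves like a simple modulo. The sketch below only models that sentence; the function names and the zero-based numbering are my own assumptions, and the real assignment is done by the Azure platform.

```python
def update_domain_for(vm_index, update_domain_count=5):
    """Update domain for the VM at zero-based position vm_index,
    modelling the wrap-around rule with a modulo."""
    return vm_index % update_domain_count


def group_by_update_domain(vm_count, update_domain_count=5):
    """Map each update domain to the zero-based VM positions placed in it."""
    groups = {ud: [] for ud in range(update_domain_count)}
    for vm_index in range(vm_count):
        groups[update_domain_for(vm_index, update_domain_count)].append(vm_index)
    return groups


# Seven VMs with the default of five update domains.
seven_vms = group_by_update_domain(7)
```

With seven VMs, the first and sixth VM (positions 0 and 5) share an update domain, as do the second and seventh (positions 1 and 6), so they would be rebooted together during planned maintenance.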
Fault domains define the group of virtual machines that share a common power source and network switch. By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, it does limit the impact of potential physical hardware failures, network outages, or power interruptions.
Here is an article which helps you to understand Fault Domains and Update Domains

Related

Do you know of a good explanation for fault domains and update domains?

I'm new to Azure and have been struggling with two concepts: update domains and fault domains. I'm probably having a harder time understanding update domains.
So as I understand it, having 3 VMs in 3 fault domains would be essentially having those VMs spread out to three racks? Is that correct?
Like this:
Fault domain 1 | Fault domain 2 | Fault domain 3
VM 1           | VM 2           | VM 3
If that is wrong, please correct me. So then what is an update domain? A lot of the documentation I have seen shows a demonstration for the fault domain similar to the table above and then describes what sounds like a fault domain.
If you have a link to a good explanation that would be a big help or if you think you could dumb it down for me a bit, that would work too.
Each virtual machine in your availability set has an update domain and a fault domain assigned.
Fault domains indicate the group of virtual machines that share a common power source and network switch, which limits the impact of potential physical hardware failures, network outages, or power interruptions.
Update domains indicate the group of virtual machines and underlying physical hardware that can be rebooted at the same time, which ensures that some virtual machines remain available during planned maintenance.
Link: https://learn.microsoft.com/bs-latn-ba/azure/virtual-machines/availability-set-overview#how-do-availability-sets-work

Azure VM - Update Domain and Fault Domain in Availability set

I am reading about availability sets for Azure Virtual Machines, and they seem a bit confusing to me.
I have a few questions and would appreciate it if someone could answer them.
1) The Microsoft documentation says that two or more machines in an availability set give 99.95% availability. If that is the case, why is there a maximum of 3 fault domains and 20 update domains? If I choose the maximum of both, would I get more than 99.95% availability? If not, what is the purpose of having more than 2 update or fault domains?
2) If I have 3 fault domains and 20 update domains, how many physical machines will be created? 20, i.e. max(update_domain, fault_domain), or 23, i.e. update_domain + fault_domain?
3) Is it possible to have fewer update domains than fault domains, e.g. 2?
1) No, you would not get a higher SLA, but in theory the more FDs/UDs you have, the more reliable your VMs are (and it's free of charge, so there is no point in having fewer, tbh).
2) Physical machines are not going to be created (how??), but your VMs would be split across 20 hypervisors, and those hypervisors would be split between 3 fault domains (racks).
3) I don't see why not, but I don't see why you would want to do that.

Azure changing hardware

I have a product which uses CPU ID, network MAC, and disk volume serial numbers for validation. Basically when my product is first installed these values are recorded and then when the app is loaded up, these current values are compared against the old ones.
Something very mysterious happened recently. Inside of an Azure VM that had not been restarted in weeks, my app failed to load because some of these values were different. Unfortunately the person who caught the error deleted the VM before it was brought to my attention.
My question is, when an Azure VM is running, what hardware resources may change? Is that even possible?
Thanks!
Answering this requires a short rundown of how Azure works.
In each data centre there are thousands of individual machines. Each machine runs a hypervisor which allows a number of operating systems to share the same underlying hardware.
When you start a role, Azure looks for available resources (disk space, CPU, RAM, etc.) and boots up a copy of the appropriate OS VM in those available resources. I understand from your question that this is a VM role, so this VM is the one you uploaded or created.
As long as your VM is running, the underlying virtual resources provided by the hypervisor are not likely to change. (The caveat to this is that Windows Server 2012's hypervisor can move virtual machines around over the network even while they are running. Whether Azure takes advantage of this, I don't know.)
Now, Azure keeps charging you even when your role has stopped, because it considers your role "deployed". So in theory, those underlying resources still "belong" to your role.
This is not guaranteed. Azure could decide to boot up your VM on a different set of virtualized hardware for any number of reasons, hardware failure being at the top of the list, with insufficient capacity being second.
It is even possible (though unlikely) for your resources to be provided by different hardware nodes.
An additional point of consideration is that it is Azure policy that disaster recovery (or other major event) may include transferring your roles to run in a separate data centre entirely.
My point is that the underlying hardware is virtual and treating it otherwise is most unwise. Roles are at the mercy of the Azure Management Routines, and we can't predict in advance what decisions they may make.
So the answer to your question is that ALL of the underlying resources may change. And it is very, very possible.

Azure VM shutdown

I have 2 VMs on Azure.
One suddenly stopped responding.
It was down for around 30 minutes, until I just browsed into the Azure portal, and then I saw it was in the Starting state, and then it was up & running again.
How can I tell why my VM was shut down?
EDIT: I'm assuming you're talking about Virtual Machines (IaaS), and not Cloud Services (PaaS).
Virtual Machines can, and will, restart, for several reasons. For example:
Hardware failure, where your Virtual Machine will then be restarted on another server.
Host OS refresh. This is the operating system running the physical server.
Some type of OS crash
Also keep in mind: Virtual Machines are in Preview with no SLA today. So there wouldn't be any information readily available to you for determining why your Virtual Machine became unavailable.
If it was unavailable for 30 minutes, then this hints at something akin to a host OS update or your virtual machine being moved. If it was down for, say, 5 minutes, then I'd guess it was an OS crash.
UPDATE I just looked at the Azure Dashboard which is showing degraded Compute with Virtual Machines (see RSS feed with problem description). Perhaps this is the root cause of your particular outage...
Several things might cause this. Your VM may have crashed because of a bug in your code. The second possibility, I think, is that the number of VMs you created is not enough for the incoming traffic: a VM could restart if the incoming traffic is more than your VMs can handle.

Azure Compute : OS Patching, Updates and Downtime

We know that Azure Compute is PaaS, so the operating system (Windows Server 2008 R2) has to be patched and upgraded automatically.
I just wanted to know: will there be any downtime during patching or a Compute upgrade?
If you only have a single instance of a particular VM role, then yes - you'll have a short bit of downtime, as you need to be rebooted. Likewise, if the host OS is patched, you'll have a bit of downtime.
If you run two or more instances, then the SLA kicks in, because your instances are separated into different containers/network branches/etc. These are fault domains. So even if a network segment, router, or entire rack were to go offline, you'd have another instance somewhere else.
During OS updates, your instances are divided into upgrade domains, so that they're not all upgraded at once. This leaves your service in an always-available state, as long as you have two or more instances of your roles. For background processes that aren't customer-facing (say, in a worker role that simply reads from queues and processes queue items asynchronously), you can probably get away with a single instance of that role, provided you can handle the work load and that it would be ok to have occasional processing delays.
See this recent TechNet blog post for more details around fault domains and upgrade domains.
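The point about needing two or more instances can be illustrated with a toy model of a rolling upgrade, where one upgrade domain is taken down at a time. This is only a sketch of the idea in this answer; the function name and the representation of instances as a list of upgrade domain numbers are hypothetical.

```python
from collections import Counter


def min_available_during_rolling_upgrade(instance_upgrade_domains):
    """Smallest number of instances still running at any point while
    upgrade domains are taken down one at a time.

    instance_upgrade_domains holds one upgrade domain number per instance.
    """
    total = len(instance_upgrade_domains)
    if total == 0:
        return 0
    per_domain = Counter(instance_upgrade_domains)
    # The worst moment is when the most populated domain is down.
    return min(total - count for count in per_domain.values())


single_instance_role = min_available_during_rolling_upgrade([0])
two_instance_role = min_available_during_rolling_upgrade([0, 1])
```

A single instance drops to zero available instances while its domain is upgraded, whereas two instances in separate upgrade domains always keep one running.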
