Fault domain and update domain in Azure

I would like a simple explanation of fault domains and update domains in Azure.
Most of the theory is a little confusing to me. Is the picture below a correct way to understand it, if we treat it as a cluster environment in which the hypervisor is updated?

A group of VMs that share common hardware is said to be in the same fault domain.
During planned maintenance by Microsoft, VMs must be rebooted to complete the update. A group of VMs and underlying hardware that are rebooted at the same time is said to be in the same update domain.

Related

Azure Domain broken

Hello everyone.
I recently bought a domain on Azure and I can't bind it to an App Service (web app).
When I try to bind a domain it says "App Service Domain is in a broken state. Please navigate to the App Service Domain resource and delegate to Azure DNS before adding hostname."
When I navigate to the domain at https://resources.azure.com, it looks like my dnsZoneId is assigned to another resource group, and I don't know how to change it.
I tried to delete the DNS zone and recreate it, but I can't bind the DNS back from https://dcc.secureserver.net
Can anyone help me, please?
Thanks in advance.
Newest
For how to change your dnsZoneId to another resource group, you can read this post. I think it will be useful to you.
Previous
Under normal circumstances this problem does not occur, because when you successfully purchase a domain in Azure, all information and services for the domain name are hosted on Azure.
There is a similar case here that you can refer to; it may be helpful to you.
If there is a problem, I guess the reason is one of the following:
Some misoperation on your side may have caused the domain name service or settings to be configured incorrectly, making the domain unusable. (The probability of this is relatively low.)
It may be Azure's domain management service; there may be a problem in your region. Try again after a while, or transfer the domain name to GoDaddy for management. (I mention this because I have encountered the problem in GoDaddy before, and it was solved after migrating the domain name to Tencent in China.)
If you have tried the steps above and the problem is still not solved, please raise a ticket in the portal for help.

Fault domain of Virtual Machines in an availability set

I read the following in the AZURE214x Azure Fundamentals openedx course:
Each virtual machine in an availability set is placed in one update
domain and two fault domains.
I don't understand why a VM in an availability set would be placed in two fault domains. Surely a VM can only sit in one fault domain, or am I wrong? Can someone explain?
Each virtual machine in an availability set is placed in one update
domain and two fault domains.
No, this is wrong.
Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform.
Also, you could refer to Mistake in Module3, Review question 2 (AZURE202x Microsoft Azure Virtual Machines).
I guess it's just a wording issue; a VM cannot be in two fault domains at the same time.
"Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform."
Reference: https://learn.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-manage-availability

automatic failover if webserver is down (SRV / additional A-record / ?)

I am starting to develop a webservice that will be hosted in the cloud but needs higher availability than typical cloud SLAs provide.
Typical SLAs, e.g. Windows Azure, promise an availability of 99.9%, i.e. up to 43min downtime per month. I am looking for an order of magnitude better availability (<5min down time per month). While I can configure several load balanced database back-ends to resolve that part of the issue I see a bottleneck at the webserver. If the webserver fails, the whole service is unavailable to the customer. What are the options of reducing that risk without introducing another possible single point of failure? I see the following solutions and drawbacks to each:
SRV-record:
I duplicate the whole infrastructure (and take care that the databases are in sync) and add additional SRV records for the domain, so that a user trying to access www.example.com automatically gets forwarded to example.cloud1.com or, if that one is offline, to example.cloud2.com. From googling around, it seems that SRV records are not supported by any major browser. Is that true?
second A-record:
Add an additional A-record as an alternative. Drawbacks:
a) at my hosting provider I do not see any possibility of adding a second A-record, just one... is that normal?
b) if one of the two servers is down, I am not sure whether the user gets automatically redirected to the other one or whether 50% of all users get a 404 or some other error
Any clues for a best practice would be appreciated.
Cheers,
Sebastian
When a cloud provider specifies an SLA for an instance, availability refers to the instance's health as seen by the hypervisor or fabric controller, i.e. whether the server is running. Beyond that, it is up to you to make sure the instance does not fail because of your app, the OS, or anything else running inside it. There are a few things DevOps teams tend to miss that come back to hit hard, for instance forgetting to configure OS updates and patches.
The fundamental axiom of availability is redundancy: the more redundant your application and infrastructure are, the more available your app is.
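To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a 30-day month and, more importantly, that replicas fail independently of each other and that whatever routes traffic between them never fails; neither assumption holds exactly in practice, so treat the result as an upper bound. The function names are made up for illustration.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def downtime_minutes(availability: float) -> float:
    """Expected downtime per month for a given availability (0.999 means 99.9%)."""
    return (1.0 - availability) * MINUTES_PER_MONTH


def combined_availability(availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent failures.

    The service is down only when every replica is down at the same time.
    """
    return 1.0 - (1.0 - availability) ** replicas


if __name__ == "__main__":
    for replicas in (1, 2):
        a = combined_availability(0.999, replicas)
        print(f"{replicas} replica(s): {a:.6%} available, "
              f"{downtime_minutes(a):.2f} min/month expected downtime")
```

Under these assumptions a single 99.9% instance allows about 43 minutes of downtime per month, while two independent ones would in theory stay well under the 5-minute target from the question.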
I recommend you look into Azure Traffic Manager and then rework your architecture. You need not worry about SRV records or A-records; a CNAME pointing to Traffic Manager will do the trick.
The idea of Traffic Manager is simple: you place Traffic
Manager behind the domain name (the domain name resolution of the
app), and Traffic Manager then decides where to send each request based
on factors such as round-robin, disaster management, etc.
Combining Traffic Manager with a multi-region infrastructure setup moves you towards the high-availability goal.
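The failover decision itself can be illustrated with a small sketch. This is not Traffic Manager's API, only the priority-order idea: walk an ordered list of endpoints and return the first one whose health check passes. The function name, the endpoint hostnames (borrowed from the question) and the health-check callback are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None if all are down."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Hypothetical deployment: primary in one region, secondary in another.
    endpoints = ["example.cloud1.com", "example.cloud2.com"]
    down = {"example.cloud1.com"}  # pretend the primary is failing its health check
    print(pick_endpoint(endpoints, lambda e: e not in down))
```

In a real setup the selection happens at the DNS level, so clients only switch over once cached DNS answers expire.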
Links
Azure Traffic Manager Overview
Cloud Power: How to scale Azure Websites globally with Traffic Manager
Maybe you should configure a Corosync cluster with DRBD?
DRBD will ensure that the data on both nodes is replicated (for example website files and database files).
Apache, as the web server, will be available under a virtual IP to which the domain points. If one server goes down, Corosync will move all services to the second server within a few seconds.

Understanding availability set in Windows Azure

I am reading the explanation of availability sets on Microsoft's website but can't fully understand the concept.
http://www.windowsazure.com/en-us/documentation/articles/manage-availability-virtual-machines/
People ask many questions in the comments, but there is no technical support from Microsoft there to answer them.
As I understand it, with availability sets you duplicate your IIS application VM and your SQL VM, which means you have to use (and pay for) 4 VMs instead of 2. This means that whenever the IIS1 virtual machine is down, the website will still be online with the help of the IIS2 virtual machine, and vice versa? The same goes for the SQL1 and SQL2 virtual machines?
Am I going in the right direction? If so, how do I keep the data synchronized between SQL1 and SQL2, and between IIS1 and IIS2, so that the website stays up with the latest data and code if one VM is down for updates?
An availability set combines two concepts from the Windows Azure PaaS world - upgrade domains and fault domains - that help to make a service more robust. When several VMs are deployed into an availability set the Windows Azure fabric controller will distribute them among several upgrade domains and fault domains.
A fault domain represents a grouping of VMs which have a single point of failure - a convenient (although not precisely accurate) way to think about it is a rack with a single top-of-rack router. By deploying the VMs into different fault domains the fabric controller ensures that a single failure will not take the entire service offline.
The fabric controller uses upgrade domains to control the manner in which host OS upgrades (i.e., of the underlying physical server) are performed. The fabric controller performs these upgrades one upgrade domain at a time, only moving onto the next upgrade domain when the upgrade of the preceding upgrade domain has completed. Doing this ensures that the service remains available, although with reduced capacity, during a host OS upgrade. These upgrades appear to happen every month or two, and services in which all VMs are deployed into availability sets receive no warning since they are supposedly resilient towards the upgrade. Microsoft does provide warning about upgrades to subscriptions containing VMs deployed outside availability sets.
Furthermore, there is no SLA for services which have VMs deployed outside availability sets.
As regards SQL Server, you may want to look into the use of SQL Server Availability Groups which sit on top of Windows Server Failover Cluster and use synchronous replication of the data. For IIS, you may want to look at the possibility of deploying your application into a PaaS cloud service since that provides significant advantages over deploying it into an IaaS cloud service. You can create a service topology integrating PaaS and IaaS cloud services through the use of a VNET.
An availability set is a combination of these two features:
Fault domains (you can select a maximum of 3 when creating a new availability set)
Update domains (you can select a maximum of 20 when creating a new availability set)
A fault domain is a physical grouping (such as a rack or a power source). Say you select 2 fault domains for your availability set: the machines in that set will have fault domain values 1 and 2, so at least one remains available if power fails in either physical grouping.
An update domain is a set of machines that an Azure system update will update at the same time.
If you select 4 update domains and your 2 VMs have values 2 and 3, they will not be updated together during any planned maintenance.
For high availability, duplicate VMs should not be in the same fault domain or the same update domain.
Note that you cannot change a VM's availability set after the VM has been created; it must be set at creation time.
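The placement idea can be sketched in a few lines. The sketch below assumes VMs are handed out to fault domains and update domains in simple round-robin order; that is a simplification for illustration and not a statement of how the Azure platform actually places VMs. Domain numbers are zero-based here, and the function names and VM names are made up for the example.

```python
from typing import Dict, List, Sequence, Tuple


def assign_domains(
    vm_names: Sequence[str],
    fault_domains: int,
    update_domains: int,
) -> Dict[str, Tuple[int, int]]:
    """Map each VM to (fault_domain, update_domain), assuming round-robin placement."""
    if fault_domains < 1 or update_domains < 1:
        raise ValueError("domain counts must be at least 1")
    return {
        name: (i % fault_domains, i % update_domains)
        for i, name in enumerate(vm_names)
    }


def vms_still_running(
    assignment: Dict[str, Tuple[int, int]],
    updating_domain: int,
) -> List[str]:
    """VMs that stay up while one update domain is being rebooted."""
    return [name for name, (_, ud) in assignment.items() if ud != updating_domain]


if __name__ == "__main__":
    placement = assign_domains(["IIS1", "IIS2"], fault_domains=2, update_domains=4)
    print(placement)                        # which domains each VM landed in
    print(vms_still_running(placement, 0))  # who keeps serving while domain 0 updates
```

With two VMs, 2 fault domains and 4 update domains, the two VMs end up in different domains of both kinds, so neither a rack failure nor a planned update takes both down at once.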

Security groups on AWS

I understand that AWS/EC2 security groups are just like a firewall. But can I ask:
How is this implemented, for you Amazon insiders? Is it software or a hardware device that's off-the-shelf?
What happens within EC2. For example, does the security group stop me from flooding a competing website's HTTP address from within the EC2 environment, by using their private IP address? Can I access their RDP connection on the private address?
Since no one has answered yet - I'll give it a go - I'm not an AWS 'insider' but we have built a cloud management platform on top of it - so we have some experience.
A security group has the same effect as a firewall, and some of Amazon's documentation even refers to it as a firewall - but you don't get the same level of control as you would with your own s/w or h/w device - you just get a basic level of rule-setting functionality.
In a previous business we did something similar for our shared services, and basically it was some hefty hardware firewalls that we admin'd but gave users the ability to set some basic rules for their VM's. I believe AWS is pretty much the same. They have the POWER and the user has LOCAL VM control.
Hopefully someone from Amazon will see this and shed more light for you!
-Ed, digitalmines.com
