Understanding availability sets in Windows Azure

I am reading the explanation of Availability Sets on Microsoft's website but can't fully understand the concept.
http://www.windowsazure.com/en-us/documentation/articles/manage-availability-virtual-machines/
People ask many questions in the comments, but nobody from Microsoft technical support is there to answer them.
As I understand it, with availability sets you duplicate the VM running the IIS application and the VM running SQL, which means you have to use (and pay for) 4 VMs instead of 2. This means that whenever the IIS1 virtual machine is down, the website will still be online with the help of the IIS2 virtual machine, and vice versa? The same goes for the SQL1 and SQL2 virtual machines?
Am I going in the right direction? If this is the case, how do I keep the data synchronized between SQL1 and SQL2, and between IIS1 and IIS2, so that the website stays up with the latest data and code if one VM is down for updates?

An availability set combines two concepts from the Windows Azure PaaS world - upgrade domains and fault domains - that help to make a service more robust. When several VMs are deployed into an availability set the Windows Azure fabric controller will distribute them among several upgrade domains and fault domains.
A fault domain represents a grouping of VMs which share a single point of failure - a convenient (although not precisely accurate) way to think about it is a rack with a single top-of-rack router. By deploying the VMs into different fault domains, the fabric controller ensures that a single failure will not take the entire service offline.
The fabric controller uses upgrade domains to control the manner in which host OS upgrades (i.e., of the underlying physical server) are performed. The fabric controller performs these upgrades one upgrade domain at a time, only moving onto the next upgrade domain when the upgrade of the preceding upgrade domain has completed. Doing this ensures that the service remains available, although with reduced capacity, during a host OS upgrade. These upgrades appear to happen every month or two, and services in which all VMs are deployed into availability sets receive no warning since they are supposedly resilient towards the upgrade. Microsoft does provide warning about upgrades to subscriptions containing VMs deployed outside availability sets.
Furthermore, there is no SLA for services which have VMs deployed outside availability sets.
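To make the upgrade domain behaviour concrete, here is a minimal Python sketch of a rolling host upgrade. It is only a simulation of the idea described above, not how the fabric controller is implemented; the function name `rolling_upgrade` and the VM-to-domain layout are assumptions made for illustration.

```python
from collections import defaultdict


def rolling_upgrade(vm_to_ud):
    """Yield (upgrade_domain, offline_vms, serving_vms) for each upgrade step.

    vm_to_ud maps a VM name to its upgrade domain number (hypothetical data).
    Only one upgrade domain is taken offline at a time.
    """
    by_domain = defaultdict(list)
    for vm, ud in vm_to_ud.items():
        by_domain[ud].append(vm)

    for ud in sorted(by_domain):
        offline = sorted(by_domain[ud])
        serving = sorted(vm for vm in vm_to_ud if vm not in offline)
        yield ud, offline, serving


# Hypothetical layout: each tier has one VM in each of two upgrade domains.
layout = {"IIS1": 0, "IIS2": 1, "SQL1": 0, "SQL2": 1}

for ud, offline, serving in rolling_upgrade(layout):
    print(f"upgrading UD {ud}: offline={offline}, still serving={serving}")
```

With this layout every step leaves one IIS VM and one SQL VM running, which is the "available, although with reduced capacity" situation. If both IIS VMs sat in the same upgrade domain, one step would take the whole web tier offline.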
As regards SQL Server, you may want to look into SQL Server Availability Groups, which sit on top of Windows Server Failover Clustering and can replicate the data synchronously. For IIS, you may want to look at the possibility of deploying your application into a PaaS cloud service, since that provides significant advantages over deploying it into an IaaS cloud service. You can create a service topology integrating PaaS and IaaS cloud services through the use of a VNET.

An availability set is a combination of these two features:
Fault domains (you can select at most 3 when creating a new availability set)
Update domains (you can select at most 20 when creating a new availability set)
A fault domain is a physical grouping (for example a rack or a power source). Say you select 2 fault domains for your availability set: the machines in that availability set will be assigned fault domain values 1 and 2, so at least one remains available if power fails in either physical grouping.
An update domain is a group of machines that Azure system updates patch at the same time.
If you select 4 update domains and your 2 VMs have values 2 and 3, they will not be updated together during any planned maintenance.
For high availability, duplicate VMs should not be in the same fault domain or the same update domain.
Note that you cannot change the availability set of a VM after it is created; it has to be set at creation time.
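As a rough illustration of how VMs could end up in different fault and update domains, the sketch below hands out domain numbers in creation order. The round-robin rule is an assumption made for illustration (Azure decides the real placement), `assign_domains` is a hypothetical helper, and the domain numbers are zero-based here, whereas the text above counts from 1. The limits of 3 fault domains and 20 update domains are the ones quoted above.

```python
def assign_domains(vm_names, fault_domains=2, update_domains=5):
    """Return {vm_name: (fault_domain, update_domain)} using round-robin.

    Assumption: the i-th VM created lands in domain i modulo the domain count.
    The default counts are illustrative only.
    """
    if not 1 <= fault_domains <= 3:
        raise ValueError("fault_domains must be between 1 and 3")
    if not 1 <= update_domains <= 20:
        raise ValueError("update_domains must be between 1 and 20")

    return {
        name: (index % fault_domains, index % update_domains)
        for index, name in enumerate(vm_names)
    }


placement = assign_domains(["IIS1", "IIS2"], fault_domains=2, update_domains=4)
for name, (fd, ud) in placement.items():
    print(f"{name}: fault domain {fd}, update domain {ud}")
```

Under this assumption the two duplicate VMs never share a fault domain or an update domain, which is the property the availability set is meant to provide.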

Related

Clarification on how availability sets make a single VM more available

I am having difficulty understanding Azure Availability Sets, specifically what exactly I need to do to ensure the app running on my VM uses availability sets to be more available.
Let's say I am creating an application that runs on a single VM and I want to make it more resistant to hardware failure.
Option 1:
I create an Availability Set with 2 fault domains and then create a VM on this Availability set.
Is that it?
If there is a hardware failure on the rack hosting my VM, does Azure now take care of ensuring the VM stays up and running?
Option 2:
I have to have two servers, Vm1 and Vm2, both in the availability set, but one in fault domain 1 and one in fault domain 2.
I then have to set up a cluster of sorts for my application. In this case the availability set simply lets me be sure that the two servers in my cluster are not on the same hardware, but the plumbing to ensure the application can take advantage of two servers and is highly available is still down to me.
Is option 1 or option 2 the correct way in which Availability Sets work in relation to fault domains?
Appreciate any clarity that can be provided.
Azure deals with hardware failure in two ways: Availability Sets (AS) and Availability Zones (AZ). An AS is all about making sure that your app does not go down even if a hardware failure happens within a data center (aka zone). AZs are all about making sure your app does not go down even if a whole data center (aka zone) is down. More details here.
To understand best practices around availability, take a look at the best practices; those specific to VMs can be found here.
A Single VM instance is defined as follows, reference:
"Single Instance" is defined as any single Microsoft Azure Virtual Machine that either is not deployed in an Availability Set or has only one instance deployed in an Availability Set.
So a single VM gets no benefit from being in an availability set. For that you need at least two VMs in an AS, spread across FDs and UDs; Azure then makes sure the VMs run on separate hardware so that a single failure does not take your app down.
One VM in an Availability set is nearly as good as a VM with no Availability set.
If you are placing two or more VMs in an AS and those are identical then you can add a load balancer to distribute traffic.
You can also use AS without a Load balancer if you are not interested in traffic distribution. One scenario can be where you want to switch to a secondary VM only when primary is unavailable.
Also, do understand it is not required to have identical VMs in an AS.
Virtual machine scale set is a good option if you are looking for a high availability solution with VMs.

Azure availability set or zone vm auto turn on

I have a VM that runs IIS and SQL server for an enterprise application used by around 100 users.
Right now I just have this VM but I would like to add some availability. It’s not critical to have zero downtime application but at least that if by some reason the server fails then I’m able to wake up a secondary instance and reroute traffic to it.
So I guess this is done by using Availabilty Sets but what I understand is that I have at least to have two VMs in the availability set and load balancer so traffic is redirected round robin to each VM. By using the above approach that means that I must have to pay for having two instances with same specs I guess.
What I would like and don’t know if this is possible is like having same above scenario where one the of the VMs is stopped so I don’t get any charge and in case of VM failure I can started maybe manually so the application works again. If this is possible how does the hard drive is available so that the other VM always have the latest data.
If it’s not possible then can I have then for the availabilty set a second VM with the lowest specs that my app can support so if the main VM fails at least critical users can still access the app (maybe performance won’t be great but app will work) and when main VM is functional again then main traffic is again redirected to main VM.
You can achieve this by having 2 VMs with premium disks only, keeping one as a cold backup. A single VM qualifies for an SLA if it only uses premium disks; the SLA would be 99.9%, as far as I recall.
With availability sets, you need to have at least 2 running VMs.

Azure Availability Sets and adding VM to specific fault domains

So I have 2 classes of VM. Let's call them serverA and serverB.
Within my availability set I want to make sure each fault domain has one of each VM class (serverA and serverB). Is it possible to have this level of fine control? From what I can tell, this is achieved by adding the servers in a specific order, i.e. serverA, serverA, serverB, serverB, assuming I had 2 fault domains.
Is this true? Is it the only way?
When you deploy virtual machines in an availability set, Azure takes care of placing the machines in different fault domains and update domains. We do not have granular control over which VM goes into which fault domain.
At a high level, if you deploy 2 VMs, Azure will make sure those 2 VMs are in different fault domains and update domains.

Azure scale set or Availability Set

We have a standard 3-tier web application that needs to be migrated to the cloud (more of a VM-based lift and shift than cloud native at this point).
I am wondering which factors I should consider when deciding whether an Azure Scale Set or an Azure Availability Set should be used for the web and application tiers.
Probably answers to questions like:
Can availability set autoscale like Scale set?
Any overhead of using either option for a simple web application?
Will both need a load balancer in front of them?
might help me make a decision.
Any suggestions please?
You can refer to the N-tier architecture on virtual machines. Each tier consists of two or more VMs, placed in an availability set or VM scale set. The load balancer is used to distribute requests across the VMs in a tier. Each tier is also placed inside its own subnet, so you can apply NSG rules and route tables to individual tiers to restrict access.
For your questions:
No. The main difference is that a scale set has identical VMs, which makes it easy to add or remove VMs from the set, whereas an availability set does not require them to be identical. An availability set is spread across fault domains, each of which shares a set of hardware components, which means that when you have more than one VM in different fault domains in a set, it reduces the chance of losing all your VMs in the event of a hardware failure in the host or rack. A regional (non-zonal) scale set uses placement groups, which act as an implicit availability set with five fault domains and five update domains. Refer to this question.
It's recommended to use VM Scale Sets for autoscaling. VMSS can automatically create and integrate with the Azure load balancer or Application Gateway.
Yes, both need Azure LB in front of them.
Generally speaking, both scenarios do not offer any way to magically make this happen, so you are kinda forced to use webapps if you want minimum overhead.
Yes, it can, but you need to prestage VMs.
Yes, you need to configure the VMs, and for VMSS you need automation so that scaling can happen automatically.
Yes, both will need a load balancer (web apps do not).
But your app might not work with web apps, in which case you are kinda forced to use VMs or VMSS.

automatic failover if webserver is down (SRV / additional A-record / ?)

I am starting to develop a webservice that will be hosted in the cloud but needs higher availability than typical cloud SLAs provide.
Typical SLAs, e.g. Windows Azure, promise an availability of 99.9%, i.e. up to 43min downtime per month. I am looking for an order of magnitude better availability (<5min down time per month). While I can configure several load balanced database back-ends to resolve that part of the issue I see a bottleneck at the webserver. If the webserver fails, the whole service is unavailable to the customer. What are the options of reducing that risk without introducing another possible single point of failure? I see the following solutions and drawbacks to each:
SRV-record:
I duplicate the whole infrastructure (and take care that the databases are in sync) and add additional SRV records for the domain, so that a user trying to access www.example.com will automatically get forwarded to example.cloud1.com or, if that one is offline, to example.cloud2.com. Googling around, it seems that SRV records are not supported by any major browser; is that true?
second A-record:
Add an additional A-record as an alternative. Drawbacks:
a) at my hosting provider I do not see any possibility to add a second A-record, just one... is that normal?
b) if one of the two servers is down, I am not sure whether the user is automatically redirected to the other one, or whether 50% of all users get a 404 or some other error
Any clues for a best practice would be appreciated.
Cheers,
Sebastian
The availability (i.e. the SLA) specified by a cloud provider refers to the instance's health as seen by the hypervisor or fabric controller, that is, whether the server is running. With that said, you need to make an effort to ensure the instance does not fail because of your app, the OS, or pretty much anything else running inside the instance. There are a few things that devops tend to miss and that hit back hard, for instance forgetting to configure OS updates and patches.
The fundamental axiom of availability is redundancy. The more redundant your application and infrastructure are, the more available your app is.
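The numbers in the question, and the effect of redundancy, can be checked with a little arithmetic. The sketch below assumes a 30-day month and, more importantly, that replicas fail independently of each other, which real deployments only approximate (shared dependencies such as the load balancer or DNS are ignored). The function names are made up for illustration.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, assuming a 30-day month


def downtime_minutes(availability):
    """Expected downtime per month for an availability such as 0.999."""
    return (1 - availability) * MINUTES_PER_MONTH


def combined_availability(availability, replicas):
    """Availability of N redundant replicas, ASSUMING independent failures.

    The service is down only when every replica is down at the same time.
    """
    return 1 - (1 - availability) ** replicas


single = 0.999
print(f"one instance:  {downtime_minutes(single):.1f} min/month")
print(f"99.99% target: {downtime_minutes(0.9999):.2f} min/month")

pair = combined_availability(single, 2)
print(f"two instances: {pair:.6f} availability, "
      f"{downtime_minutes(pair):.3f} min/month")
```

A single 99.9% instance allows roughly 43 minutes of downtime per month, as stated in the question, while the under-5-minutes goal corresponds to about 99.99%. Under the independence assumption two redundant 99.9% instances would comfortably exceed that, but correlated failures make the real figure lower.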
I recommend you look into Azure Traffic Manager and then rework your architecture. You need not worry about SRV records or A-records; just a CNAME for the Traffic Manager would do the trick.
The idea of Traffic Manager is simple: you tell Traffic Manager to
sit behind the domain name (it handles the domain name resolution of
the app), and Traffic Manager then decides where to send each request
based on factors like round-robin, disaster management, etc.
With the combination of Traffic Manager and a multi-region infrastructure setup, you will move towards the high availability goal.
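The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the idea (prefer the primary deployment, fall back when a health check fails, or rotate between healthy deployments); it is not the Traffic Manager API. The `Endpoint` class and both functions are hypothetical, and the host names are the ones used in the question.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int  # lower number means preferred
    healthy: bool  # result of a (hypothetical) health probe


def pick_by_priority(endpoints):
    """Failover style: return the healthy endpoint with the best priority."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


def pick_round_robin(endpoints, request_number):
    """Round-robin style: rotate requests across the healthy endpoints."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return healthy[request_number % len(healthy)]


endpoints = [
    Endpoint("example.cloud1.com", priority=1, healthy=True),
    Endpoint("example.cloud2.com", priority=2, healthy=True),
]
print(pick_by_priority(endpoints).name)

endpoints[0].healthy = False  # simulate the primary region going down
print(pick_by_priority(endpoints).name)
```

In both styles an unhealthy endpoint is never returned, so users are not sent to a dead server, which was the concern with a plain second A-record.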
Links
Azure Traffic Manager Overview
Cloud Power: How to scale Azure Websites globally with Traffic Manager
Maybe you should configure a Corosync cluster with DRBD?
DRBD will ensure that the data on both nodes is replicated (for example website files and database files).
Apache, as the web server, will be available under a virtual IP to which the domain points. If one server goes down, Corosync will move all services to the second server within a few seconds.
