Azure availability set or zone vm auto turn on - azure

I have a VM that runs IIS and SQL Server for an enterprise application used by around 100 users.
Right now I just have this one VM, but I would like to add some availability. Zero downtime is not critical for this application, but if the server fails for some reason I would at least like to be able to wake up a secondary instance and reroute traffic to it.
So I guess this is done with Availability Sets, but as I understand it I need at least two VMs in the availability set plus a load balancer so that traffic is distributed round robin to each VM. I guess that approach means I have to pay for two instances with the same specs.
What I would like, and I don't know if this is possible, is the same scenario as above but with one of the VMs stopped so that I am not charged for it; in case of a VM failure I could start it, maybe manually, so the application works again. If this is possible, how is the hard drive made available so that the other VM always has the latest data?
If it's not possible, can I instead put a second VM in the availability set with the lowest specs my app can support, so that if the main VM fails at least critical users can still access the app (performance may not be great, but the app will work), and when the main VM is functional again traffic is redirected back to it?

You can achieve this by having 2 VMs that use only premium disks, with one kept as a cold backup. A single VM qualifies for an SLA if it uses only premium disks; the SLA would be 99.9%, as far as I remember.
With availability sets, you need to have at least 2 running VMs.

Related

Clarification on how availability sets make a single VM more available

I am having difficulty understanding Azure Availability Sets, specifically what exactly I need to do to ensure that the app running on my VM uses Availability Sets to be more available.
Let's say I am creating an application that runs on a single VM and I want to make it more resistant to hardware failure.
Option 1:
I create an Availability Set with 2 fault domains and then create a VM in this Availability Set.
Is that it?
If there is a hardware failure on the rack hosting my VM, does Azure now take care of ensuring the VM stays up and running?
Option 2:
I have to have two servers, Vm1 and Vm2, both in the availability set but one in fault domain 1 and one in fault domain 2.
I then have to set up a cluster of sorts for my application. In this case the availability set simply lets me be sure that the two servers in my cluster are not on the same hardware, but the plumbing to ensure the application can take advantage of two servers and is highly available is still down to me.
Is option 1 or option 2 the correct way in which Availability Sets work in relation to fault domains?
Appreciate any clarity that can be provided.
Azure deals with hardware failure in two ways: Availability Sets and Availability Zones. AS is about making sure that your app does not go down even if a hardware failure happens within a data center (aka zone) itself. AZs are about making sure your app does not go down even if a whole data center (aka zone) is down. More details here.
To understand the best practices around availability, take a look at the best practices guidance; the guidance specific to VMs can be found here.
A Single VM instance is defined as follows, reference:
"Single Instance" is defined as any single Microsoft Azure Virtual Machine that either is not deployed in an Availability Set or has only one instance deployed in an Availability Set.
So for a single VM it makes no difference whether or not it is in an availability set. For this you need at least two VMs in an AS, which uses FDs and UDs so that Azure makes sure both VMs run on separate hardware and your app does not go down.
One VM in an Availability set is nearly as good as a VM with no Availability set.
If you are placing two or more VMs in an AS and those are identical then you can add a load balancer to distribute traffic.
You can also use AS without a Load balancer if you are not interested in traffic distribution. One scenario can be where you want to switch to a secondary VM only when primary is unavailable.
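The "switch to a secondary VM only when the primary is unavailable" scenario can be sketched as a small selection routine. This is only an illustration: the names `Backend` and `pick_target`, and the rule of waiting for three consecutive failed health probes before failing over, are assumptions made for the example and not something Azure provides.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    failed_probes: int = 0  # consecutive failed health probes

    def record_probe(self, ok: bool) -> None:
        # A successful probe resets the counter; a failed one increments it.
        self.failed_probes = 0 if ok else self.failed_probes + 1


def pick_target(primary: Backend, secondary: Backend, threshold: int = 3) -> Backend:
    """Return the primary unless it has failed `threshold` probes in a row."""
    if primary.failed_probes >= threshold:
        return secondary
    return primary
```

Requiring several failed probes in a row avoids flipping to the secondary on a single dropped request, and traffic returns to the primary as soon as one probe succeeds again.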
Also, do understand it is not required to have identical VMs in an AS.
Virtual machine scale set is a good option if you are looking for a high availability solution with VMs.

Load balance between two Azure Virtual Machine Scale Set (VMSS)

Our issue:
Provisioning a new instance takes a long time because of steps like certificate, encryption, domain join, TLS and cipher changes,...
Solution:
In our use case, we ended up with two different VMSS for the purposes of deployment, re-imaging, or a blue-green setup. Please note that in our region (Azure Gov) we don't have access to low-priority VMSS or Azure Spot VMs to do pre-provisioning.
It only makes sense to have the two scale sets behind a cloud-native load balancer (or a private Traffic Manager, which is not available yet) that routes requests based on each VMSS's readiness probe.
Ask:
How can we have two Azure Virtual Machine Scale Sets behind a load balancer?
I never tried this, but I don't see why this would not work with a Standard Load Balancer (not Basic; the Basic one is limited to 1 VMSS, as far as I remember). If it doesn't, it should work with Application Gateway.

Is it useless to setup Azure VM "Availability Set" without setting up "Load Balancing"?

Let's say I have VM1 and VM2, using the service WS.cloudapp.com, and a web app that has been deployed on both VM1 and VM2 on port 80. Because I have not set up load balancing yet, only one VM can own port 80, let's say VM1. When VM1 is down, end users cannot connect to WS.cloudapp.com. That means configuring a high availability set is useless, doesn't it?
You are correct. If you didn't set up LB endpoints, the second VM will never receive requests. The ONLY purpose of an availability set is to guarantee that at least 50% of your VMs in the same set are provisioned on different physical hardware racks, so that planned (or unplanned) maintenance events do not affect all your VMs at the same time.
An Availability Set must be combined with a Load Balancer to guarantee the 99.95% SLA. Combining the Azure Load Balancer (or any customized failover solution) with an Availability Set gives the most application resiliency: one governs where the VMs are provisioned physically and the other governs which VMs receive public traffic.
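To put the 99.95% figure in perspective, the arithmetic behind an SLA percentage can be worked out directly. A minimal sketch, assuming a 30-day month; the function name `allowed_downtime_minutes` is made up for the example, and 99.9% is included only for comparison.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime in minutes that still meets the SLA over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.95):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per 30-day month")
# 99.9% -> 43.2 minutes per 30-day month
# 99.95% -> 21.6 minutes per 30-day month
```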
There's also a problem you should be aware of, which I quote below:
Avoid single instance virtual machines in Availability Sets
Avoid leaving a single instance virtual machine in an Availability Set
by itself. Virtual machines in this configuration do not qualify for a
SLA guarantee and will face downtime during Azure planned maintenance
events. Also, if you deploy a single virtual machine instance within
an Availability Set, you will receive no advanced warning or
notification of platform maintenance. In this configuration, your
single virtual machine instance can and will be rebooted with no
advanced warning when platform maintenance occurs.
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-manage-availability/
If by "high availability set" you meant "availability set" then yes, you would not be getting the benefits from placing the VMs inside the set.
Though it's also probably worth noting that even if you place VMs in a set, failover is not instant.

automatic failover if webserver is down (SRV / additional A-record / ?)

I am starting to develop a webservice that will be hosted in the cloud but needs higher availability than typical cloud SLAs provide.
Typical SLAs, e.g. Windows Azure, promise an availability of 99.9%, i.e. up to 43min downtime per month. I am looking for an order of magnitude better availability (<5min down time per month). While I can configure several load balanced database back-ends to resolve that part of the issue I see a bottleneck at the webserver. If the webserver fails, the whole service is unavailable to the customer. What are the options of reducing that risk without introducing another possible single point of failure? I see the following solutions and drawbacks to each:
SRV-record:
I duplicate the whole infrastructure (and take care that the databases are in sync) and add additional SRV records for the domain so that a user trying to access www.example.com will automatically get forwarded to example.cloud1.com or, if that one is offline, to example.cloud2.com. Googling around, it seems that SRV records are not supported by any major browser; is that true?
second A-record:
Add an additional A-record as an alternative. Drawbacks:
a) at my hosting provider I do not see any possibility to add a second A-record, just one... is that normal?
b) if one of the two servers is down, I am not sure whether the user gets automatically redirected to the other one or whether 50% of all users get a 404 or some other error
Any clues for a best-practice would be appreciated
Cheers,
Sebastian
The availability in a cloud provider's SLA refers to the health of the instance as seen by the hypervisor or fabric controller, i.e. whether the server is running. With that said, you need to make an effort to ensure the instance is not failing because of your app, the OS, or pretty much anything else running inside the instance. There are a few things which devops tend to miss and which hit back hard, for instance forgetting to configure OS updates and patches.
The fundamental axiom of availability is redundancy. The more redundant your application and infrastructure are, the more available your app is.
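The effect of redundancy can be estimated with the usual formula for components in parallel, 1 - (1 - a)^n. A minimal sketch, assuming that failures of the copies are independent, which real deployments only approximate; the function name `combined_availability` is made up for the example.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, each with availability `single`.

    Assumes the copies fail independently of each other.
    """
    return 1 - (1 - single) ** replicas


# Two copies that are each 99.9% available, under the independence assumption.
print(f"{combined_availability(0.999, 2):.6f}")
```

Under that assumption, a second copy of a 99.9% component is already enough to get below the 5 minutes per month asked for in the question, provided whatever routes traffic between the copies does not become the new single point of failure.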
I recommend you look into Azure Traffic Manager and then rework your architecture. You need not worry about the SRV record or A-record; just a CNAME for the Traffic Manager would do the trick.
The idea of Traffic Manager is simple: you tell Traffic Manager
to sit behind the domain name (the domain name resolution of the
app), and Traffic Manager then decides where to send each request
based on factors like round-robin, disaster management, etc.
With the combination of Traffic Manager and a multi-region infrastructure setup, you will move towards the high availability goal.
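The routing decision described above can be illustrated with a toy resolver. This is not the Traffic Manager API: `Endpoint`, `resolve_priority` and `make_round_robin` are hypothetical names, the host names are borrowed from the question, and health is just a flag that a real service would set from its own probes.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Endpoint:
    host: str
    priority: int  # lower number = preferred
    healthy: bool = True


def resolve_priority(endpoints: list[Endpoint]) -> Optional[str]:
    """Failover-style routing: the healthy endpoint with the lowest priority number."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority).host


def make_round_robin(endpoints: list[Endpoint]):
    """Round-robin routing: rotate through whichever endpoints are healthy."""
    counter = count()

    def resolve() -> Optional[str]:
        healthy = [e for e in endpoints if e.healthy]
        if not healthy:
            return None
        return healthy[next(counter) % len(healthy)].host

    return resolve


regions = [Endpoint("example.cloud1.com", 1), Endpoint("example.cloud2.com", 2)]
print(resolve_priority(regions))  # example.cloud1.com
regions[0].healthy = False
print(resolve_priority(regions))  # example.cloud2.com
```

Either way the client only ever sees the one domain name; which region answers is decided at resolution time.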
Links
Azure Traffic Manager Overview
Cloud Power: How to scale Azure Websites globally with Traffic Manager
Maybe you should configure a corosync cluster with DRBD?
DRBD will ensure that the data on both nodes is replicated (for example website files and DB files).
Apache as the web server will be available under a virtual IP to which the domain points. If one server goes down, corosync will move all services to the second server within a few seconds.

Understanding availability set in Windows Azure

I am reading the explanation of Availability Sets on Microsoft's website but can't 100% understand the concept.
http://www.windowsazure.com/en-us/documentation/articles/manage-availability-virtual-machines/
People ask many questions in the comments, but there is no technical support from Microsoft there to answer them.
As I understand it, with availability sets you can duplicate your VM with the IIS application and your VM with SQL, which means you have to use (and pay for) 4 VMs instead of 2. Does this mean that whenever the IIS1 virtual machine is down, the website will still be online with the help of the IIS2 virtual machine, and vice versa? Does the same go for the SQL1 and SQL2 virtual machines?
Am I going in the right direction? If this is the case, how do I keep the data synchronized between SQL1 and SQL2, and between IIS1 and IIS2, so that the website will still be up with the latest data and code if one VM is down for updates?
An availability set combines two concepts from the Windows Azure PaaS world - upgrade domains and fault domains - that help to make a service more robust. When several VMs are deployed into an availability set the Windows Azure fabric controller will distribute them among several upgrade domains and fault domains.
A fault domain represents a grouping of VMs which have a single point of failure - a convenient (although not precisely accurate) way to think about it is a rack with a single top-of-rack router. By deploying the VMs into different fault domains the fabric controller ensures that a single failure will not take the entire service offline.
The fabric controller uses upgrade domains to control the manner in which host OS upgrades (i.e., of the underlying physical server) are performed. The fabric controller performs these upgrades one upgrade domain at a time, only moving onto the next upgrade domain when the upgrade of the preceding upgrade domain has completed. Doing this ensures that the service remains available, although with reduced capacity, during a host OS upgrade. These upgrades appear to happen every month or two, and services in which all VMs are deployed into availability sets receive no warning since they are supposedly resilient towards the upgrade. Microsoft does provide warning about upgrades to subscriptions containing VMs deployed outside availability sets.
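The one-domain-at-a-time behaviour can be sketched as a short simulation. The VM names come from the question; their placement into upgrade domains 0 and 1 and the function name `rolling_upgrade` are assumptions for illustration, not how the fabric controller is actually implemented.

```python
def rolling_upgrade(vm_update_domains: dict[str, int]) -> list[tuple[int, list[str], list[str]]]:
    """Walk the upgrade domains in order; return (domain, vms_down, vms_up) per step."""
    steps = []
    for domain in sorted(set(vm_update_domains.values())):
        down = sorted(vm for vm, ud in vm_update_domains.items() if ud == domain)
        up = sorted(vm for vm, ud in vm_update_domains.items() if ud != domain)
        steps.append((domain, down, up))
    return steps


# Hypothetical placement: one IIS and one SQL VM in each upgrade domain.
placement = {"IIS1": 0, "IIS2": 1, "SQL1": 0, "SQL2": 1}
for domain, down, up in rolling_upgrade(placement):
    print(f"upgrading UD {domain}: down={down} still serving={up}")
```

At every step one IIS VM and one SQL VM stay up, which is the reduced-capacity availability described above.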
Furthermore, there is no SLA for services which have VMs deployed outside availability sets.
As regards SQL Server, you may want to look into the use of SQL Server Availability Groups which sit on top of Windows Server Failover Cluster and use synchronous replication of the data. For IIS, you may want to look at the possibility of deploying your application into a PaaS cloud service since that provides significant advantages over deploying it into an IaaS cloud service. You can create a service topology integrating PaaS and IaaS cloud services through the use of a VNET.
An availability set is a combination of these two features:
Fault Domains (you can select a maximum of 3 when creating a new Availability Set)
Update Domains (you can select a maximum of 20 when creating a new Availability Set)
A Fault Domain is a physical grouping (like a rack or a power source). Say you selected 2 fault domains for your availability set: the machines in that availability set will have the values 1 and 2, so at least one remains available in case of a power failure in either physical grouping.
An Update Domain is a group that is updated together by an Azure system update.
If you select 4 update domains and your 2 VMs have the values 2 and 3, they will not be updated together during any planned maintenance.
For high availability, duplicate VMs should not be in the same Fault Domain or the same Update Domain.
Note that you cannot change the availability set after a VM has been created; it must be set at creation time.
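How VMs end up with different fault and update domain values can be modelled roughly as below. This is a simplified sketch that assumes round-robin placement and zero-based domain numbers; the actual placement is decided by the platform, and `assign_domains` is a made-up name.

```python
def assign_domains(
    vm_names: list[str], fault_domains: int = 2, update_domains: int = 5
) -> dict[str, tuple[int, int]]:
    """Assign each VM a (fault_domain, update_domain) pair in round-robin order."""
    return {
        name: (index % fault_domains, index % update_domains)
        for index, name in enumerate(vm_names)
    }


for name, (fd, ud) in assign_domains(["SQL1", "SQL2"], fault_domains=2, update_domains=4).items():
    print(f"{name}: fault domain {fd}, update domain {ud}")
```

With two or more domains of each kind, the first two VMs always differ in both values, which is the condition for duplicate VMs stated above.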

Resources