guarantee azure worker role SLA - azure

I have a worker role on Azure and want to make sure that it is covered by the Microsoft SLA of 99.95% availability.
My assumption is that if I go to the portal and increase the instance count to 2, that would be sufficient.
But I am confused by this wording on the SLA page:
http://azure.microsoft.com/en-us/support/legal/sla/
For Cloud Services, we guarantee that when you deploy two or more role instances in different fault and upgrade domains, your Internet facing roles will have external connectivity at least 99.95% of the time.
What exactly do "different fault and upgrade domains" signify here?
And do I need to perform any additional steps to qualify for the SLA?

See here for a good explanation of Azure Fault Domain and Upgrade Domains. When you deploy your worker role to two instances they'll automatically be allocated to different update and fault domains so your cloud service will be supported by the SLA. There's nothing extra you need to do.
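To put the 99.95% figure in perspective, here is a minimal sketch that converts an availability percentage into a downtime budget. The function name and the 30-day month are assumptions made for illustration; the SLA document itself defines how downtime is actually measured.

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted in a period at the given availability."""
    return period_hours * 60 * (1 - sla_percent / 100)


# Assumption: a 30-day month and a 365-day year.
monthly = allowed_downtime_minutes(99.95, 30 * 24)
yearly = allowed_downtime_minutes(99.95, 365 * 24)

print(f"{monthly:.1f} minutes per month")
print(f"{yearly / 60:.2f} hours per year")
```

So 99.95% leaves roughly 21.6 minutes of downtime in a 30-day month.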

Related

Azure App Service and infrastructure maintenance

As I understand it, there is no concept of update domains in App Service (or in other PaaS offerings). I am wondering how Azure handles OS updates if I have only a single instance of an App Service app. Do I need to plan for two or more instances to avoid the app going down during OS or other updates, or is this handled without downtime? According to the docs, App Service has a 99.95% SLA; is the update downtime counted within that allowance?
First of all, welcome to the community.
Your application will not become unavailable while App Service is patching the OS, so you don't have to worry about that. If it did, that would be a huge problem. Instead, the PaaS service makes sure your application is replicated to an updated worker node before that happens.
But you should have multiple instances, as a best practice listed in this article:
To avoid a single point-of-failure, run your app with at least 2-3 instances.
Running more than one instance ensures that your application is available when App Service moves or upgrades the underlying VM instances
Have a look at this detailed blog post:
https://azure.github.io/AppService/2018/01/18/Demystifying-the-magic-behind-App-Service-OS-updates.html
When the update reaches a specific region, we update available instances without apps on them, then move the apps to the updated instances, then update the offloaded instances.
The SLA is the same regardless of the number of instances, even if you select "1 instance":
We guarantee that Apps running in a customer subscription will be available 99.95% of the time
Have a look at how Hyper-V and VMware move workloads between hosts; it will give you a rough idea of how App Service handles this.
If you're looking for zero-downtime deployments with App Service, what you want are deployment slots.
Managing versions can be confusing. Take a look at this issue I opened: it gives a detailed how-to approach for managing different slot versions, which is not clearly described in the Microsoft docs.

Azure - Linux Standard B2ms - Turned off automatically?

I have a Linux Standard B2ms Azure virtual machine. I have disabled the auto-shutdown feature found in the dashboard under Operations. For some reason this server was still shut down after running for about 8 days.
What could cause this server to shut down if I haven't changed anything on it in the last three days?
There are many things that could shut down this VM, so we should look for logs about the event.
First, check Azure Alerts in the Azure portal and look for entries about your VM.
Second, check the VM's performance, since high CPU or memory usage may be involved; the logs are in /var/log/*.
We can also check whether there was an issue with the Azure service itself: go to Service Health -> Health history to see whether there were any issues in your region.
By the way, if we create just one VM in Azure, we can't avoid a single point of failure. Microsoft recommends creating two or more VMs within an availability set to provide a highly available application and to meet the 99.95% Azure SLA.
An availability set is composed of two additional groupings that protect against hardware failures and allow updates to safely be applied - fault domains (FDs) and update domains (UDs).
Fault domains:
A fault domain is a logical group of underlying hardware that share a common power source and network switch, similar to a rack within an on-premises datacenter. As you create VMs within an availability set, the Azure platform automatically distributes your VMs across these fault domains. This approach limits the impact of potential physical hardware failures, network outages, or power interruptions.
Update domains:
An update domain is a logical group of underlying hardware that can undergo maintenance or be rebooted at the same time. As you create VMs within an availability set, the Azure platform automatically distributes your VMs across these update domains. This approach ensures that at least one instance of your application always remains running as the Azure platform undergoes periodic maintenance. The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only one update domain is rebooted at a time.
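The effect of spreading VMs across fault and update domains can be sketched with a toy placement model. The round-robin rule and the domain counts used here (2 fault domains, 5 update domains) are assumptions for illustration only; the real placement is decided by the Azure platform.

```python
def assign_domains(instance_count, fault_domains=2, update_domains=5):
    """Toy round-robin placement: returns (instance, fault_domain, update_domain) tuples.

    Assumption: instance i lands in FD i % fault_domains and UD i % update_domains.
    """
    return [(i, i % fault_domains, i % update_domains) for i in range(instance_count)]


def running_after_outage(placement, fd=None, ud=None):
    """Instances still running when one fault domain or one update domain is offline."""
    return [inst for inst, f, u in placement if f != fd and u != ud]


two_vms = assign_domains(2)
one_vm = assign_domains(1)

print(running_after_outage(two_vms, fd=0))  # hardware failure in fault domain 0
print(running_after_outage(two_vms, ud=1))  # maintenance reboot of update domain 1
print(running_after_outage(one_vm, fd=0))   # a single VM has nowhere else to run
```

With two VMs, losing any one domain still leaves an instance running; with a single VM, the same outage leaves none.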
In your scenario, there may have been an unplanned maintenance event. When Microsoft updates the VM host, it may migrate your VM to another host, which means shutting the VM down and then migrating it.
To achieve high availability, we should create at least two VMs in one availability set.

Azure Availability Set vs Affinity Group

I'm a little bit confused about when to use Azure Availability Set and when to use Azure Affinity Group.
Let's briefly look at the key purpose of an Availability Set and an Affinity Group to begin with.
Availability Set: predominantly exists to provide high availability for your deployment. Azure does this via fault domains and upgrade domains.
Fault domain: basically a different hardware rack in the same datacenter. The solution will be deployed across two different hardware racks.
Upgrade domain: works much like a fault domain, but it covers upgrades rather than failures. The upgrade domain is a logical unit of instance separation that determines which instances in a particular service will be upgraded at a given point in time.
Affinity Group: to explain it, we need to take a peek into an Azure datacenter. Windows Azure datacenters are purpose-built; you might see rows and rows of containers (something like shipping containers) that contain clusters and racks. Each of those containers hosts specific services, for example Compute and Storage, SQL Azure, Service Bus, Access Control Service, and so on. Those containers are spread across the datacenter.
When you deploy a service using the Portal or PowerShell, the request goes to the RDFE (Red Dog Front End). The RDFE controls the datacenter and its nodes. Each cluster of nodes is controlled by a Fabric Controller. When you specify an Affinity Group, the Fabric Controller places all the required elements of a deployment together. This has a number of advantages, such as reduced latency (since the required elements are close together) and networking.
There are newer changes related to network Affinity Groups; you can refer to them here (https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-migrate-to-regional-vnet/).
To address your question:
You would use an Availability Set when you want a highly available system and also want an SLA for compute. Without an Availability Set there won't be an SLA for your VM or PaaS instances; in other words, single instances of a VM (IaaS) or PaaS role have no SLA and are prone to downtime during hardware failures and OS upgrades.
An Availability Set can also be configured after deployment. Note that there is a cost associated with an Availability Set: since you are running additional instances, those will be charged.
An Affinity Group must be specified when the service is created; it cannot be changed afterwards. So it is very important to include the Affinity Group at creation time. There are no additional charges for using an Affinity Group.
Do share your feedback if the response addresses your question.

Azure maintenance: possible downtime with 8 instances?

Microsoft just sent out an email notifying our company that there will be scheduled maintenance for our Windows Azure environment.
We will be performing maintenance on our networking hardware. We are
scheduling the update to occur during nonbusiness hours as much as
possible, in each maintenance region. Single and multi-instance
Virtual Machines and Cloud Services deployments will reboot once
during this maintenance operation. Each instance reboot should last 30
to 45 minutes.
We suggest using availability sets in the architecture to protect
against downtime caused by planned maintenance. This maintenance will
proceed by updating instances in only one Fault Domain (FD) at a time
for the Cloud Services and Virtual Machines in an Availability Set.
Now our website consists of a Cloud Service with 8 (small) instances of a web role. With these 8 instances, is there still a possibility of downtime for the website? Do we need to use 'Availability Sets' or are we safe? Thanks for any info.
Depends on which service you're referring to. From my understanding, because you mentioned "Web Role", you're talking about Cloud Services (PaaS).
In General:
If you have Cloud Services (PaaS), which is what you have based on my understanding, then you won't have any downtime, no.
If you have VMs (Virtual Machines) that don't belong to the same Availability Set, then there is a chance of downtime. To fix that, just make sure they are on the same Availability Set. If you don't have VMs, ignore this.
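As a rough way to reason about the 8-instance case, the sketch below estimates how many instances keep running while a single domain is rebooted. It assumes instances are spread evenly across domains and that only one domain is touched at a time, as the maintenance notice describes; the function name and the domain counts passed in are illustrative assumptions, not values reported by Azure.

```python
def min_instances_running(instance_count: int, domain_count: int) -> int:
    """Worst-case number of instances still up while one domain is rebooted.

    Assumption: instances are spread as evenly as possible across the domains.
    """
    largest_domain = -(-instance_count // domain_count)  # ceiling division
    return instance_count - largest_domain


print(min_instances_running(8, 2))  # 8 instances over 2 domains
print(min_instances_running(1, 2))  # a single instance
```

Under these assumptions, 8 instances over 2 domains leaves 4 serving traffic during each reboot, while a single instance leaves none, which is why the notice singles out deployments without multiple instances.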
Hope it helps.

Azure Cloud Service Billing Use Case

I was hoping I'd be able to find Azure billing 'use cases' somewhere on the MS site or on StackOverflow.
Maybe I'm being paranoid but I'm trying to be certain before I tell a customer that it'll cost $XXX.00 to move his app to Azure.
I've got an MVC site running on a server in his office. It's a data-based app using SQL-Server. Data intensive but just about 20-30 users. The purpose of going to "The Cloud" is not scalability but reliability.
Let's just say I need a Cloud Service with 2 medium VMs (2 so that we have fail-over capability) and a 1 GB SQL Database. Say $2 worth of bandwidth (15 GB) would probably be enough. Geo-redundant storage: everything besides the DB consists of code, with very little in the way of resources, less than 20 MB in total.
So, my question: By running a Web and Worker am I using two instances? One for Web and one for Worker? If so, can I run the app in just a Web Role? I don't run a separate service. What if I did run both Web and Worker roles for the same site, would that be an extra instance (4 instances instead of 2)?
So, by running a Web and/or Worker role, am I ALSO incurring a Virtual Machine instance? If not, does the scenario change if I occasionally RDP into the Web/Worker instance?
Thanks for any insight into this. Also, does anyone know of a MS site that has billing 'use cases' like this?
Based on your description, I'm not sure why you'd want a Worker role. Worker roles are ideal for handling transactions, processing, etc. but I'm not sure if you need that. For example, worker roles can process submitted orders, resize images, etc. Basically any process that you'd like to abstract from the user interface.
Since you mention that you want fail-over capability, you should probably use at least two instances of whatever role(s) you choose. For example, you will need a Web role for your MVC web site. You'll need two instances of whatever size you choose to qualify for Microsoft's Cloud Services SLA uptime guarantee of 99.95%.
Should you decide you need a Worker role, you'd need two instances of that as well.
It's not required to use a minimum of two instances per role type, but it's certainly recommended for production apps, and is required for SLA coverage.
You get charged for each role you activate, so the web and worker roles will be billed separately. As far as combining the worker and web together, not sure programmatically how the
OK, let's take this one by one.
By running a Web Role and a Worker Role while meeting the SLA criterion of at least 2 instances of each role, you are essentially creating 4 billable instances (2 Web Role instances and 2 Worker Role instances).
You can definitely run a service within a Web Role if that suits your purposes and save on the worker instances. In that case you'd have only 2 billable instances.
No, the VM role is a completely different role type, and you are not running a VM role by running Web/Worker roles. You can always safely RDP into the instances irrespective of the role type (however, the merit of doing so is questionable once you are in production).
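The instance counting above can be sketched as simple arithmetic. The hourly rate below is a made-up placeholder, not a real Azure price, and the function names are hypothetical; plug in the current rate for your instance size from the Azure pricing page.

```python
from typing import Mapping

# Placeholder only: NOT an actual Azure price.
HYPOTHETICAL_HOURLY_RATE = 0.10
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def billable_instances(roles: Mapping[str, int]) -> int:
    """Every instance of every role is billed, so sum the instance counts."""
    return sum(roles.values())


def monthly_compute_cost(roles: Mapping[str, int], hourly_rate: float) -> float:
    """Compute-only estimate: instances x hourly rate x hours in the month."""
    return billable_instances(roles) * hourly_rate * HOURS_PER_MONTH


web_and_worker = {"web": 2, "worker": 2}
web_only = {"web": 2}

print(billable_instances(web_and_worker))  # 4
print(billable_instances(web_only))        # 2
print(monthly_compute_cost(web_only, HYPOTHETICAL_HOURLY_RATE))
```

This covers compute only; the database, bandwidth, and storage mentioned in the question are billed separately.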

Resources