How does failover work when the primary VM scale set is restarted? - azure

Above is a sample configuration for Azure Service Fabric.
I created it with the wizard and deployed one ASP.NET Core application, which I can access from outside.
If you look at the image below, the Service Fabric cluster is accessed via sfclustertemp.westus2.cloudapp.azure.com, and I can reach the application at
sfclustertemp.westus2.cloudapp.azure.com/api/values.
Now, if I restart the primary VM scale set, I expected the load to transfer to the secondary one automatically, but it does not, because the second load balancer has a different DNS name. (If I specify that other DNS name, the application is accessible.)
My understanding is that the cluster has a single ID, which is common to both load balancers.
Is such a configuration possible?

Maybe you could use Azure Traffic Manager with health probes.
However, instead of using multiple node types for fail-over during reboots, have a look at 'durability tiers'. Using Silver or Gold has the effect that reboots are performed sequentially on groups of machines (grouped by fault domain), instead of all at once.
The durability tier is used to indicate to the system the privileges that your VMs have with the underlying Azure infrastructure. In the primary node type, this privilege allows Service Fabric to pause any VM level infrastructure request (such as a VM reboot, VM reimage, or VM migration) that impact the quorum requirements for the system services and your stateful services.
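As a rough illustration of the Traffic Manager suggestion, here is a minimal Python sketch assuming the azure-mgmt-trafficmanager package; the subscription, resource group, profile name, and the secondary DNS name are placeholders, not values from the question.

```python
# Sketch: front both load balancer DNS names with a Traffic Manager profile that
# health-probes each endpoint and fails over by priority. Subscription, resource
# group, profile name, and the secondary DNS name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.trafficmanager import TrafficManagerManagementClient
from azure.mgmt.trafficmanager.models import Profile, DnsConfig, MonitorConfig, Endpoint

client = TrafficManagerManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.profiles.create_or_update(
    "my-resource-group",
    "sfcluster-tm",
    Profile(
        location="global",
        traffic_routing_method="Priority",   # primary first, fail over to secondary
        dns_config=DnsConfig(relative_name="sfcluster-tm", ttl=30),
        monitor_config=MonitorConfig(protocol="HTTP", port=80, path="/api/values"),
        endpoints=[
            Endpoint(
                name="primary",
                type="Microsoft.Network/trafficManagerProfiles/externalEndpoints",
                target="sfclustertemp.westus2.cloudapp.azure.com",
                priority=1,
            ),
            Endpoint(
                name="secondary",
                type="Microsoft.Network/trafficManagerProfiles/externalEndpoints",
                target="<secondary-lb>.westus2.cloudapp.azure.com",
                priority=2,
            ),
        ],
    ),
)
```

Clients would then call the Traffic Manager DNS name instead of either load balancer's DNS name, and the probe on /api/values decides which endpoint receives traffic.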

There is a misconception about what an SF cluster is.
On your diagram, the part you describe on the left as 'Service Fabric' does not belong there.
Service Fabric is nothing more than applications and services deployed on the cluster nodes. When you create a cluster, you define a primary node type; that is where Service Fabric deploys the services used for managing the cluster.
A node type is formed by:
A VM Scale Set: machines with the OS and SF services installed
A load balancer with a DNS name and IP, forwarding requests to the VM Scale Set
So what you describe there, should be represented as:
NodeTypeA (Primary)
Load Balancer (cluster domain + IP)
VM Scale Set
SF management services (explorer, DNS)
Your applications
NodeTypeB
Load Balancer (other dns + IP)
VM Scale Set
Your applications
Given that:
First: if the primary node type goes down, you will lose your cluster, because the management services won't be available to manage your service instances.
Second: you shouldn't rely on node types for this kind of reliability; instead, increase the reliability of your cluster by adding more nodes to each node type.
Third: if the concern is a data center outage, you could:
Create a custom cluster that spans multiple regions
Add a reverse proxy or API gateway in front of your service to route requests to wherever your service is running.

Related

Does the Azure Isolated App Service Plan provide dedicated (not shared) hardware?

I know that the Azure Isolated App Service Plan provides several benefits, such as:
Network-level isolation
More worker units
Better performance
but I wonder whether the isolation is also at the hardware level or not.
With Azure App Service Environment version 3 (ASEv3), you can deploy it on a dedicated host group. Host group deployments are not zone redundant for ASEv3.
Azure Dedicated Host is a service that provides physical servers - able to host one or more virtual machines - dedicated to one Azure subscription. Dedicated hosts are the same physical servers used in our data centers, provided as a resource. You can provision dedicated hosts within a region, availability zone, and fault domain. Then, you can place VMs directly into your provisioned hosts, in whatever configuration best meets your needs.
This is NOT available with ASEv2.
Reference:
https://learn.microsoft.com/en-us/azure/app-service/environment/overview
https://learn.microsoft.com/en-us/azure/virtual-machines/dedicated-hosts

Subnets in Kubernetes

I'm still experimenting with migrating my Service Fabric application to Kubernetes. In Service Fabric I have two node types (effectively subnets), and for each service I configure which subnet it will be deployed to. I cannot see an equivalent option in the Kubernetes YAML file. Is this possible in Kubernetes?
First of all, they are not effectively subnets. They could all be in the same subnet. But with AKS you have node pools. Similarly to Service Fabric, those could be in different subnets (inside the same VNet, as far as I know). Then you would use nodeSelectors to assign pods to nodes in a specific node pool.
The same principle applies if you are creating a Kubernetes cluster yourself: you would need to label nodes and use nodeSelectors to target specific nodes for your deployments.
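As a rough illustration, here is a minimal sketch using the official Kubernetes Python client; the node pool name, namespace, and image are hypothetical:

```python
# Sketch: pin a Deployment to a specific AKS node pool via nodeSelector.
# The pool name ("backendpool"), namespace, and image are hypothetical.
from kubernetes import client, config

config.load_kube_config()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="backend-api", namespace="backend"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "backend-api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "backend-api"}),
            spec=client.V1PodSpec(
                # AKS typically labels each node with its pool name (the "agentpool" label).
                node_selector={"agentpool": "backendpool"},
                containers=[
                    client.V1Container(
                        name="backend-api",
                        image="myregistry.azurecr.io/backend-api:1.0",
                    ),
                ],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="backend", body=deployment)
```

The same pod spec could of course be written directly as the nodeSelector field in the deployment's YAML manifest.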
In Azure, the AKS cluster can be deployed to a specific subnet. If you are looking for deployment-level isolation, deploy the two node types to different namespaces in the k8s cluster. That way the node types get isolation and can be reached using the service name and namespace combination.
I want my backend services that access my SQL database to be in a different subnet from the front end. This way I can limit access to the DB to the backend subnet only.
This is an older way to solve network security. A more modern way is called Zero Trust Networking, see e.g. BeyondCorp at Google or the book Zero Trust Networks.
Limit who can access what
The Kubernetes way to limit what service can access what service is by using Network Policies.
A network policy is a specification of how groups of pods are allowed to communicate with each other and other network endpoints.
NetworkPolicy resources use labels to select pods and define rules which specify what traffic is allowed to the selected pods.
This is a more software-defined way to limit access than the older subnet-based approach, and the rules are more declarative, using Kubernetes labels instead of hard-to-understand IP numbers.
This should be used in combination with authentication.
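As a rough illustration, here is a minimal sketch of such a policy using the Kubernetes Python client; all names and labels are hypothetical:

```python
# Sketch: allow ingress to the database-facing pods only from pods labelled
# app=backend-api. Namespace, policy name, and labels are hypothetical.
from kubernetes import client, config

config.load_kube_config()

policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="allow-backend-only", namespace="data"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(match_labels={"app": "db-gateway"}),
        policy_types=["Ingress"],
        ingress=[
            client.V1NetworkPolicyIngressRule(
                _from=[
                    client.V1NetworkPolicyPeer(
                        pod_selector=client.V1LabelSelector(match_labels={"app": "backend-api"}),
                    ),
                ],
            ),
        ],
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(namespace="data", body=policy)
```

Note that policies are only enforced when the cluster's network plugin supports NetworkPolicy (on AKS, for example, Azure network policy or Calico).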
For inspiration, also read We built network isolation for 1,500 services to make Monzo more secure.
In the Security team at Monzo, one of our goals is to move towards a completely zero trust platform. This means that in theory, we'd be able to run malicious code inside our platform with no risk – the code wouldn't be able to interact with anything dangerous without the security team granting special access.
Istio Service Entry
In addition, if using Istio, you can limit access to external services by using Istio Service Entry. It is possible to use custom Network Policies for the same behavior as well, e.g. Cilium.

Scaling of Azure Service Fabric stateless services

Can you please give me a better understanding of how we can scale the stateless services without partitioning?
Say we have 5 nodes in a cluster and 5 instances of the service. In simple testing, a node behaves as sticky: all the requests I am sending are served by only one node. When a high volume of requests comes in, can the other instances automatically be used to serve the traffic? How do we handle such scale-out situations in Service Fabric?
Thanks!
Usually there's no need to use partitioning for stateless SF services, so avoid that if you can:
more on SF partitioning, including why it's not normally used for stateless services
If you're using the ServiceProxy API, it will maintain sticky connections to a given physical node in the cluster. If you're (say) exposing HTTP endpoints, you'll have one for each physical instance in the cluster (meaning you'll end up talking to one at a time, unless you manually cycle through them). You can avoid this by:
Creating a new proxy instance for each call, which tends to be expensive if you do it a lot (or manually cycling through the list of instance endpoint URLs, which can be tedious and/or expensive; a rough sketch of this follows below)
Putting a load balancer in front of your cluster and configuring all traffic from your clients to SF nodes to be forwarded through it. The load balancer can be configured for round-robin-style semantics, for example:
Azure Load Balancer
Azure Traffic Manager
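For the first option, here is a rough Python sketch of cycling through per-instance HTTP endpoints; the addresses are made up and would normally come from the naming service or configuration:

```python
# Rough sketch: manually cycle through per-instance HTTP endpoints so that
# successive calls hit different nodes. The addresses below are made up; in a
# real cluster you would resolve them from the naming service or configuration.
import itertools
import requests

endpoints = [
    "http://10.0.0.4:8080",
    "http://10.0.0.5:8080",
    "http://10.0.0.6:8080",
]
rotation = itertools.cycle(endpoints)

def get_values():
    base = next(rotation)  # pick the next instance in round-robin order
    return requests.get(f"{base}/api/values", timeout=5).json()
```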
Good luck!
You can route the request through the reverse proxy installed on each node: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reverseproxy
The reverse proxy then resolves the endpoint for you. If you have multiple instances of a stateless service, it will forward your request to a random one.
If load is heavy, you can increase the instance count of your service, and the proxy will then include the new instances automatically.
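As a rough illustration, assuming the reverse proxy is enabled on its usual default port 19081, and with placeholder application and service names:

```python
# Sketch: call a stateless service through the Service Fabric reverse proxy on
# the local node. Port 19081 is the usual default when the reverse proxy is
# enabled; "MyApp" and "MyStatelessService" are placeholders.
import requests

url = "http://localhost:19081/MyApp/MyStatelessService/api/values"
response = requests.get(url, timeout=5)
print(response.status_code, response.json())
```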
I will assume you are calling your services from outside your cluster. If so, your problem is not specific to Service Fabric; it is about Azure VMSS + LB.
Service Fabric runs on top of Virtual Machine Scale Sets, and these VMs are created behind a load balancer. When a client connects to your service, it creates a connection through the load balancer. Whenever a connection is opened, the load balancer assigns one target VM to handle your requests, and any request made from that client over the same connection (keep-alive) will be handled by the same node. This is why your load goes to a single node.
The LB won't round-robin the requests because they use the same connection; it is a limitation (feature) of the LB. To work around this problem, you should open multiple connections or use multiple clients (instances).
This applies to the default distribution mode (hash-based). You should also check the routing rules in the LB to see whether the distribution mode is hash-based (5-tuple: IP + port) or IP-affinity mode (IP only); with the latter, multiple connections from the same IP will still be linked to the same node.
Source: Azure Load Balancer Distribution Modes
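As a rough illustration of the work-around, here is a small Python sketch contrasting one keep-alive connection with a fresh connection per request (the URL is the one from the question):

```python
# Sketch: with the default 5-tuple hash distribution, reusing one keep-alive
# connection pins all requests to one node, while a fresh connection per request
# (new source port) lets the load balancer spread them across nodes.
import requests

url = "http://sfclustertemp.westus2.cloudapp.azure.com/api/values"

# 1) One keep-alive connection: every request lands on the same node.
with requests.Session() as session:
    for _ in range(5):
        session.get(url, timeout=5)

# 2) A new connection per request: "Connection: close" prevents reuse, so each
#    request is a new 5-tuple and may be hashed to a different node.
for _ in range(5):
    requests.get(url, headers={"Connection": "close"}, timeout=5)
```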

IIS ARR proxy route to a scalable Azure VM farm

We have a third-party product that runs as a Windows service and is exposed as a web service. The goal is to dynamically provision service instances during business peak hours.
Just to run the idea by you:
- I've already deployed the service on multiple VMs, configured the VMs in the same cloud service availability set, and configured Azure to turn VM instances on/off based on CPU use
- I plan to configure a separate VM, run IIS ARR there, and point it to the endpoints on the VMs configured above, in the hope that ARR will balance requests across the back-end VMs dynamically
Will this work? What's the best practice for IaaS scaling? Any thoughts? I truly appreciate the input.
If I have understood correctly, you just need to use the built-in load balancer of the cloud service. Create a load-balanced set for your endpoint. For example, if you want to balance the incoming traffic to port 80 of your application, all you have to do is create an LB-set for this port and configure this set on all the VMs in the cloud service.
The Azure Load Balancer randomly distributes a specific type of incoming traffic across multiple virtual machines or services in a configuration known as a load-balanced set. For example, you can spread the load of web request traffic across multiple web servers or web roles.
Configure a load-balanced set
Azure load balancing for virtual machines
It doesn't matter whether the VMs are up or down: once a VM turns on, and provided its endpoint is configured in the same LB-set, it will automatically start responding to requests once port 80 is online (IIS started and returning STATUS 200 OK, for example). So, answering your question: yes, it will work with auto-scale or with manually turning VMs on/off.

Understanding availability sets in Windows Azure

I am reading the explanation of availability sets on Microsoft's website but can't 100% understand the concept.
http://www.windowsazure.com/en-us/documentation/articles/manage-availability-virtual-machines/
People ask many questions in the comments, but there is no technical support from Microsoft there to answer them.
As I understand it, with availability sets you can duplicate your VM with the IIS application and your VM with SQL, which means you have to use 4 VMs (and pay for 4) instead of 2. This means that whenever the IIS1 virtual machine is down, the website will still be online with the help of the IIS2 virtual machine, and vice versa? The same goes for the SQL1 and SQL2 virtual machines?
Am I going in the right direction? If this is the case, how do I keep the data synchronized between the SQL1 and SQL2 (and IIS1 and IIS2) virtual machines at the same time, so the website stays up with the latest data and code if one VM is down for updates?
An availability set combines two concepts from the Windows Azure PaaS world - upgrade domains and fault domains - that help to make a service more robust. When several VMs are deployed into an availability set the Windows Azure fabric controller will distribute them among several upgrade domains and fault domains.
A fault domain represents a grouping of VMs which have a single point of failure - a convenient (although not precisely accurate) way to think about it is a rack with a single top-of-rack router. By deploying the VMs into different fault domains, the fabric controller ensures that a single failure will not take the entire service offline.
The fabric controller uses upgrade domains to control the manner in which host OS upgrades (i.e., of the underlying physical server) are performed. The fabric controller performs these upgrades one upgrade domain at a time, only moving onto the next upgrade domain when the upgrade of the preceding upgrade domain has completed. Doing this ensures that the service remains available, although with reduced capacity, during a host OS upgrade. These upgrades appear to happen every month or two, and services in which all VMs are deployed into availability sets receive no warning since they are supposedly resilient towards the upgrade. Microsoft does provide warning about upgrades to subscriptions containing VMs deployed outside availability sets.
Furthermore, there is no SLA for services which have VMs deployed outside availability sets.
As regards SQL Server, you may want to look into the use of SQL Server Availability Groups which sit on top of Windows Server Failover Cluster and use synchronous replication of the data. For IIS, you may want to look at the possibility of deploying your application into a PaaS cloud service since that provides significant advantages over deploying it into an IaaS cloud service. You can create a service topology integrating PaaS and IaaS cloud services through the use of a VNET.
An availability set is a combination of these two features:
Fault domains (you can select a maximum of 3 when creating a new availability set)
Update domains (you can select a maximum of 20 when creating a new availability set)
A fault domain is a physical grouping (such as a rack or power supply). Say you selected 2 fault domains in your availability set; your machines in that set will be assigned values 1 and 2, so at least one can remain available if there is a power failure in either physical group.
An update domain is the set of machines that an Azure system update will update at the same time.
If you select 4 update domains and your 2 VMs are assigned values 2 and 3, they will not be updated together during any planned maintenance.
For high availability, duplicate VMs should not be in the same fault domain or the same update domain.
Note that you cannot change the availability set after a VM has been created; it must be set at creation time.
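As a rough illustration, here is a minimal Python sketch using the current azure-mgmt-compute SDK (which postdates this question); the subscription, resource group, region, and names are placeholders:

```python
# Sketch: create an availability set with 2 fault domains and 4 update domains,
# then create the duplicate VMs inside it. Subscription, resource group, region,
# and names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.compute.models import AvailabilitySet, Sku

compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

compute.availability_sets.create_or_update(
    "my-resource-group",
    "web-avset",
    AvailabilitySet(
        location="westeurope",
        platform_fault_domain_count=2,    # maximum 3
        platform_update_domain_count=4,   # maximum 20
        sku=Sku(name="Aligned"),          # required for VMs that use managed disks
    ),
)
```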
