Scaling of Azure Service Fabric stateless services

Can you please give me a better understanding of how we can scale the stateless services without partitioning?
Say we have 5 nodes in a cluster and 5 instances of the service. In simple testing, one node behaves as sticky: all the requests I send are served by only that node. When a high volume of requests comes in, can the other instances automatically be used to serve the traffic? How do we handle such scale-out situations in Service Fabric?
Thanks!

Usually there's no need to use partitioning for stateless SF services, so avoid that if you can:
More on SF partitioning, including why it's not normally used for stateless services
If you're using the ServiceProxy API, it will maintain sticky connections to a given physical node in the cluster. If you're (say) exposing HTTP endpoints, you'll have one for each physical instance in the cluster (meaning you'll end up talking to one at a time, unless you manually cycle through them). You can avoid this by:
Creating a new proxy instance for each call, which tends to be expensive if you do it a lot (or manually cycling through the list of instance endpoint URLs, which can be tedious and/or expensive)
Putting a load balancer in front of your cluster and forwarding all traffic from your clients to the SF nodes through it. The load balancer can be configured for round-robin (or similar) distribution semantics:
Azure Load Balancer
Azure Traffic Manager
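To make the "manually cycle through the instance endpoint URLs" option concrete, here is a minimal Python sketch of client-side round-robin. It assumes you have already resolved the instance endpoints (for example through the Naming Service); the `RoundRobinEndpoints` class and the 10.0.0.x URLs are hypothetical, and a real client would also need to re-resolve when instances move or the instance count changes.

```python
import itertools
import threading


class RoundRobinEndpoints:
    """Hypothetical helper: hands out already-resolved instance endpoints in rotation."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()  # guard the shared iterator when many threads send requests

    def next_endpoint(self):
        with self._lock:
            return next(self._cycle)


# Hypothetical endpoints for 5 stateless instances on 5 nodes.
instances = [f"http://10.0.0.{i}:8080/api/values" for i in range(4, 9)]
rr = RoundRobinEndpoints(instances)

for _ in range(6):
    # Each call targets the next instance; the 6th call wraps back to 10.0.0.4.
    print(rr.next_endpoint())
```

The rotation only spreads load if each request is actually sent to the URL it was given, so this pairs naturally with one connection (or connection pool) per instance rather than one shared connection.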
Good luck!

You can send the request through the reverse proxy that is installed on each node; see https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reverseproxy
The reverse proxy then resolves the endpoint for you. If you have multiple instances of a stateless service, it will forward your request to a random one.
During heavy load, you can increase the instance count of your service, and the proxy then includes the new instances automatically.
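As a rough illustration, here is a Python helper that builds a reverse-proxy URL from a service name. It assumes the reverse proxy listens on port 19081 (the port used in the documentation's examples; check your cluster's reverse proxy setting) and that you call it from inside the cluster, hence `localhost`. The `reverse_proxy_url` helper and the `fabric:/MyApp/MyService` name are hypothetical; optional parameters such as `Timeout` are simply appended as query-string values.

```python
from urllib.parse import urlencode


def reverse_proxy_url(service_name, path="", host="localhost", port=19081, **query):
    """Build a URL that routes through the SF reverse proxy (sketch; port is an assumption)."""
    prefix = "fabric:/"
    if not service_name.startswith(prefix):
        raise ValueError("expected a name like 'fabric:/MyApp/MyService'")
    # fabric:/MyApp/MyService -> /MyApp/MyService on the proxy
    service_path = service_name[len(prefix):].strip("/")
    url = f"http://{host}:{port}/{service_path}"
    if path:
        url += "/" + path.lstrip("/")
    if query:
        url += "?" + urlencode(query)
    return url


print(reverse_proxy_url("fabric:/MyApp/MyService", "api/values"))
# http://localhost:19081/MyApp/MyService/api/values
```

Because the proxy resolves the instance on every request, scaling the instance count up does not require any change on the caller's side.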

I will assume you are calling your services from outside your cluster. If so, your problem is not specific to Service Fabric; it comes from Azure VMSS + LB.
Service Fabric runs on top of Virtual Machine Scale Sets, and these VMs are created behind a load balancer. When a client connects to your service, it creates a connection through the load balancer. Whenever a connection is opened, the load balancer assigns one target VM to handle it, and every request your client makes over that same connection (keep-alive) is handled by the same node. This is why your load goes to a single node.
The LB won't round-robin the requests because they use the same connection; this is a limitation (or feature) of the LB. To work around it, open multiple connections or use multiple clients (instances).
This applies to the default distribution mode (hash-based). You should also check the LB rules to see whether the distribution mode is hash-based (5-tuple: source IP, source port, destination IP, destination port, protocol) or source IP affinity (IP only). In IP affinity mode, multiple connections from the same IP will still be sent to the same node.
Source: Azure Load Balancer Distribution Mode
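The following Python sketch is a simplified model, not Azure's actual hashing, of why a keep-alive connection sticks to one node. SHA-256 stands in for the load balancer's hash, and the node names, client IP and VIP are made up. It compares one reused connection, many new connections (each with a fresh ephemeral source port), and source IP affinity.

```python
import hashlib

BACKENDS = ["node0", "node1", "node2", "node3", "node4"]


def pick_backend(key, backends=BACKENDS):
    """Deterministically map a flow key onto one backend (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def five_tuple(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    return (src_ip, src_port, dst_ip, dst_port, proto)


def source_ip_affinity(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    # 2-tuple: the source port is ignored, so all connections from one client IP map the same way.
    return (src_ip, dst_ip)


client, vip = "203.0.113.7", "20.1.2.3"

# One keep-alive connection: every request reuses source port 50000 -> always the same node.
keepalive = {pick_backend(five_tuple(client, 50000, vip, 443)) for _ in range(100)}

# 100 separate connections: each has a new source port -> flows spread across nodes.
many_conns = {pick_backend(five_tuple(client, 50000 + i, vip, 443)) for i in range(100)}

# Source IP affinity: new connections from the same client IP still land on one node.
affinity = {pick_backend(source_ip_affinity(client, 50000 + i, vip, 443)) for i in range(100)}

print(len(keepalive), len(many_conns), len(affinity))
```

In this model, only the "many connections" case reaches more than one node, which matches the advice to open multiple connections and to check that the rule is not in IP affinity mode.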

Related

Does Azure (Standard) load-balancing require two or more nodes in backend pool?

I'm configuring/testing an Azure (Standard) load balancer, currently with a backend pool that has a single VM; in the future, additional VMs will be added.
With only a single VM in the BP, I assumed my app could still be configured to use the LB. However, I'm finding that the app is not able to connect to the VM in the BP (e.g. WinHTTP timeout 12002).
The only reason I can think of for why the LB is not sending traffic to the VM is that there may be an unwritten requirement for a backend pool to have at least two VMs/nodes. I cannot find documentation that confirms or denies this.
Of course I can just test this myself by adding a second VM to the BP, but I'm not quite ready to do that yet, so I thought I'd ask.
FYI - the LB has two backend pools: #1 has two VMs for that component of the app, #2 has one VM for that component of the app.
#1 works fine; the LB is spreading the load across both VMs.
#2 does not work
Just really wanting to know if Azure LB can work when the backend pool has a single node, or are two or more nodes required.
Any thoughts/details on this topic?
Just really wanting to know if Azure LB can work when the backend pool
has a single node, or are two or more nodes required.
As far as I know, you can add a single VM to the backend pool. See the SKU comparison.
For example, I have a single VM that hosts a default website on port 8080, and I can configure it like this:
(Screenshots: backend pool setting; health probes; load balancer rules; accessing the backend website via the load balancer public IP address.)
For the error message, check whether your configuration is correct and read "Troubleshoot Azure Load Balancer" for more details.
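For the single-VM case, what matters is whether that one VM passes its health probe, not how many VMs are in the pool. The Python sketch below is a simplified model of that behavior: the `Backend` class, `record_probe`, `route`, and the threshold of two consecutive failures are assumptions for illustration, not the Azure implementation. With one VM, a failing probe leaves no healthy target, which would be consistent with a client-side timeout such as the 12002 error.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    consecutive_failures: int = 0


def record_probe(backend, success, unhealthy_threshold=2):
    """Update one backend after a probe (assumed threshold; simplified model)."""
    if success:
        backend.consecutive_failures = 0
        backend.healthy = True
    else:
        backend.consecutive_failures += 1
        if backend.consecutive_failures >= unhealthy_threshold:
            backend.healthy = False


def route(pool):
    """Return a healthy backend, or None when the pool has no healthy targets."""
    healthy = [b for b in pool if b.healthy]
    if not healthy:
        return None  # nowhere to send new flows
    return healthy[0]  # stand-in for the real distribution algorithm


vm = Backend("vm-single")
pool = [vm]
print(route(pool).name)  # vm-single: one healthy VM is enough
for _ in range(2):
    record_probe(vm, success=False)  # e.g. nothing answering on the probe port
print(route(pool))  # None
```

So the first thing to verify is that the health probe for pool #2 targets a port and path the single VM actually answers on.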

How are Azure ASE front ends themselves load balanced?

I'm researching Azure App Services and App Service Environments. I can see that the "front end" acts as a load balancer for the workers. I can also see that there are 2 front ends by default, with more being added as the number of workers increases.
My question is: if the front ends act as a load balancer for the workers, what decides which of the multiple front ends serves a request? I'd always assumed a load balancer would need to be a single instance, or you'd end up with the same problem it was set out to solve.
As a follow up question, I'm also curious how the load is balanced to the workers? Is it simple round robin?
The front end is a layer-7 load balancer, acting as a proxy and distributing incoming HTTP requests between different applications and their respective workers. Currently, the App Service load-balancing algorithm is a simple round robin between the set of servers allocated to a given application.
Reference: https://msdn.microsoft.com/en-us/magazine/mt793270.aspx?f=255&MSPPError=-2147217396
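A minimal Python sketch of what the answer describes: a layer-7 front end picks the application from the request's Host header and then round-robins across that application's workers. The host names, worker names and `FrontEnd` class are hypothetical; the real front end also handles health, scaling and TLS, none of which is modeled here.

```python
import itertools


class FrontEnd:
    """Sketch of a layer-7 front end: choose the app by Host header, then round-robin its workers."""

    def __init__(self, workers_by_host):
        # One independent rotation per application (host names are case-insensitive).
        self._cycles = {
            host.lower(): itertools.cycle(list(workers))
            for host, workers in workers_by_host.items()
        }

    def route(self, host_header):
        cycle = self._cycles.get(host_header.lower())
        if cycle is None:
            raise LookupError(f"no application bound to host {host_header!r}")
        return next(cycle)


fe = FrontEnd({
    "app1.contoso.com": ["worker-1", "worker-2"],  # hypothetical app with two workers
    "app2.contoso.com": ["worker-3"],
})
print([fe.route("app1.contoso.com") for _ in range(3)])  # ['worker-1', 'worker-2', 'worker-1']
print(fe.route("APP2.contoso.com"))  # worker-3
```

Each application keeps its own rotation, so traffic to one app does not shift the position of another app's round robin.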

How does failover work when the primary VM scale set gets restarted?

Above is a sample configuration for Azure Service Fabric.
I created it with the wizard and deployed one ASP.NET Core application, which I am able to access from outside.
If you look at the image below, Service Fabric is being accessed at sfclustertemp.westus2.cloudapp.azure.com, and I am able to access the application at sfclustertemp.westus2.cloudapp.azure.com/api/values.
Now, if I restart the primary VM scale set, I expect the load to be transferred to the secondary automatically, but it is not, because the second load balancer has a different DNS name. (If I specify the other DNS name, the application is accessible.)
My understanding is that the cluster has one ID, so it should be common to both load balancers.
Is such configuration possible ?
Maybe you could use Azure Traffic Manager with health probes.
However, instead of using multiple node types for fail-over options during reboot, have a look at 'Durability tiers'. Using Silver or Gold will have the effect that reboots are performed sequentially on machine groups (grouped by fault domain), instead of all at once.
The durability tier is used to indicate to the system the privileges
that your VMs have with the underlying Azure infrastructure. In the
primary node type, this privilege allows Service Fabric to pause any
VM level infrastructure request (such as a VM reboot, VM reimage, or
VM migration) that impact the quorum requirements for the system
services and your stateful services.
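If you go the Traffic Manager route, its priority routing method answers with the highest-priority endpoint that passes its health check. The Python sketch below models that decision only; the second DNS name is hypothetical (the first is the one from the question), and clients keep using a cached answer until its DNS TTL expires.

```python
# Endpoints with priorities (1 = highest). The first name is from the question;
# the second is a hypothetical DNS name for the other node type's load balancer.
ENDPOINTS = [
    {"target": "sfclustertemp.westus2.cloudapp.azure.com", "priority": 1},
    {"target": "sfclustertemp-b.westus2.cloudapp.azure.com", "priority": 2},
]


def resolve_priority(endpoints, is_healthy):
    """Model of priority routing: return the best-priority endpoint that is healthy."""
    for endpoint in sorted(endpoints, key=lambda e: e["priority"]):
        if is_healthy(endpoint["target"]):
            return endpoint["target"]
    # Simplification: this model gives up; the real service has its own rules for this case.
    return None


# The primary is being rebooted, so its health probe fails.
down = {"sfclustertemp.westus2.cloudapp.azure.com"}
print(resolve_priority(ENDPOINTS, lambda target: target not in down))
# sfclustertemp-b.westus2.cloudapp.azure.com
```

Clients would then use the Traffic Manager profile's DNS name instead of either load balancer's name, which addresses the "different DNS name" problem in the question.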
There is a misconception about what an SF cluster is.
On your diagram, the part you describe on the left as 'Service Fabric' does not belong there.
Service Fabric is nothing more than applications and services deployed on the cluster nodes. When you create a cluster, you define a primary node type, and that is where Service Fabric deploys the services used for managing the cluster.
A node type is formed by:
A VM Scale Set: machines with OS and SF services installed
A load balancer with dns and IP, forwarding requests to the VM Scale Set
So what you describe there, should be represented as:
NodeTypeA (Primary)
  Load Balancer (cluster domain + IP)
  VM Scale Set
    SF management services (Explorer, DNS)
    Your applications
NodeTypeB
  Load Balancer (other DNS + IP)
  VM Scale Set
    Your applications
Given that:
First: if the primary node type goes down, you will lose your cluster, because the management services won't be available to manage your service instances.
Second: you shouldn't rely on node types for this kind of reliability; instead, increase the reliability of your cluster by adding more nodes to the node types.
Third: if the concern is a data center outage, you could:
Create a custom cluster that spans multiple regions
Add a reverse proxy or API gateway in front of your service to route the request wherever your service is.
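For the reverse proxy / API gateway option, here is a minimal Python sketch of request-level failover: the gateway tries each region in order and returns the first successful response. The region names and the fake `send` callables are hypothetical stand-ins for real HTTP calls to each region's cluster.

```python
class AllRegionsFailed(Exception):
    """Raised when no region could serve the request."""


def forward_with_failover(request, regions):
    """Try each (name, send) pair in order; return (name, response) from the first that works."""
    errors = []
    for name, send in regions:
        try:
            return name, send(request)
        except ConnectionError as exc:  # only transport failures trigger failover here
            errors.append(f"{name}: {exc}")
    raise AllRegionsFailed("; ".join(errors))


# Hypothetical senders standing in for HTTP calls to each region.
def westus2(request):
    raise ConnectionError("westus2 unreachable")


def eastus2(request):
    return f"200 OK for {request}"


print(forward_with_failover("/api/values", [("westus2", westus2), ("eastus2", eastus2)]))
# ('eastus2', '200 OK for /api/values')
```

Unlike DNS-based failover, this reacts on the very next request, at the cost of running and scaling the gateway itself.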

What is the Azure Resource Manager equivalent of VIP Swap?

Azure classic Cloud Services come with a built-in load balancer that allows a fast VIP swap from production to staging, and vice versa. What equivalent is provided by Azure Resource Manager? I can use DNS, but then I have the TTL delay.
I want the fast swap because my back-end servers are stateful and cannot process the same data in both staging and production without overwriting each other. In my current system, out-of-date connections (e.g. because of HTTP keep-alive) are rejected and a reload is forced, forcing fresh connections.
I guess I might be able to do it using Azure Application Gateway, but it is not listed as one of its features.
You can do a VIP swap in ARM with two Azure load balancers by disassociating the public IPs and then reassigning them. However, it's not a fast deployment-slot swap like you can do with Cloud Services, as it can take a minute to disassociate both IP addresses (you could speed this up by doing it in parallel). Based on your question, you've already looked at this approach, but I'm documenting it here as an option. There are some notes on this approach here: https://msftstack.wordpress.com/2017/02/24/vip-swap-blue-green-deployment-in-azure-resource-manager/
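The swap steps can be sketched in Python as below. This is an in-memory model only: `disassociate` and `associate` are hypothetical stand-ins for the ARM operations that detach and attach a public IP, and the load balancer and IP names are made up. It shows the ordering (detach both, then reattach crossed over) and the parallel detach; between the two steps neither IP is serving traffic.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for the two load balancers' frontend public IPs (hypothetical names).
frontends = {"lb-blue": "pip-production", "lb-green": "pip-staging"}


def disassociate(lb):
    """Hypothetical stand-in for the slow ARM call that detaches a public IP."""
    pip = frontends[lb]
    frontends[lb] = None
    return pip


def associate(lb, pip):
    """Hypothetical stand-in for attaching a public IP to a load balancer frontend."""
    if frontends[lb] is not None:
        raise RuntimeError(f"{lb} still has {frontends[lb]} attached")
    frontends[lb] = pip


def vip_swap(lb_a, lb_b):
    # Step 1: detach both public IPs in parallel to shorten the window.
    with ThreadPoolExecutor(max_workers=2) as pool:
        pip_a, pip_b = pool.map(disassociate, [lb_a, lb_b])
    # Step 2: attach each IP to the other load balancer (list() surfaces any exception).
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(associate, [lb_a, lb_b], [pip_b, pip_a]))


vip_swap("lb-blue", "lb-green")
print(frontends)  # {'lb-blue': 'pip-staging', 'lb-green': 'pip-production'}
```

Running `vip_swap` again swaps back, which is the rollback path in a blue-green setup.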
In Azure Resource Manager, there are three options: Azure Load Balancer (layer 4), Application Gateway (layer 7) and Traffic Manager (DNS level). I think you can use Load Balancer in your scenario.
The following table helps in understanding the differences between Load Balancer and Application Gateway:

How do you set up Azure load balancing for micro-services?

We've got an API micro-services infrastructure hosted on Azure VMs. Each VM will host several APIs which are separate sites running on Kestrel. All external traffic comes in through an RP (running on IIS).
We have some API's that are designed to accept external requests and some that are internal APIs only.
The internal APIs are hosted on scalesets, with each scaleset VM being a replica that hosts all of the internal APIs. There is an internal load balancer (ILB)/VIP in front of the scaleset. The root issue is that we have internal APIs that call other internal APIs hosted on the same scaleset. Ideally these calls would go to the VIP (using internal DNS), and the VIP would route them to one of the machines in the scaleset. But it looks like Azure doesn't allow this, per the documentation:
You cannot access the ILB VIP from the same Virtual Machines that are being load-balanced
So how do people set this up with micro-services? I can see three ways, none of which are ideal:
Separate out the APIs to different scalesets. Not ideal as the
services are very lightweight and I don't want to triple my Azure VM
expenses.
Convert the internal LB to an external LB (add a public
IP address). Then put that LB in its own network security
group/subnet to only allow calls from our Azure IP range. I would
expect more latency here and exposing the endpoints externally in
any way creates more attack surface area as well as more
configuration complexity.
Set up the VM to loop back if it needs a call to the ILB...meaning any requests originating from a VM will be
handled by the same VM. This defeats the purpose of micro-services
behind a VIP. An internal micro-service may be down on the same
machine for some reason and available on another...that's the reason
we set up health probes on the ILB for each service separately. If
it just goes back to the same machine, you lose resiliency.
Any pointers on how others have approached this would be appreciated.
Thanks!
I think your problem is related to service discovery.
Load balancers are obviously not designed for that. You should consider dedicated software such as Eureka (which can work outside of AWS).
With service discovery, your microservices call each other directly once they have been discovered.
Also take a look at client-side load balancing tools such as Ribbon.
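A toy Python sketch of the two ideas together: instances renew a lease in a registry (similar in spirit to Eureka heartbeats), and the caller picks a live instance itself instead of going through the ILB VIP (the Ribbon idea). `ServiceRegistry`, `ClientSideBalancer`, the `orders-api` name and the addresses are hypothetical, not the Eureka or Ribbon APIs.

```python
import random
import time


class ServiceRegistry:
    """Toy registry: instances register and renew a lease; expired leases are ignored."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._instances = {}  # (service, address) -> lease expiry time

    def heartbeat(self, service, address):
        self._instances[(service, address)] = self._clock() + self._lease

    def lookup(self, service):
        now = self._clock()
        return sorted(addr for (svc, addr), expiry in self._instances.items()
                      if svc == service and expiry > now)


class ClientSideBalancer:
    """The caller chooses a live instance itself, so no VIP hop is needed."""

    def __init__(self, registry, rng=None):
        self._registry = registry
        self._rng = rng or random.Random()

    def choose(self, service):
        candidates = self._registry.lookup(service)
        if not candidates:
            raise LookupError(f"no live instances of {service!r}")
        return self._rng.choice(candidates)


now = [0.0]  # fake clock so the example is deterministic
registry = ServiceRegistry(lease_seconds=30, clock=lambda: now[0])
registry.heartbeat("orders-api", "10.1.0.4:5001")
registry.heartbeat("orders-api", "10.1.0.5:5001")
now[0] = 20.0
registry.heartbeat("orders-api", "10.1.0.5:5001")  # only .5 renews; .4 is unhealthy
now[0] = 40.0
print(registry.lookup("orders-api"))  # ['10.1.0.5:5001']
print(ClientSideBalancer(registry).choose("orders-api"))  # 10.1.0.5:5001
```

Because each instance's lease reflects that specific API's health, a caller on the same VM can still be sent to a healthy copy on another VM, which is the resiliency the loopback option loses.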
@Cdelmas's answer on service discovery is awesome. Please allow me to add my thoughts:
For services such as yours, you can also look into Netflix's Zuul proxy for server- and client-side load balancing. You could even use Hystrix on top of Eureka for latency and fault tolerance. Netflix is way ahead of the game on this.
You may also look into the Consul.io product if you want to use the Go language. It has a scriptable configuration for better managing your services, allows advanced security configurations, and supports non-REST endpoints. Eureka also does these, but requires that you add a configuration server (Netflix Archaius, Apache ZooKeeper, Spring Cloud Config), coded security, and support for access through Zuul/Sidecar.
