Does Azure Cloud Service Load Balancer take Role Status into account? - azure

Here's my scenario:
I have an Azure Cloud Service that runs a "hefty" .NET WCF project. The heftiness comes in with the startup tasks, as we cache a large amount of data into memory to make the project run quickly.
We have some logic that overrides the OnStart method of the RoleInstance to perform this caching, so the instance doesn't report as "Ready" until all of the caching has completed.
When we deploy our service, we have 2 instances (so they're on separate Fault/Update domains).
To that scenario I have 2 questions:
When we deploy an update or Microsoft performs maintenance against one of these managed VMs, does the Azure Load Balancer take the role state into account and not route traffic to it until it's in a "Ready" state?
For the aforementioned Load Balancer, do I have to configure anything for the cloud service to balance between the multiple instances? I was always under the impression that Microsoft managed that for you, so that if you scale out to N role instances, the cloud service takes the number of instances into account and assigns load accordingly.

It is handled for you. The load balancer probe communicates with the guest agent on each VM which only returns an HTTP 200 once the role is in the Ready state. However, if you’re using a web role and running w3wp.exe on it, the load balancer is not able to detect any failures like HTTP 500 responses that it may generate.
In that case, you’d need to insert an appropriate LoadBalancerProbe section in your .csdef file and also properly handle the OnStop event. This article describes the default load balancer behaviour in more detail, as well as how to customise it.
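The default behaviour described above can be pictured with a small simulation. This is only a sketch, not Azure's implementation: the `Instance` class, the `probe` function and the instance names are hypothetical stand-ins for the guest agent probe and the role instances.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    name: str
    ready: bool = False  # becomes True once the OnStart-style warm-up has finished


def probe(instance: Instance) -> int:
    """Hypothetical probe: answers 200 only when the role is Ready, else 503."""
    return 200 if instance.ready else 503


def route(instances: list[Instance], requests: int) -> list[str]:
    """Round-robin the given number of requests over instances whose probe returns 200."""
    healthy = [i for i in instances if probe(i) == 200]
    if not healthy:
        raise RuntimeError("no instance is Ready")
    rotation = cycle(healthy)
    return [next(rotation).name for _ in range(requests)]


instances = [Instance("IN_0", ready=True), Instance("IN_1", ready=False)]
first = route(instances, 3)   # IN_1 is still caching, so only IN_0 gets traffic
instances[1].ready = True     # IN_1 finishes its warm-up
second = route(instances, 4)  # both instances now share the load
```

An instance that is still warming up simply never appears in the rotation, which is why no extra configuration is needed for the Ready/not-Ready case.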


What is the difference between Azure Availability Test and Health Test

If you go to an Azure web app and select Application Insights in the left-hand panel, then View Application Insights Data, and then click Availability in the left-hand panel, you can add new tests. Basically, here you can specify the health/ping endpoint for the site. You can also configure some associated alert rules here.
Azure now also has a newer feature on the web app called Health Check. All you have to do is enable it and give it your health/ping endpoint. You can also configure rules here.
With both methods, the health endpoint is called by Azure, and if something is not right according to the alert rules, you get an alert message.
But what is the difference between the two approaches?
The difference shows when your web app runs on multiple instances (for example, if you specify scale rules). With Health check, if an instance fails to respond to the ping, the system determines that it is unhealthy and removes it from the load balancer rotation. This increases your application's average availability and resiliency.
The availability test in Application Insights does not do that; it just checks the health.
You can review these docs: Health Check is now Generally Available, Does App Service Health Checks logs in Application Insights?, What App Service does with Health checks.
The Application Insights availability test is specifically for checking health and alerting, while Health check was released with a much broader scope. It:
Checks the health of all instances every 1 min (roughly what the availability test does)
Removes the instance if the ping fails
Restarts the underlying VM
Replaces the instance if needed
Helps with scale out/up for new instances
Moreover, this can be used for more things, such as reporting. Please make sure that it's not used for premium services.
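Both features need an endpoint to call. As a minimal sketch of what such a health/ping endpoint might look like, assuming a `/health` path and a `cache_loaded` check (both are hypothetical names chosen for illustration):

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple


def evaluate_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every check; return (200, ...) if all pass, otherwise (503, ...)."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that raises counts as a failure
        if not ok:
            failed.append(name)
    if failed:
        return 503, "unhealthy: " + ", ".join(failed)
    return 200, "healthy"


# Hypothetical checks; a real app would test its own dependencies here.
CHECKS: Dict[str, Callable[[], bool]] = {"cache_loaded": lambda: True}


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with the status code computed by evaluate_health."""

    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = evaluate_health(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("", 8080), HealthHandler).serve_forever()`. Either feature would then be pointed at `/health`; only Health check would act on a non-200 answer by taking the instance out of rotation.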

Using Azure Service Fabric to Manually Control and Spawn Job-Processing Agents

I'm currently investigating the possibility of using Azure Service Fabric and its Reliable Services to implement the architecture for my problem domain.
Problem domain: I am researching distributed large-scale web crawling architectures involving dozens of parallel agents that crawl web servers and download resources for further indexing.
I've found a useful academic paper that describes an Azure-based distributed web-crawling architecture (link to .pdf paper), and I'm trying to implement and try out a prototype based on this design.
The basic high-level design looks something like the figure below:
The idea: the Central Web Crawling System Engine (CWCE from here on) runs in an infinite loop until the program is aborted, fetching Service Bus queue messages that each contain the URL of a page to be crawled. The CWCE then checks the hostname of this URL and consults the Agent Registrar SQL database to see whether a live agent already exists for that hostname. If not, the CWCE does one of the following:
If the number of alive agents (A_alive) is equal to the Max value (the upper bound on agents, provided by the application administrator), the CWCE waits until A_alive < Max.
If A_alive < Max, the CWCE tries to create a new agent and assign the hostname to it (the agent is then registered in the SQL Registrar database).
Each agent runs on its own partition (a URL hostname) and recursively crawls only pages of that hostname, while discovering URLs with external hostnames and adding them to the Service Bus queue for other agents to process.
The benefit of this architecture would be horizontal scaling of agents and a near-linear increase in crawling effectiveness as the workload grows.
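The dispatch rule described above can be sketched as follows. This is an illustration only, not the paper's implementation: `AgentRegistrar` is a hypothetical in-memory stand-in for the Agent Registrar SQL database, and the string results ("existing", "wait", "created") are invented labels for the three outcomes.

```python
from __future__ import annotations

from urllib.parse import urlparse


class AgentRegistrar:
    """In-memory stand-in for the Agent Registrar SQL database (assumption)."""

    def __init__(self, max_agents: int) -> None:
        self.max_agents = max_agents            # the administrator's Max value
        self.alive: dict[str, list[str]] = {}   # hostname -> URLs handed to its agent

    def dispatch(self, url: str) -> str:
        """Decide what the CWCE does with one queue message."""
        host = urlparse(url).hostname or ""
        if host in self.alive:                  # a live agent already owns this hostname
            self.alive[host].append(url)
            return "existing"
        if len(self.alive) >= self.max_agents:  # A_alive == Max: caller must wait and retry
            return "wait"
        self.alive[host] = [url]                # A_alive < Max: create and register an agent
        return "created"

    def retire(self, host: str) -> None:
        """Remove an agent that has finished or died, freeing a slot."""
        self.alive.pop(host, None)
```

Because `max_agents` is read on every call, changing it while the loop is running takes effect on the next message.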
However, I am very new to Azure Service Fabric and would therefore like to ask whether this PaaS layer is capable of solving this problem. My main questions:
Would it be possible to manually create new web crawling agent instances through the programmable code and pass them hostname parameter using Azure Service Fabric? (Maybe using FabricClient class for manipulating cluster and creating service instances?)
Which ASF programming model fits this parallel long-running agents scenario the best? Stateless services, stateful services or Actor Model? Each agent might run as long-running task, since it recursively crawls specific hostname URLs and listens for the queue.
Would it be possible to control and change this upper bound limit of Max alive agents during runtime of application?
Would it be possible to have infinite-loop stateless service CWCE component which continuously listens for the queue messages in order to spawn up new agents?
I am not sure whether the ASF PaaS layer is the best solution for this distributed web-crawling use case, so your insights would be very valuable to me. Any helpful resource links would also be much appreciated.
Service Fabric will allow you to implement the architecture that you want.
Would it be possible to manually create new web crawling agent instances through the programmable code and pass them hostname parameter using Azure Service Fabric? (Maybe using FabricClient class for manipulating cluster and creating service instances?)
Yes. The service you develop and deploy to Service Fabric will be a ServiceType. Service Types don't actually run; instead, from the ServiceType you create the actual Services, which are named. A single Service (e.g. ServiceA) will have a number of Instances, to allow scaling and availability. You can programmatically create and remove services of a given type and pass parameters to them, so every service will know what URL to crawl.
Check an example here.
Which ASF programming model fits this parallel long-running agents scenario the best? Stateless services, stateful services or Actor Model? Each agent might run as long-running task, since it recursively crawls specific hostname URLs and listens for the queue.
I would choose Stateless services, because they will be the most efficient in terms of resource utilization and the easiest to manage (no need to store state and manage state, partitioning and replicas). The only thing you need to consider is that every service will eventually crash and restart, so you need to store the current crawling location in a permanent store, not in memory.
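The point about keeping the crawl position outside memory can be sketched like this. The JSON file is only a stand-in for whatever permanent store is used (a database, blob storage, and so on), and the function names and checkpoint layout are hypothetical.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved crawl state, or an empty state on first start."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"visited": [], "frontier": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then swap it in, so a crash never leaves a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp, path)


def crawl_one(path: Path) -> str | None:
    """Process the next URL and persist progress before returning it."""
    state = load_checkpoint(path)
    if not state["frontier"]:
        return None
    url = state["frontier"].pop(0)
    # ... fetching and parsing the page would happen here ...
    state["visited"].append(url)
    save_checkpoint(path, state)
    return url
```

Since every step reloads the state from the store, a restarted service instance resumes from the last saved position rather than from the beginning.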
Would it be possible to control and change this upper bound limit of Max alive agents during runtime of application?
Yes. Service Fabric services run on Nodes (Virtual Machines), which in Azure are managed by Virtual Machine Scale Sets. You can easily add and remove nodes from the VMSS, which lets you adjust the total compute and memory capacity that you want. The actual number of services is already controlled by you, as described in point 1.
Would it be possible to have infinite-loop stateless service CWCE component which continuously listens for the queue messages in order to spawn up new agents?
Absolutely. Message-driven microservices are very common. It's technically not an infinite loop, but a service with a Bus Communication Listener. I found one here as a reference, but I don't know if it's production ready.

Azure Dynamic App Service instance that starts up and shuts down automatically based on the current needs

I am new to Microsoft Azure / Google Cloud and I am currently comparing these two different cloud solution providers, before starting a new project. I am planning to write a web application using either Google Cloud App Engine or Azure App Service.
I want to start with a very basic service instance, which I want to call via HTTPS. To reduce charges, it would be nice to pay only for the service minutes actually used, i.e. to have the instance run only when needed.
Google Cloud offers dynamic instances, where compute instances are shut down when idle and started for incoming requests. This seems much cheaper for a seldom-used prototype and a first foray into cloud services.
Instances are resident or dynamic. A dynamic instance starts up and shuts down automatically based on the current needs. [...] When an application is not being used at all, App Engine turns off its associated dynamic instances, but readily reloads them as soon as they are needed.
Unfortunately, I found in the Azure documentation only an Overview of autoscale in Microsoft Azure Virtual Machines, Cloud Services, and Web Apps, which does not cover my question of an automatic instance shutdown in idle state. Also Start/Stop VMs during off-hours solution in Azure Automation does not satisfy my information need, because I am looking only for a compute instance and not a full VM.
Is there an equivalent in the Azure domain that automatically starts up and shuts down App Service instances based on usage, i.e. on incoming requests?
I will decide which of the two cloud service providers to use depending on their functionality. Does anybody have experience with this matter in the Azure domain? Thank you.
You can't do that with Azure App Service alone as of now (24-Feb-2019). But you could use an Azure Function to fire up an App Service instance and then forward all incoming traffic to an app hosted in this App Service via an Azure Functions proxy; see this description. I have been planning to try this for a while, too. In theory it should work... From experience, App Service instances fire up quickly, so the warm-up time should be acceptable. Even better, you could keep a Free or Shared App Service plan instance with your app running and forward the Azure Function calls to it by default. On increasing load, move the app to a pre-configured plan that supports autoscaling.
Of course you could try to implement the entire app via a set of Azure functions which are fully "dynamic" using your terminology. Depending on the architecture of your application, this might actually be the best choice.
The Autoscale feature of Azure lets you scale out and scale in based on configurable criteria; take a look here. You are limited by your pricing tier. Maybe this example will help you get an insight.

Understanding how apps in roles are served in Azure

The company I work for is looking to develop a few apps against the cloud.
An ASP.NET Web Api application hosted in an Azure web role.
A Windows Server type application hosted in an Azure worker role.
We are completely new to web or cloud development and would like to know the following:
When being served to the consumer, is the same instance of the application being served to all, is it one per request or are multiple roles being created and served to consumers?
When being served to the consumer, is the same instance of the application being served to all?
That depends on how many instances you've asked Azure to run your application on. If you've only deployed to 1 instance, then it will of course be the same instance that responds to all requests. But if you deploy to multiple instances, requests will be load-balanced, which means you have no guarantee that multiple requests from the same user will be handled by the same instance.
When you're asking this question, it could be because you might be tempted to store local data on the machine running the instance. However, this is not a good idea. Windows Azure can at any time tear down your instance and start your application on a completely different machine. They call this "healing", because it usually happens because Windows Azure tries to be helpful and avoid any potential problem that could mean downtime for your instance. But it also happens if your machine for some reason locks up or something else bad happens. This process of healing means that anything that's not part of your deployment package will be lost. So for example, if you're logging to a file on the disk, this log will be lost if Azure "heals" your instance.
is it one per request or are multiple roles being created and served to consumers?
I'm not completely sure what you mean here, so I'll take a guess and risk interpreting your question wrongly. My guess is that you're asking if there will be one instance per user request. No, there will only be the number of instances that you have decided. Remember that you have to pay per instance that's running, so it's only fair that the number of instances running is dictated by you.
When you have your application packaged and ready to be deployed to Windows Azure, you can decide how many instances of each role you want to have running. You set this number in the deployment package, so that when your package is deployed, Azure will automatically start the requested number of instances. However, you can change the number of running instances of each role after deployment and on-the-fly. This makes it possible for you to scale with more instances within minutes.
I hope this helps and that I understood your questions correctly. :-)
Azure Web and Worker roles in an Azure Cloud Service are deployed on at least one instance (managed VM). Azure allows you to size (memory, CPU) and scale (number of instances). Azure actually lets you scale dynamically, i.e. add more instances on demand. You pay by the hour for the size & number of instances deployed.
For example, a Cloud service can have a single instance of a worker role (background processing) and multiple instances of the Web role. Multiple instances are handled behind a load balancer and the client is unaware of what instance they are using (all instances are created equal).
An Azure role instance is a VM with some specific payload.
For example, in your service you declare you want three instances of "Frontend" web role and two instances of "Backend" worker role. When Azure deploys your service it starts five VMs and three of them will run "Frontend" payload and have IIS started and two of them will run "Backend" payload and have no IIS started.
Now until you ask Azure to change that configuration it remains persistent no matter what requests come and what load occurs. You have five VMs with 3+2 configurations. To change the configuration you need some action on your part.
There are two ways to have the configuration changed. You can use the Management Portal or an external program to change the "instance count" in either or both roles. You can also add auto-scaling code that gathers metrics and makes Management API requests to alter the "instance count". Either way, when the "instance count" goes up Azure starts more VMs with the same payload, and when it goes down it stops some of the VMs.
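The decision part of such auto-scaling code might look like the sketch below. The CPU metric, the thresholds and the minimum/maximum bounds are all hypothetical values chosen for illustration, and the Management API request that would apply the new count is deliberately left out.

```python
def desired_instance_count(
    current: int,
    avg_cpu_percent: float,
    *,
    scale_out_above: float = 75.0,  # assumed threshold, not an Azure default
    scale_in_below: float = 25.0,   # assumed threshold, not an Azure default
    minimum: int = 2,
    maximum: int = 10,
) -> int:
    """Return the instance count to request, moving at most one step per call."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1
    elif avg_cpu_percent < scale_in_below:
        target = current - 1
    else:
        target = current
    # Clamp so the role never drops below the minimum or exceeds the maximum.
    return max(minimum, min(maximum, target))
```

Changing the count by one instance per evaluation keeps the role from oscillating on a single noisy metric sample; the caller would only send a request when the returned value differs from the current count.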

Azure - RoleEnvironment events not firing when role is rebooted or taken down for patching

In short, is there a RoleEnvironment event that I can handle in code when any other role in my deployment is rebooted or taken offline for patching?
I've got an application in production that has both web roles for a web front end and web roles running WCF services as an application layer (business logic, data access etc). The web layer communicates with the WCF layer over an internal endpoint, as we don't want to expose the services at this point in time. This means it is not possible to use the load balancer to call my service layer through a single URL.
So I have to load balance requests to the WCF web roles manually. This has caused problems in the past when a machine has been recycled by the fabric controller for patching.
I'm handling the RoleEnvironment.Changing and RoleEnvironment.Changed events to adjust the list of backend web roles I am communicating with, which works well in testing when I make a configuration change to increase or decrease the number of instances in my deployment. But if I reboot a role through the portal, this does not fire the RoleEnvironment events.
RoleEnvironment.Changing will be fired "before a change to the service configuration" (my emphasis). In this case no configuration change is occurring; your service is still configured to have exactly the same number of instances. AFAIK there is no way to know when your deployment is taken offline, and clearly there are cases where notice cannot be given in advance (e.g. hardware failure). Therefore you have to code for communication failure: intercept the error and try another role instance.
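A minimal sketch of that "intercept the error and try another instance" approach follows. The helper name, the endpoint strings and the use of `ConnectionError` as the failure signal are assumptions; a real WCF client would surface its own exception types.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Sequence[str], call: Callable[[str], T]) -> T:
    """Try each backend endpoint in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except ConnectionError as exc:  # instance rebooting, patched, or otherwise gone
            last_error = exc
    raise ConnectionError("all role instances failed") from last_error


def fake_service_call(endpoint: str) -> str:
    """Hypothetical backend call: the first instance is down for patching."""
    if endpoint == "10.0.0.4:8080":
        raise ConnectionError("instance unavailable")
    return "ok from " + endpoint


result = call_with_failover(["10.0.0.4:8080", "10.0.0.5:8080"], fake_service_call)
```

Only when every instance in the list fails does the caller see an error, so a single recycled role does not surface to the web layer.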
I do not believe you can intercept RoleEnvironment changes from a different Role easily.
I would suggest that you trap RoleEnvironment changes in the Role where they occur, handle them by writing a message/record to some persisted storage, and let your web roles check that storage either on a regular schedule or every time they communicate with the WCF roles.
Basically, if you're doing your own internal load balancing, you need a mechanism for registration/tear-down of your instances so that you can manage your WCF workers.
You can use the Azure storage queues to post a message when a role is going down and have a worker role that listens on that queue and adjusts things accordingly.
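The registration/tear-down idea can be sketched as below. Python's in-process `queue.Queue` stands in for an Azure storage queue, and the message format (`event` and `instance` keys) and instance names are hypothetical.

```python
from __future__ import annotations

import queue


class BackendRegistry:
    """Tracks which WCF role instances are currently usable."""

    def __init__(self) -> None:
        self._active: set[str] = set()

    def apply(self, message: dict) -> None:
        """Apply one 'up' or 'down' notification posted by a backend instance."""
        if message["event"] == "up":
            self._active.add(message["instance"])
        elif message["event"] == "down":
            self._active.discard(message["instance"])

    def drain(self, messages: queue.Queue) -> None:
        """Process every message currently waiting in the queue."""
        while True:
            try:
                message = messages.get_nowait()
            except queue.Empty:
                break
            self.apply(message)

    def active(self) -> list[str]:
        return sorted(self._active)


# Stand-in for the storage queue: each backend posts when it starts or stops.
notifications: queue.Queue = queue.Queue()
notifications.put({"event": "up", "instance": "WcfRole_IN_0"})
notifications.put({"event": "up", "instance": "WcfRole_IN_1"})
notifications.put({"event": "down", "instance": "WcfRole_IN_0"})

registry = BackendRegistry()
registry.drain(notifications)
```

A role that dies without warning never posts its "down" message, so this registry complements failure handling in the caller rather than replacing it.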
