Azure cloud service - Does VIP swap cause OnStop() to be invoked?

I have an Azure cloud service with a web and a worker role. When an Azure cloud service is stopped, the OnStop() method is invoked.
On a VIP swap, is the same OnStop() method called on the outgoing deployment as soon as the swap is requested?
http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleentrypoint.onstop.aspx
Also, what is the order of events during the VIP swap? Presumably, the worker OnStart() method of the new deployment is run at some point, but is this before the OnStop() of the outgoing deployment? I am trying to understand whether the two worker role versions (incoming and outgoing deployment) will be running concurrently, or whether there will be a short gap in worker role service.

To the best of my knowledge, when you perform a VIP swap the changes happen at the router/firewall level and not at the cloud service level. Your cloud service keeps running and the OnStop() event is not fired, as mentioned in the documentation here: http://msdn.microsoft.com/en-us/library/windowsazure/hh386336.aspx
When the service is promoted to production, the VIP and URL that were assigned to the production environment are assigned to the deployment that is currently in the staging environment, thus “promoting” the service to production. The VIP and URL assigned to the staging environment are assigned to the deployment that was in the production environment.

No events are fired during the VIP swap. We've added tracing for all kinds of events and entry points, and from the role code it definitely looks like nothing happens during the VIP swap.
Your service will not be interrupted during the swap; new requests will simply start going to the new deployment. The old and the new deployments run in parallel until you stop the old (now staging) one. Your application should be able to handle this scenario without anything breaking.
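For illustration, a minimal sketch of this kind of lifecycle tracing (the class name and trace messages are illustrative, not our exact code). Logging from OnStart()/OnStop() and checking the diagnostics logs of the outgoing deployment is an easy way to confirm that neither fires during a swap:

    using System.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            Trace.TraceInformation("OnStart fired on {0}", RoleEnvironment.CurrentRoleInstance.Id);
            return base.OnStart();
        }

        public override void OnStop()
        {
            // If a VIP swap triggered the role lifecycle, this trace would show up in the
            // diagnostics logs of the outgoing deployment; in practice it does not.
            Trace.TraceInformation("OnStop fired on {0}", RoleEnvironment.CurrentRoleInstance.Id);
            base.OnStop();
        }
    }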

Related

Architecture recommendation - Azure Webjob

I have a webjob that subscribes to an Azure Service Bus topic. The webjob automates a very important business process. The Service Bus is on the Premium SKU and has Geo-Recovery configured. My question is about the best practice for setting up high availability for my webjob (to ensure that the process always runs). I already have the App Service Plan deployed in two regions, and the webjob is installed in both regions. However, I would like the webjob in the secondary region to run only if the primary region is down, perhaps temporarily due to an outage. How can this be implemented? If I run both webjobs in parallel, that will create some serious duplication issues. Is there an architectural pattern I can refer to, or any features within App Service or Azure I can use to implement this?
With Service Bus, when you pick up a message it is locked, so it shouldn't be picked up by another process unless the lock expires or you complete the message back to Service Bus. In your case, if you are using Peek Lock, you can use it to prevent the same message being picked up by different instances. See the docs.
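A minimal sketch of that peek-lock flow using the Azure.Messaging.ServiceBus client; the connection string, topic, and subscription names below are placeholders, not values from the question:

    using System;
    using System.Threading.Tasks;
    using Azure.Messaging.ServiceBus;

    class PeekLockExample
    {
        static async Task Main()
        {
            // Placeholder connection string and entity names.
            var client = new ServiceBusClient("<connection-string>");
            var receiver = client.CreateReceiver("business-topic", "webjob-subscription",
                new ServiceBusReceiverOptions { ReceiveMode = ServiceBusReceiveMode.PeekLock });

            // The message is locked for this receiver; other instances will not see it
            // unless the lock expires before we complete it.
            ServiceBusReceivedMessage message =
                await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(30));
            if (message != null)
            {
                // ... process the message ...
                await receiver.CompleteMessageAsync(message); // removes it from the subscription
            }

            await receiver.DisposeAsync();
            await client.DisposeAsync();
        }
    }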
You can also make use of sessions, which are available in the Premium tier of Service Bus. That way you can group messages into a session, and each service instance handles its own sessions unless the other instance is not available.
Since a WebJob is associated with an App Service, it really depends on how you have configured this. You already mentioned that the WebJobs are in two regions, which means you have App Services running in two regions (make sure you have multiple instances running in each region, across different Availability Zones).
Now it comes down to what configuration you have for the standby region: is it active/passive with hot standby, active/passive with cold standby, or active/active? If your secondary region is active, with at least one instance running, then its webjob is actually processing messages.
I would recommend reading through and understanding these patterns:
Standby Regions Configuration, Multi Region Config
Regarding Service Bus: when you are processing a message with Peek-Lock, the message is not visible in the queue, so no other instance will pick it up. If your webjob cannot process it in time, fails, or crashes, the message becomes visible in the queue again and any other instance can pick it up, so no two instances will process the same message at the same time.
Better Approach
I would recommend using Azure Functions to process the queue messages. They are a serverless offering with a monthly allowance of free invocations and are naturally highly available.
You can find more about it here:
Azure Function Svc Bus Trigger
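As an illustration only, a minimal Service Bus topic trigger in the in-process Functions model; the function, topic, subscription, and connection setting names are hypothetical:

    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public static class BusinessProcessFunction
    {
        // Topic, subscription, and connection setting names are placeholders.
        [FunctionName("ProcessBusinessMessage")]
        public static void Run(
            [ServiceBusTrigger("business-topic", "primary-subscription",
                Connection = "ServiceBusConnection")] string message,
            ILogger log)
        {
            log.LogInformation("Processing message: {Message}", message);
            // Business logic goes here; the Functions runtime receives the message with
            // peek-lock and completes it automatically when this method returns successfully.
        }
    }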

Does Azure Cloud Service Load Balancer take Role Status into account?

Here's my scenario:
I have an Azure Cloud Service that runs a "hefty" .NET WCF project. The heftiness comes in with the startup tasks, as we cache a large amount of data into memory to make the project run quickly.
We have some logic that overrides the OnStart method of the RoleEntryPoint to perform this caching, so the instance doesn't report "Ready" until all of this caching is completed.
When we deploy our service, we have 2 instances (so they're on separate fault/update domains).
For that scenario I have 2 questions:
When we deploy an update or Microsoft performs maintenance against one of these managed VM's, does the Azure Load Balancer take the role state into account and not route traffic to it until it's in a "Ready" state?
For the aforementioned load balancer, do I have to configure anything for the cloud service to balance between the multiple instances? I was always under the impression that Microsoft managed that for you; that way, if you scale out to N role instances, the cloud service will take the number of instances into account and assign load accordingly.
Thanks!
It is handled for you. The load balancer probe communicates with the guest agent on each VM, which only returns HTTP 200 once the role is in the Ready state. However, if you're using a web role and running w3wp.exe on it, the load balancer cannot detect failures inside w3wp.exe, such as HTTP 500 responses it may generate.
In that case, you’d need to insert an appropriate LoadBalancerProbe section in your .csdef file and also properly handle the OnStop event. This article describes the default load balancer behaviour in more detail, as well as how to customise it.
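One common pattern for that OnStop handling, sketched below with illustrative names and timings (a simplified example, not the article's code): report the instance as Busy via RoleEnvironment.StatusCheck while OnStop drains in-flight requests, so that with the default probe behaviour the load balancer stops routing to the instance before it goes down:

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        private volatile bool _shuttingDown;

        public override bool OnStart()
        {
            // Report Busy while shutting down so the guest agent probe stops returning
            // success and the load balancer takes the instance out of rotation.
            RoleEnvironment.StatusCheck += (sender, e) =>
            {
                if (_shuttingDown) e.SetBusy();
            };
            return base.OnStart();
        }

        public override void OnStop()
        {
            _shuttingDown = true;
            // Give in-flight requests time to drain; the delay is illustrative.
            Thread.Sleep(TimeSpan.FromSeconds(30));
            base.OnStop();
        }
    }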

Which pieces do or do not persist in an Azure Cloud Service Web Role?

My understanding of the VMs involved in Azure Cloud Services is that at least some parts of them are not meant to persist throughout the lifetime of the service (unlike regular VMs that you can create through Azure).
This is why you must use Startup Tasks in your ServiceDefinition.csdef file in order to configure certain things.
However, after playing around with it for a while, I can't figure out what does and does not persist.
For instance, I installed an ISAPI filter into IIS by logging into remote desktop. That seems to have persisted across deployments and even a reimaging.
Is there a list somewhere of what does and does not persist and when that persistence will end (what triggers the clearing of it)?
See http://blogs.msdn.com/b/kwill/archive/2012/10/05/windows-azure-disk-partition-preservation.aspx for information about what is preserved on an Azure PaaS VM in different scenarios.
In short, the only things that will truly persist are things packaged in your cscfg/cspkg (i.e. startup tasks). Anything else done at runtime or via RDP will eventually be removed.
See the article "How to: Update a cloud service role or deployment". In most cases, an update to an existing deployment will preserve local data while updating the application code for your cloud service.
Be aware that if you change the size of a role (that is, the size of a virtual machine that hosts a role instance) or the number of roles, each role instance (virtual machine) must be re-imaged, and any local data will be lost.
Also if you use the standard deployment practice of creating a new deployment in the staging slot and then swapping the VIP, you will also lose all local data (these are new VMs).

When does an Azure worker role report ready?

I'm working on the deployment processes for a web application which runs inside an Azure cloud service.
I deploy to the staging slot; once all the instances report a status of RoleReady, I then do a VIP swap into the production slot. The aim is that I can deploy a new version and my users won't have to wait while the site warms up.
I have added a certain amount of warmup to RoleEntryPoint.OnStart; essentially this hits a number of the application's endpoints to allow the caches to spin up and view compilation to run. What I'm seeing is that the instances all report Ready before this process has completed.
How can I tell if my application has warmed up before I swap staging into production? The deploy script I'm using is a derivative of https://gist.github.com/chartek/5265057.
The role instance does not report Ready until the OnStart method finishes and the Run method begins. You can validate this by looking at the guest agent logs on the VM itself (see http://blogs.msdn.com/b/kwill/archive/2013/08/09/windows-azure-paas-compute-diagnostics-data.aspx for more info about those logs).
When you access the endpoints, are you waiting for the response or just sending the request? See Azure Autoscale Restarts Running Instances for code which hits the endpoints and waits in OnStart for the responses before the instance moves to the Ready state.
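A minimal sketch of waiting for the warm-up responses inside OnStart; the paths and the localhost base address are assumptions about your site's bindings, not taken from the question or the gist:

    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // The instance only reports Ready after OnStart returns, so block here
            // until the warm-up requests have actually completed.
            var warmUpPaths = new[] { "/", "/home/index" };   // hypothetical endpoints
            using (var client = new HttpClient
            {
                BaseAddress = new Uri("http://localhost/"),   // assumes the site is bound locally
                Timeout = TimeSpan.FromMinutes(5)
            })
            {
                foreach (var path in warmUpPaths)
                {
                    // Wait for the response (cache population, view compilation)
                    // rather than fire-and-forget.
                    var response = client.GetAsync(path).GetAwaiter().GetResult();
                    Trace.TraceInformation("Warm-up {0}: {1}", path, response.StatusCode);
                }
            }
            return base.OnStart();
        }
    }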

Azure - RoleEnvironment events not firing when a role is rebooted or taken down for patching

In short, is there a RoleEnvironment event that I can handle in code when any other role in my deployment is rebooted or taken offline for patching?
I've got an application in production that has both web roles for a web front end and web roles running WCF services as an application layer (business logic, data access, etc.). The web layer communicates with the WCF layer over an internal endpoint, as we don't want to expose the services at this point in time. This means it is not possible to use the load balancer to call my service layer through a single URL.
So I have to load balance requests to the WCF web roles manually. This has caused problems in the past when a machine has been recycled by the fabric controller for patching.
I'm handling the RoleEnvironment.Changing and RoleEnvironment.Changed events to adjust the list of backend web roles I am communicating with, which works well in testing when I make a configuration change to increase or decrease the number of instances in my deployment. But if I reboot a role through the portal, this does not fire the RoleEnvironment events.
Thanks,
Rob
RoleEnvironment.Changing will be fired "before a change to the service configuration" (my emphasis). In this case no configuration change is occurring; your service is still configured to have exactly the same number of instances. AFAIK there is no way to know when an instance in your deployment is taken offline, and clearly there are cases where notice cannot be given in advance (e.g. hardware failure). Therefore you have to code for communication failure, intercept the error, and try another role instance.
I do not believe you can intercept RoleEnvironment changes from a different Role easily.
I would suggest trapping RoleEnvironment changes in the role where they occur, handling them by putting a message or record onto some persisted storage, and letting your web roles check that storage either on a regular schedule or every time they communicate with the WCF roles.
Basically, if you're doing your own internal load balancing, you need a mechanism for registering and tearing down your instances so that you can manage your WCF workers.
You can use Azure Storage queues to post a message when a role is going down, and have a worker role that listens on that queue and adjusts things accordingly.
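A minimal sketch of that idea; the queue name, configuration setting, and message format are placeholders. The WCF role posts to a storage queue from the RoleEnvironment.Stopping event, and the web roles can drain that queue to update their backend list:

    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    public class WcfWorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Announce shutdowns so the web roles can drop this instance from the
            // list they load balance over.
            RoleEnvironment.Stopping += (sender, e) => PostStatus("stopping");
            return base.OnStart();
        }

        private static void PostStatus(string status)
        {
            var account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
            var queue = account.CreateCloudQueueClient().GetQueueReference("role-status");
            queue.CreateIfNotExists();
            queue.AddMessage(new CloudQueueMessage(
                RoleEnvironment.CurrentRoleInstance.Id + ":" + status));
        }
    }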
