How to handle non-transient exceptions in an Azure Worker Role

We have two Azure worker roles, A and B.
A is a Quartz scheduler that runs jobs every minute: it reads some ids from a Redis cache and executes jobs for those ids.
A publishes its output to a Service Bus queue, which worker role B subscribes to.
B reads values from the queue and performs further operations on them.
Both worker roles have to build a cache on startup.
Here are a few issues regarding Azure component failure:
If the Redis cache goes down, how can we handle that? We need to stop execution until it is up again, and then rebuild our cache. Worker role B should stop pulling messages from the Service Bus until Redis comes back up.
How do we handle a Service Bus failure in worker role B?

You don't need to stop any of the worker roles.
Worker role A should be resilient to issues in the Redis cache, meaning your code should handle any exception thrown by Redis (and any network exception), either by retrying or by swallowing the exception.
Worker role B should constantly pull messages from the Service Bus. If worker role A doesn't publish data, worker role B should simply handle the empty results.
Stopping your service on a Redis/Azure glitch would require you to handle more complicated scenarios, for example automatically detecting when Redis is up again and automatically restarting your service.
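For example, a retry with exponential backoff around each cache call keeps worker role A alive through a transient Redis outage. Below is a minimal sketch assuming the StackExchange.Redis client; the attempt count, delays, and the key name in the usage line are made up for illustration.

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class RedisRetry
{
    // Retries a Redis operation with exponential backoff instead of
    // letting a transient connection failure take the worker down.
    public static async Task<T> WithRetryAsync<T>(Func<Task<T>> operation, int maxAttempts = 5)
    {
        var delay = TimeSpan.FromSeconds(1);
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (RedisConnectionException) when (attempt < maxAttempts)
            {
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the wait each attempt
            }
        }
    }
}

// Usage: var ids = await RedisRetry.WithRetryAsync(() => db.StringGetAsync("job-ids"));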

One potential solution would be to configure an external health service that your workers check before pulling from the Service Bus or the cache. If the health service says the cache or the Service Bus is down, your workers simply don't attempt to process anything.
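A rough sketch of that idea (the health endpoint URL is hypothetical and stands in for whatever health service you deploy):

using System.Net.Http;
using System.Threading.Tasks;

public static class HealthGate
{
    private static readonly HttpClient Http = new HttpClient();

    // Each worker calls this before pulling work; if the health service reports
    // a dependency is down (or is itself unreachable), the worker skips this cycle.
    public static async Task<bool> DependenciesHealthyAsync()
    {
        try
        {
            var response = await Http.GetAsync("https://health.example.com/status");
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            return false; // treat an unreachable health service as "down"
        }
    }
}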

Related

Architecture recommendation - Azure Webjob

I have a webjob that subscribes to an Azure Service Bus topic. The webjob automates a very important business process. The Service Bus is a Premium SKU and has Geo-Recovery configured. My question is about the best practice for setting up high availability for my webjob (to ensure that the process always runs). I already have the App Service Plan deployed in two regions, and the webjob is installed in both regions. However, I would like the webjob in the secondary region to run only if the primary region is down, perhaps temporarily due to an outage. How can this be implemented? If I run both webjobs in parallel, that will create some serious duplication issues. Is there an architectural pattern I can refer to, or any feature within App Service or Azure I can use to implement this?
With Service Bus, when you pick up a message it is locked, so it shouldn't be picked up by another process unless the lock expires or you send a complete back to Service Bus. In your case, if you are using Peek Lock, you can use it to prevent the same message from being picked up by different instances. See the docs.
You can also make use of sessions, which are available in the Premium tier of Service Bus. This way you can group messages into a session, and each service instance handles its own session unless the other instance is unavailable.
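To illustrate peek-lock settlement, here is a minimal sketch using the Azure.Messaging.ServiceBus SDK; the connection string and queue name are assumed to come from your own configuration:

using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class PeekLockWorker
{
    // While the peek lock is held, no other instance sees the message.
    // Completing it deletes it for good; abandoning it (or letting the lock
    // expire after a crash) makes it visible again for another instance.
    public static async Task ProcessOneAsync(string connectionString, string queueName)
    {
        await using var client = new ServiceBusClient(connectionString);
        ServiceBusReceiver receiver = client.CreateReceiver(queueName, new ServiceBusReceiverOptions
        {
            ReceiveMode = ServiceBusReceiveMode.PeekLock // the default mode
        });

        ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync();
        if (message == null) return; // queue was empty

        try
        {
            // ... run the business process here ...
            await receiver.CompleteMessageAsync(message); // settle: remove from queue
        }
        catch (Exception)
        {
            await receiver.AbandonMessageAsync(message); // release the lock for another instance
            throw;
        }
    }
}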
Since a WebJob is associated with an App Service, a lot depends on how you have configured this. You already mentioned that the WebJobs are in two regions, which means you have App Services running in two regions (make sure you have multiple instances running in each region, across different availability zones).
Now it comes down to which standby configuration you have: active/passive with hot standby, active/passive with cold standby, or active/active. If your secondary region is active, with at least one instance running, then its webjob is actually processing messages.
I would recommend reading through these patterns and understanding them:
Standby Regions Configuration, Multi Region Config
Regarding Service Bus: when you are processing a message with Peek-Lock, the message is not visible in the queue, so no other instance will pick it up. If your webjob is unable to process it in time, fails, or crashes, the message becomes visible in the queue again and any other instance can pick it up, so no two instances can hold the same message at once.
Better Approach
I would recommend using Azure Functions to process the queue messages. They are a serverless offering with a free monthly grant of invocations, and they are naturally highly available.
You can find out more here:
Azure Function Svc Bus Trigger
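For illustration, a minimal in-process Azure Function with a Service Bus trigger looks roughly like this; the orders queue name and the ServiceBusConnection app setting are placeholders:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessQueueMessage
{
    // The trigger receives each message under a peek lock, completes it when
    // the function returns successfully, and lets it be retried when it throws.
    [FunctionName("ProcessQueueMessage")]
    public static void Run(
        [ServiceBusTrigger("orders", Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        log.LogInformation("Processing: {message}", message);
        // ... business logic here ...
    }
}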

Running WebJobs within an Azure Worker Role

I have an Azure worker role that receives SMTP messages on TCP ports and pushes them to queues. Other threads pick these messages up from the queues and process them. Currently, the processing threads have their own queue-polling logic: they simply check the queues and increase the wait interval when the queues are empty.
I want to simplify the queue logic and make use of other WebJobs functionality in this worker role.
Is it possible to start a WebJobs thread in this worker role and let that thread handle the details? Are there any limitations I need to know about?
Azure Worker Roles are a feature of Azure Cloud Services; Azure WebJobs are a feature of Azure App Service. Both are built to run background processing tasks within the context of your application. However, since they are features of different Azure services, they can't be run together in the nested fashion you are asking about.
I agree with Chris Pietschmann: there is no way to start a WebJobs thread directly in an Azure Worker Role.
If you'd like to accomplish this task with WebJobs, you could write a program and run it as a WebJob in an Azure App Service. The WebJobs REST API provides a way to start and stop WebJobs dynamically, and you could use it to manage your WebJobs from your Worker Role.
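A rough sketch of driving the WebJobs (Kudu) REST API from the worker role; the site name, job name, and deployment credentials are placeholders you would pull from configuration:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class WebJobsControl
{
    // Starts a continuous WebJob through the Kudu WebJobs REST API;
    // POST .../api/continuouswebjobs/{job}/stop stops it again.
    public static async Task StartWebJobAsync(string site, string jobName, string user, string password)
    {
        using var http = new HttpClient();
        var credentials = Convert.ToBase64String(Encoding.ASCII.GetBytes($"{user}:{password}"));
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", credentials);

        var url = $"https://{site}.scm.azurewebsites.net/api/continuouswebjobs/{jobName}/start";
        var response = await http.PostAsync(url, null);
        response.EnsureSuccessStatusCode();
    }
}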

Autoscaling with queues does not start

My environment is a Cloud Service with two instances of a worker role that process messages from a Service Bus queue. I have also set up an autoscaling rule to add instances when an instance has more than 10 messages to handle.
Here are the steps I take:
I push about 1,000 messages to a queue.
At this point all my messages are unprocessed, as my instances are not up yet.
I publish the worker role with two instances, and when they are up they start reading messages correctly.
Then I configure autoscaling with the rule stated above: queue-based, 10 messages per instance.
What I expected was that, since the instances already have more than they can handle, Azure would start spinning up new instances. But this doesn't happen until at least 10-15 minutes after my first two instances are up.
What could be the reason behind this, and is there any documentation on the algorithm Microsoft uses?

Handling cleanup jobs using Azure Storage queues

To process cleanup jobs that run every 8 hours, we currently have the following implementation:
Create a scheduled job using Azure Scheduler that puts a message in a storage queue when it is triggered.
Implement the client so that it polls continuously and processes each message as it arrives. A sample client implementation:
while (!CancellationToken.Value.IsCancellationRequested)
{
    var message = await client.GetMessageAsync();
    if (message != null)
    {
        // process the message
    }
}
The problem is that we are polling indefinitely even though we know messages will only arrive every 8 hours, and, per the documentation, every attempt to read a message from the queue incurs a cost.
How can we optimize this so that listeners are spawned on the fly at a configurable interval instead of running in a continuous loop?
You did not mention how your client is deployed, so I hope one of the strategies below helps you optimize.
If your client is deployed as a cloud resource: you can use the Azure Automation service to schedule start/stop of the cloud resource, for example starting the cloud service just before the message appears in the queue and triggering a shutdown once it is done.
If your client is deployed on premises: you can of course use Thread.Sleep to reduce the number of hits; see the sketch below.
Also consider Azure Service Bus, which allows you to subscribe to messages/topics instead of polling.
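For example, a variation on the sleep idea (using Task.Delay rather than Thread.Sleep so no thread is blocked; the idle interval and the Microsoft.Azure.Storage.Queue client are assumptions):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Queue;

public static class QueuePoller
{
    // Polls the queue, then backs off for a configurable interval when it is
    // empty, so a job that fires every 8 hours costs only a few reads per day.
    public static async Task PollAsync(CloudQueue queue, TimeSpan idleDelay, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var message = await queue.GetMessageAsync();
            if (message != null)
            {
                // process the message, then remove it from the queue
                await queue.DeleteMessageAsync(message);
            }
            else
            {
                // queue is empty: wait instead of hammering the storage API
                // (cancellation surfaces here as an OperationCanceledException)
                await Task.Delay(idleDelay, ct);
            }
        }
    }
}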

How to ensure only one instance of an Azure WebJob is running at any time

I've got a continuously running WebJob on my auto-scaling Azure website.
My WebJob is a simple console application with a while(true) loop that subscribes to certain messages on the Azure Service Bus and processes them. I don't want to process the same message twice, so when the web site is scaled out and another WebJob is started, I need it to detect that another instance is already running and just sit there doing nothing until it is either killed (by scaling down again) or the other instance is killed. In the latter scenario the second WebJob should detect that the other is gone and take over.
Any takers?
You should create a queue (using either Service Bus or storage queues), pull the jobs off (creating and managing a lease on each message), and process them from there. If the lease is managed properly, each job should only get processed once, although you should make sure processing is idempotent just in case, as there are fringe cases where a message will be processed more than once.
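A minimal sketch of that lease idea with an Azure storage queue, where the visibility timeout serves as the lease; the 5-minute value is arbitrary and the Microsoft.Azure.Storage.Queue client is an assumption:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Queue;

public static class LeasedProcessor
{
    // Dequeuing with a visibility timeout hides the message from every other
    // instance while this one works on it. Deleting it on success settles it;
    // a crash simply lets the lease expire so another instance picks it up.
    public static async Task ProcessNextAsync(CloudQueue queue)
    {
        var message = await queue.GetMessageAsync(
            visibilityTimeout: TimeSpan.FromMinutes(5),
            options: null,
            operationContext: null);
        if (message == null) return; // nothing to do right now

        // ... process; keep this idempotent to cover duplicate delivery ...

        await queue.DeleteMessageAsync(message); // done for good
    }
}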
