WebJob QueueTrigger - limit per function - azure

I've got a few Webjobs in place, each of which respond to a number of QueueTrigger, e.g.
public static void ProcessMessage([QueueTrigger("XXXXXXX")] string message, TextWriter log)
{
//processing message
}
public static void ProcessMessage([QueueTrigger("YYYYYY")] string message, TextWriter log)
{
//processing message
}
Should I be separating out each trigger to a separate job? Are there any reasons why continuing on this path is a bad idea, i.e. the more queues it can trigger the less functions get executed due to thread limits?

What you are doing is the common approach - the WebJobs SDK JobHost is designed to handle many different job functions all within the same application. It is true that all the job functions within a single host will share the same process/memory space and limits, but for most scenarios this isn't a problem and is the recommended approach.
For QueueTrigger specifically, each of your functions will efficiently poll for new work, and when work is available each will pull messages in batches of 16 (configurable via JobHostConfiguration.Queues) and process them in parallel.
You can also scale out if needed by increasing the number of instances your WebJob runs on. Each instance will then work cooperatively with the others to handle more load.

Related

Waiting for an azure function durable orchestration to complete

Currently working on a project where I'm using the storage queue to pick up items for processing. The Storage Queue triggered function is picking up the item from the queue and starts a durable orchestration. Normally the according to the documentation the storage queue picks up 16 messages (by default) in parallel for processing (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue), but since the orchestration is just being started (simple and quick process), in case I have a lot of messages in the queue I will end up with a lot of orchestrations running at the same time. I would like to be able to start the orchestration and wait for it to complete before the next batch of messages are being picked up for processing in order to avoid overloading my systems. The solution I came up with and seems to work is:
public class QueueTrigger
{
[FunctionName(nameof(QueueTrigger))]
public async Task Run([QueueTrigger("queue-processing-test", Connection = "AzureWebJobsStorage")]Activity activity, [DurableClient] IDurableOrchestrationClient starter,
ILogger log)
{
log.LogInformation($"C# Queue trigger function processed: {activity.ActivityId}");
string instanceId = await starter.StartNewAsync<Activity>(nameof(ActivityProcessingOrchestrator), activity);
log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
var status = await starter.GetStatusAsync(instanceId);
do
{
status = await starter.GetStatusAsync(instanceId);
} while (status.RuntimeStatus == OrchestrationRuntimeStatus.Running || status.RuntimeStatus == OrchestrationRuntimeStatus.Pending);
}
which basically picks up the message, starts the orchestration and then in a do/while loop waits while the staus is Pending or Running.
Am I missing something here or is there any better way of doing this (I could not find much online).
Thanks in advance your comments or suggestions!
This might not work since you could either hit timeouts causing duplicate orchestration runs or just force your function app to scale out defeating the purpose of your code all together.
Instead, you could rely on the concurrency throttles that Durable Functions come with. While the queue trigger would queue up orchestrations runs, only the max defined would run at any time on a single instance of a function.
This would still cause your function app to scale out, so you would have to consider that as well when setting this limit and you could also set the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting to control how many instances you function app can scale out to.
It could be that the Function app's built in scaling throttling does not reduce load on downstream services because it is per app and will just cause the app to scale more. Then what is needed is a distributed max instance count that all app instances adhere to. I have built this functionality into my Durable Function orchestration app with a scaleGroupId and it`s max instance count. It has an Api call to save this info and the scaleGroupId is a string that can be set to anything that describes the resource you want to protect from overloading. Here is my app that can do this:
Microflow

Sequence processing with Azure Function & Service Bus

I have an issue with Azure Function Service Bus trigger.
The issue is Azure function cannot wait a message done before process a new message. It process Parallel, it not wait 5s before get next message. But i need it process sequencecy (as image bellow).
How can i do that?
[FunctionName("HttpStartSingle")]
public static void Run(
[ServiceBusTrigger("MyServiceBusQueue", Connection = "Connection")]string myQueueItem,
[OrchestrationClient] DurableOrchestrationClient starter,
ILogger log)
{
Console.WriteLine($"MessageId={myQueueItem}");
Thread.Sleep(5000);
}
I resolved my problem by using this config in my host.json
{
"version": "2.0",
"extensions": {
"serviceBus": {
"messageHandlerOptions": {
"maxConcurrentCalls": 1
}
}
}}
There are two approaches you can accomplish this,
(1) You are looking for Durable Function with function chaining
For background jobs you often need to ensure that only one instance of
a particular orchestrator runs at a time. This can be done in Durable
Functions by assigning a specific instance ID to an orchestrator when
creating it.
(2) Based on the messages that you are writing to Queue, you need to partition the data, that will automatically handle the order of messages which you do not need to handle manually by azure function
In general, ordered messaging is not something I'd be striving to implement since the order can and at some point will be distorted. Saying that, in some scenarios, it's required. For that, you should either use Durable Function to orchestrate your messages or use Service Bus message Sessions.
Azure Functions has recently added support for ordered message delivery (accent on the delivery part as processing can still fail). It's almost the same as the normal Function, with a slight change that you need to instruct the SDK to utilize sessions.
public async Task Run(
[ServiceBusTrigger("queue",
Connection = "ServiceBusConnectionString",
IsSessionsEnabled = true)] Message message, // Enable Sessions
ILogger log)
{
log.LogInformation($"C# ServiceBus queue trigger function processed message: {Encoding.UTF8.GetString(message.MessageId)}");
await _cosmosDbClient.Save(...);
}
Here's a post for more detials.
Warning: using sessions will require messages to be sent with a session ID, potentially requiring a change on the sending side.

Setup webjob ServiceBusTriggers or queue names at runtime (without hard-coded attributes)?

Is there any way to configure triggers without attributes? I cannot know the queue names ahead of time.
Let me explain my scenario here.. I have one service bus queue, and for various reasons (complicated duplicate-suppression business logic), the queue messages have to be processed one at a time, so I have ServiceBusConfiguration.OnMessageOptions.MaxConcurrentCalls set to 1. So processing a message holds up the whole queue until it is finished. Needless to say, this is suboptimal.
This 'one at a time' policy isn't so simple. The messages could be processed in parallel, they just have to be divided into groups (based on a field in message), say A and B. Group A can process its messages one at a time, and group B can process its own one at a time, etc. A and B are processed in parallel, all is good.
So I can create a queue for each group, A, B, C, ... etc. There are about 50 groups, so 50 queues.
I can create a queue for each, but how to make this work with the Azure Webjobs SDK? I don't want to copy-paste a method for each queue with a different ServiceBusTrigger for the SDK to discover, just to enforce one-at-a-time per queue/group, then update the code with another copy-paste whenever another group is needed. Fetching a list of queues at startup and tying to the function is preferable.
I have looked around and I don't see any way to do what I want. The ITypeLocator interface is pretty hard-set to look for attributes. I could probably abuse the INameResolver, but it seems like I'd still have to have a bunch of near-duplicate methods around. Could I somehow create what the SDK is looking for at startup/runtime?
(To be clear, I know how to use INameResolver to get queue name as at How to set Azure WebJob queue name at runtime? but though similar this isn't my problem. I want to setup triggers for multiple queues at startup for the same function to get the one-at-a-time per queue processing, without using the trigger attribute 50 times repeatedly. I figured I'd ask again since the SDK repo is fairly active and it's been a year..).
Or am I going about this all wrong? Being dumb? Missing something? Any advice on this dilemma would be welcome.
The Azure Webjob Host discovers and indexes the functions with the ServiceBusTrigger attribute when it starts. So there is no way to set up the queues to trigger at the runtime.
The simpler solution for you is to create a long time running job and implement it manually:
public class Program
{
private static void Main()
{
var host = new JobHost();
host.CallAsync(typeof(Program).GetMethod("Process"));
host.RunAndBlock();
}
[NoAutomaticTriggerAttribute]
public static async Task Process(TextWriter log, CancellationToken token)
{
var connectionString = "myconnectionstring";
// You can also get the queue name from app settings or azure table ??
var queueNames = new[] {"queueA", "queueA" };
var messagingFactory = MessagingFactory.CreateFromConnectionString(connectionString);
foreach (var queueName in queueNames)
{
var receiver = messagingFactory.CreateMessageReceiver(queueName);
receiver.OnMessage(message =>
{
try
{
// do something
....
// Complete the message
message.Complete();
}
catch (Exception ex)
{
// Log the error
log.WriteLine(ex.ToString());
// Abandon the message so that it can be retry.
message.Abandon();
}
}, new OnMessageOptions() { MaxConcurrentCalls = 1});
}
// await until the job stop or restart
await Task.Delay(Timeout.InfiniteTimeSpan, token);
}
}
Otherwise, if you don't want to deal with multiple queues, you can have a look at azure servicebus topic/subscription and create SqlFilter to send your message to the right subscription.
Another option could be to create your own trigger: The azure webjob SDK provides extensibility points to create your own trigger binding :
Binding Extensions Overview
Good Luck !
Based on my understanding, your needs seems to be building a message batch system in parallel. The #Thomas solution is good, but I think Azure Batch service with Table storage may be better and could be instead of the complex solution of ServiceBus queue + WebJobs with a trigger.
Using Azure Batch with Table storage, you can control the task creation and execute the task in parallel and at scale, even monitor these tasks, please refer to the tutorial to know how to.

Azure Triggered Webjob - Detecting when webjob stops

I am developing a triggered webjob that use TimerTrigger.
Before the webjob stops, I need to dispose some objects but I don't know how to trigger the "webjob stop".
Having a NoAutomaticTrigger function, I know that I can use the WebJobsShutdownWatcher class to handle when the webjob is stopping but with a triggered job I need some help...
I had a look at Extensible Triggers and Binders with Azure WebJobs SDK 1.1.0-alpha1.
Is it a good idea to create a custom trigger (StopTrigger) that used the WebJobsShutdownWatcher class to fire action before the webjob stops ?
Ok The answer was in the question :
Yes I can use the WebJobsShutdownWatcher class because it has a Register function that is called when the cancellation token is canceled, in other words when the webjob is stopping.
static void Main()
{
var cancellationToken = new WebJobsShutdownWatcher().Token;
cancellationToken.Register(() =>
{
Console.Out.WriteLine("Do whatever you want before the webjob is stopped...");
});
var host = new JobHost();
// The following code ensures that the WebJob will be running continuously
host.RunAndBlock();
}
EDIT (Based on Matthew comment):
If you use Triggered functions, you can add a CancellationToken parameter to your function signatures. The runtime will cancel that token when the host is shutting down automatically, allowing your function to receive the notification.
public static void QueueFunction(
[QueueTrigger("QueueName")] string message,
TextWriter log,
CancellationToken cancellationToken)
{
...
if(cancellationToken.IsCancellationRequested) return;
...
}
I was recently trying to figure out how to do this without the
WebJobs SDK which contains the WebJobShutdownWatcher, this is what I
found out.
What the underlying runtime does (and what the WebJobsShutdownWatcher referenced above checks), is create a local file at the location specified by the environment variable %WEBJOBS_SHUTDOWN_FILE%. If this file exists, it is essentially the runtime's signal to the webjob that it must shutdown within a configurable wait period (default of 5 seconds for continuous jobs, 30 for triggered jobs), otherwise the runtime will kill the job.
The net effect is, if you are not using the Azure WebJobs SDK, which contains the WebJobsShutdownWatcher as described above, you can still achieve graceful shutdown of your Azure Web Job by monitoring for the shutdown file on an interval shorter than the configured wait period.
Additional details, including how to configure the wait period, are described here: https://github.com/projectkudu/kudu/wiki/WebJobs#graceful-shutdown

Weird behaviour with Task Parallel Library Framework and Azure Instances

I need some help solving a problem involving the Task Parallel Library with Azure instances. Below is code for my Worker Role.
Whenever I upload multiple files, a request is inserted into the queue and the worker process continously process queries Queues and gets the message. Once a message is retrieved, I do some long runnning process. I used task schedulder so that mutliple request are served by multiple task instance on multiple instances.
Now the uestion is if one instance take a message from a queue and assigns the message to a task and it process, now i see another instance also retrieves the same message from Queue and process it. Because of that my tasks are executed multiple times.
Please help me on this problem. My requirement is only one Azure instance of one Ccre handles one task operation not by mutliple by task.
public override void Run()
{
//Step1 : Get the message from Queue
//Step 2:
Task<string>.Factory.StartNew(() =>
{
//Message delete from Queue
PopulateBlobtoTable(uri, localStoragePath);
}
catch (Exception ex)
{
Trace.WriteLine(ex.Message);
throw;
}
finally
{
}
}
return "Finished!";
})
catch (AggregateException ae)
{
foreach (var exception in ae.InnerExceptions)
{
Trace.WriteLine(exception.Message);
}
}
I'm assuming you are using Windows Azure Storage queues, which have a default invisibility timeout of 90 seconds, when using the storage client APIs. If your message is not completely processed and explicitly deleted within that time period, it will reappear on the queue.
While you can increase this invisibility timeout to up to seven days when you add the message to the queue, you should be using operations that are idempotent, meaning it doesn't matter if the message is processed multiple times. It's your job to ensure idempotence, perhaps by recording a unique id (in table storage, SQL database, etc.) associated with each message and ignoring the message if you see it a second time and you find it's already been marked complete.
You might also look at Windows Azure Queues and Windows Azure Service Bus Queues - Compared and Constrasted. You'll note Service Bus queues have some additional constructs you can use to guarantee at-most-once (and at-least-once) delivery.
Now the uestion is if one instance take a message from a queue and assigns the message to a task and it process, now i see another instance also retrieves the same message from Queue and process it. Because of that my tasks are executed multiple times.
Are you getting the messages via "GET" semantics? If that's the case, then what's the visibility timeout you have set for your messages. When you "GET" a message, it should become invisible to other callers (read "instances" in your case) for a particular period of time which you can specify using visibility timeout period. Check out the documentation here for this: http://msdn.microsoft.com/en-us/library/windowsazure/ee758454.aspx

Resources