With the retry options in durable functions, what happens after the last attempt?

I'm using a durable function that's triggered off a queue. I'm sending messages from the queue to a service that is pretty flaky, so I set up the RetryPolicy. Even still, I'd like to be able to see the failed messages once the max retries have been exhausted.
Do I need to manually move those to a dead-letter queue (and if so, it's not clear to me how I would know that a message has already been retried the maximum number of times), or will the function naturally push them to some kind of dead-letter/poison queue?

When an activity fails in Durable Functions, the exception is marshalled back to the orchestration and surfaces there as a FunctionFailedException. It doesn't matter whether you used automatic retry or not: once the last attempt fails, the whole activity fails and it's up to you to handle the situation. As per the documentation:
try
{
    await context.CallActivityAsync("CreditAccount",
        new
        {
            Account = transferDetails.DestinationAccount,
            Amount = transferDetails.Amount
        });
}
catch (Exception)
{
    // Refund the source account.
    // Another try/catch could be used here based on the needs of the application.
    await context.CallActivityAsync("CreditAccount",
        new
        {
            Account = transferDetails.SourceAccount,
            Amount = transferDetails.Amount
        });
}
The only thing retry changes is how transient errors are handled (so you do not have to go down the failure-handling path every time you hit e.g. a temporary network issue).
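To make the "retries exhausted" case explicit, here is a minimal sketch (assuming the Durable Functions extension for .NET; CallFlakyService and SendToDeadLetter are hypothetical activity names, not part of the original question) that combines the automatic retry policy with a catch that forwards the payload once the last attempt has failed:

[FunctionName("ProcessQueueMessage")]
public static async Task RunOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var message = context.GetInput<string>();
    var retryOptions = new RetryOptions(
        firstRetryInterval: TimeSpan.FromSeconds(5),
        maxNumberOfAttempts: 3);

    try
    {
        // Retried automatically, up to maxNumberOfAttempts in total.
        await context.CallActivityWithRetryAsync("CallFlakyService", retryOptions, message);
    }
    catch (FunctionFailedException)
    {
        // All attempts exhausted - hand the payload to your own dead-letter handling,
        // e.g. an activity that writes it to a poison queue or a table.
        await context.CallActivityAsync("SendToDeadLetter", message);
    }
}

So there is no built-in dead-letter queue for exhausted retries; the orchestration simply sees the final failure and decides what to do with it.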

Related

Proper way to handle 412 Precondition failure from cosmosdb in an event driven system with application implemented in nodejs

I have an event-driven system with my application implemented in nodejs, using cosmosdb (azure-cosmosdb-sqlapi) to store the events. Planning data comes in via various events from a Kafka broker; to complete a planning document I need to combine data from 5 different events, which I join using the planning id. In such a system the upsert operation hits the 412 Precondition Failed error very often, since we receive many events for a planning id.
The official Microsoft documentation says to retry, but I am confused about which approach to take:
Handle the error code with a try/catch and call the event-handling method again from the catch block up to n times. If the retry still fails after n attempts, throw the exception back to the broker, in which case the broker sends the event again. The issue with this is that I am not able to write tests for it, and secondly I need to manage all the retry logic in my code base. The advantage is that I know an event has failed and can retry directly without sending it back to the broker. Below is the snippet from the handlePlanningEvents method in planningservice.ts:
try {
  await repository.upsert(planningEntry, etag)
} catch (e: any) {
  if (e.code === 412 && retries > 0) {
    await this.handlePlanningEvents(event, retries - 1)
  } else {
    throw e // throws the exception back to the broker
  }
}
Not using a try/catch to handle the error in the service code, but propagating the error to the controller, which sends a 500 error response to the broker and the broker sends the event again. The issue with this case is that it's a longer path compared to using a try/catch, where I can retry directly. But the advantage is that I don't have to worry about retry logic anymore since it's handled by the broker: less and cleaner code.
Not sure which approach to take, also open to other suggestions.

How to avoid memory leak when using pub sub to call function?

I'm stuck on a performance issue when using Pub/Sub to trigger the function.
// this is called from index.ts
export function downloadService() {
  // References an existing subscription
  const subscription = pubsub.subscription("DOWNLOAD-sub");

  // Create an event handler to handle messages
  // let messageCount = 0;
  const messageHandler = async (message: any) => {
    console.log(`Received message ${message.id}:`);
    console.log(`\tData: ${message.data}`);
    console.log(`\tAttributes: ${message.attributes.type}`);

    // "Ack" (acknowledge receipt of) the message
    message.ack();

    await exportExcel(message); // my function
    // messageCount += 1;
  };

  // Listen for new messages until timeout is hit
  subscription.on("message", messageHandler);
}
async function exportExcel(message: any) {
  // get data from database
  const movies = await Sales.findAll({
    attributes: [
      "SALES_STORE",
      "SALES_CTRNO",
      "SALES_TRANSNO",
      "SALES_STATUS",
    ],
    raw: true,
  });
  // ... processing to excel (800k rows)
  // ... bucket.upload to gcs
}
The function above is working fine if I trigger ONLY one pubsub message.
However, the function will hit a memory leak or database connection timeout issue if I trigger many Pub/Sub messages in a short period of time.
The problem I found is that the first message hasn't finished processing yet, but further requests from Pub/Sub go straight to calling the function again, so they are processed at the same time.
I have no idea how to resolve this, but I was thinking that implementing a queue worker or Google Cloud Tasks might solve the problem?
As mentioned by @chovy in the comments, there is a need to queue up the exportExcel function calls, since the function's execution is not keeping up with the rate of invocation. One of the modules that can be used to queue function calls is async. Please note that the async module is not officially supported by Google.
As an alternative, you can employ flow control features on the subscriber side. Data pipelines often receive sporadic spikes in published traffic, which can overwhelm subscribers trying to catch up. The usual response to high published throughput on a subscription is to dynamically autoscale subscriber resources to consume more messages. However, this can incur unwanted costs (for instance, you may need more VMs), which in turn means additional capacity planning. Flow control features on the subscriber side help by letting the subscriber regulate the rate at which messages are ingested. Please refer to this blog for more information on flow control features.

Azure Servicebus: Transient Fault Handling

I have a queue receiver which reads messages from the queue and processes each message (does some processing and inserts some data into an Azure table, or retrieves data).
What I observed was that any exception thrown by my processing method (SendResponseAsync()) results in a retry, i.e. redelivery of the message, up to the default of 10 times.
Can this behavior be customized, i.e. can I retry only for certain exceptions and ignore others? For example, if there is a network issue then it makes sense to retry, but if it is a BadArgumentException (poison message) then I may not want to retry.
Since the retry is taken care of by the Service Bus client library, can we customize this behavior?
This is the code at the receiver end
public MessagingServer(QueueConfiguration config)
{
    this.requestQueueClient = QueueClient.CreateFromConnectionString(config.ConnectionString, config.QueueName);
    this.requestQueueClient.OnMessageAsync(this.DispatchReplyAsync);
}

private async Task DispatchReplyAsync(BrokeredMessage message)
{
    await this.SendResponseAsync(message);
}
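As far as I know, the client's message pump doesn't let you filter by exception type, but you can get this behavior by catching the exceptions yourself inside the handler and deciding per exception type whether to dead-letter or abandon. A rough sketch (assuming the classic WindowsAzure.ServiceBus QueueClient shown above, OnMessageOptions.AutoComplete set to false, and ArgumentException standing in for the poison case):

private async Task DispatchReplyAsync(BrokeredMessage message)
{
    try
    {
        await this.SendResponseAsync(message);
        await message.CompleteAsync();
    }
    catch (ArgumentException ex)
    {
        // Poison message - retrying will never help, so move it straight to the dead-letter queue.
        await message.DeadLetterAsync("BadArgument", ex.Message);
    }
    catch (Exception)
    {
        // Transient failure (network etc.) - abandon so Service Bus redelivers it,
        // up to the queue's MaxDeliveryCount.
        await message.AbandonAsync();
    }
}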

Setup webjob ServiceBusTriggers or queue names at runtime (without hard-coded attributes)?

Is there any way to configure triggers without attributes? I cannot know the queue names ahead of time.
Let me explain my scenario here.. I have one service bus queue, and for various reasons (complicated duplicate-suppression business logic), the queue messages have to be processed one at a time, so I have ServiceBusConfiguration.OnMessageOptions.MaxConcurrentCalls set to 1. So processing a message holds up the whole queue until it is finished. Needless to say, this is suboptimal.
This 'one at a time' policy isn't so simple. The messages could be processed in parallel, they just have to be divided into groups (based on a field in message), say A and B. Group A can process its messages one at a time, and group B can process its own one at a time, etc. A and B are processed in parallel, all is good.
So I can create a queue for each group, A, B, C, ... etc. There are about 50 groups, so 50 queues.
I can create a queue for each, but how to make this work with the Azure Webjobs SDK? I don't want to copy-paste a method for each queue with a different ServiceBusTrigger for the SDK to discover, just to enforce one-at-a-time per queue/group, then update the code with another copy-paste whenever another group is needed. Fetching a list of queues at startup and tying to the function is preferable.
I have looked around and I don't see any way to do what I want. The ITypeLocator interface is pretty hard-set to look for attributes. I could probably abuse the INameResolver, but it seems like I'd still have to have a bunch of near-duplicate methods around. Could I somehow create what the SDK is looking for at startup/runtime?
(To be clear, I know how to use INameResolver to get queue name as at How to set Azure WebJob queue name at runtime? but though similar this isn't my problem. I want to setup triggers for multiple queues at startup for the same function to get the one-at-a-time per queue processing, without using the trigger attribute 50 times repeatedly. I figured I'd ask again since the SDK repo is fairly active and it's been a year..).
Or am I going about this all wrong? Being dumb? Missing something? Any advice on this dilemma would be welcome.
The Azure WebJobs host discovers and indexes the functions with the ServiceBusTrigger attribute when it starts, so there is no way to set up the queues to trigger at runtime.
The simpler solution for you is to create a long-running job and implement the receivers manually:
public class Program
{
    private static void Main()
    {
        var host = new JobHost();
        host.CallAsync(typeof(Program).GetMethod("Process"));
        host.RunAndBlock();
    }

    [NoAutomaticTriggerAttribute]
    public static async Task Process(TextWriter log, CancellationToken token)
    {
        var connectionString = "myconnectionstring";
        // You can also get the queue names from app settings or an azure table
        var queueNames = new[] { "queueA", "queueB" };
        var messagingFactory = MessagingFactory.CreateFromConnectionString(connectionString);
        foreach (var queueName in queueNames)
        {
            var receiver = messagingFactory.CreateMessageReceiver(queueName);
            receiver.OnMessage(message =>
            {
                try
                {
                    // do something
                    // ...

                    // Complete the message
                    message.Complete();
                }
                catch (Exception ex)
                {
                    // Log the error
                    log.WriteLine(ex.ToString());
                    // Abandon the message so that it can be retried.
                    message.Abandon();
                }
            }, new OnMessageOptions() { MaxConcurrentCalls = 1 });
        }

        // await until the job stops or restarts
        await Task.Delay(Timeout.InfiniteTimeSpan, token);
    }
}
Otherwise, if you don't want to deal with multiple queues, you can have a look at Azure Service Bus topics/subscriptions and create a SqlFilter to route each message to the right subscription.
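For example, a rough sketch of that route (assuming the classic Microsoft.ServiceBus NamespaceManager, a hypothetical topic named "mytopic" and a hypothetical "Group" property stamped on each message), where every group gets its own subscription and each subscription is then processed with MaxConcurrentCalls = 1 as above:

var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
foreach (var group in new[] { "A", "B", "C" })
{
    var subscriptionName = "group-" + group;
    if (!namespaceManager.SubscriptionExists("mytopic", subscriptionName))
    {
        // Only messages whose Group property matches end up in this subscription.
        namespaceManager.CreateSubscription("mytopic", subscriptionName,
            new SqlFilter("Group = '" + group + "'"));
    }
}

The sender would then set message.Properties["Group"] before sending, and the one-at-a-time guarantee is kept per subscription.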
Another option could be to create your own trigger: the Azure WebJobs SDK provides extensibility points to create your own trigger binding:
Binding Extensions Overview
Good Luck !
Based on my understanding, what you need is to build a message batch system that processes in parallel. @Thomas's solution is good, but I think the Azure Batch service with Table storage may be better, and could be used instead of the more complex combination of a Service Bus queue + WebJobs with a trigger.
Using Azure Batch with Table storage, you can control task creation, execute tasks in parallel and at scale, and even monitor these tasks; please refer to the tutorial to learn how.

Weird behaviour with Task Parallel Library Framework and Azure Instances

I need some help solving a problem involving the Task Parallel Library with Azure instances. Below is code for my Worker Role.
Whenever I upload multiple files, a request is inserted into the queue, and the worker process continuously polls the queue and gets the messages. Once a message is retrieved, I do some long-running processing. I use the task scheduler so that multiple requests are served by multiple task instances on multiple role instances.
Now the question is: one instance takes a message from the queue, assigns the message to a task, and processes it, but I see another instance also retrieve the same message from the queue and process it. Because of that, my tasks are executed multiple times.
Please help me with this problem. My requirement is that a task operation is handled by only one Azure instance (one core), not by multiple tasks.
public override void Run()
{
    // Step 1: Get the message from the queue
    // Step 2: Hand the message off to a task
    try
    {
        Task<string>.Factory.StartNew(() =>
        {
            try
            {
                // Delete the message from the queue, then process it
                PopulateBlobtoTable(uri, localStoragePath);
            }
            catch (Exception ex)
            {
                Trace.WriteLine(ex.Message);
                throw;
            }
            finally
            {
            }
            return "Finished!";
        });
    }
    catch (AggregateException ae)
    {
        foreach (var exception in ae.InnerExceptions)
        {
            Trace.WriteLine(exception.Message);
        }
    }
}
I'm assuming you are using Windows Azure Storage queues, which have a default invisibility timeout of 90 seconds, when using the storage client APIs. If your message is not completely processed and explicitly deleted within that time period, it will reappear on the queue.
While you can increase this invisibility timeout to up to seven days when you add the message to the queue, you should be using operations that are idempotent, meaning it doesn't matter if the message is processed multiple times. It's your job to ensure idempotence, perhaps by recording a unique id (in table storage, SQL database, etc.) associated with each message and ignoring the message if you see it a second time and you find it's already been marked complete.
You might also look at Windows Azure Queues and Windows Azure Service Bus Queues - Compared and Contrasted. You'll note Service Bus queues have some additional constructs you can use to guarantee at-most-once (and at-least-once) delivery.
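A rough sketch of that dedupe idea, assuming the classic Microsoft.WindowsAzure.Storage clients with a CloudQueue named queue, the dequeued CloudQueueMessage named message, and a CloudTableClient named tableClient (none of which are shown in the question's snippet; the "ProcessedMessages" table and "messages" partition key are made up for the example):

var table = tableClient.GetTableReference("ProcessedMessages");

// Has another instance already completed this message?
var existing = table.Execute(TableOperation.Retrieve<DynamicTableEntity>("messages", message.Id));
if (existing.Result != null)
{
    queue.DeleteMessage(message);   // duplicate delivery - nothing left to do
    return;
}

PopulateBlobtoTable(uri, localStoragePath);   // the long-running work

// Mark the message as completed so a later redelivery is ignored, then delete it.
table.Execute(TableOperation.Insert(new DynamicTableEntity("messages", message.Id)));
queue.DeleteMessage(message);

There is still a small window between the check and the marker insert, so treat this as best-effort dedupe on top of, not instead of, idempotent processing.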
Now the question is: one instance takes a message from the queue, assigns the message to a task, and processes it, but I see another instance also retrieve the same message from the queue and process it. Because of that, my tasks are executed multiple times.
Are you getting the messages via "GET" semantics? If that's the case, what visibility timeout have you set for your messages? When you "GET" a message, it becomes invisible to other callers (read "instances" in your case) for a particular period of time, which you can specify as the visibility timeout. Check out the documentation here: http://msdn.microsoft.com/en-us/library/windowsazure/ee758454.aspx
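In the storage client that looks roughly like this (a sketch, assuming the Microsoft.WindowsAzure.Storage.Queue client and a hypothetical queue name):

var queue = queueClient.GetQueueReference("uploads");   // hypothetical queue name

// Hide the message from every other role instance for 5 minutes while this one works on it.
var message = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(5));
if (message != null)
{
    // Do the long-running work here; it must finish (or the timeout be extended via UpdateMessage)
    // before the 5 minutes elapse, otherwise the message becomes visible to other instances again.
    queue.DeleteMessage(message);   // only delete after the work succeeded
}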
