WebJobs Not Retrying Failed Queue Message - azure-web-app-service

I have the following logic in a WebJob using the new 0.3.0-beta WebJobs SDK. When my code fails while processing the message, the Azure dashboard shows an AggregateException (which makes sense since this is async). HOWEVER, it does not retry processing the message.
The very little documentation I've been able to find indicates that the message should be retried within 10 minutes of failure. Is this not the case with the new SDK?
public static Task ProcessMyMessageAsync(
    [QueueTrigger(Config.MY_QUEUE)] string msg,
    int dequeueCount,
    CancellationToken cancellationToken)
{
    var processor = Config.Container.GetInstance<IMessageProcessor>();
    return processor.HandleJobAsync(msg, dequeueCount, cancellationToken);
}
The exception I get stems from a SQL timeout (it's a DB query against SQL Azure in my code):
System.AggregateException: System.AggregateException: One or more errors occurred.
---> System.Data.Entity.Core.EntityCommandExecutionException: An error occurred while executing the command definition. See the inner exception for details.
---> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception: The wait operation timed out

You should set MaxDequeueCount.
JobHostConfiguration jobHostConf = new JobHostConfiguration();
jobHostConf.Queues.MaxDequeueCount = 10;
var host = new JobHost(jobHostConf);
host.RunAndBlock();
That will retry 10 times before the message is put on a poison (dead-letter) queue.
You could also use a custom retry policy in the function. I suggest you look at "The Transient Fault Handling Application Block" https://msdn.microsoft.com/en-us/library/hh680934(v=pandp.50).aspx
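A rough sketch of that approach, wrapping the handler call from the question in a retry policy from the Enterprise Library block (the FixedInterval values and the SqlDatabaseTransientErrorDetectionStrategy from the .Data package are illustrative choices, not something prescribed by the SDK):
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

public static Task ProcessMyMessageAsync(
    [QueueTrigger(Config.MY_QUEUE)] string msg,
    int dequeueCount,
    CancellationToken cancellationToken)
{
    // Retry the handler up to 5 times, 10 seconds apart, but only for errors the
    // detection strategy classifies as transient (throttling, connection loss, etc.).
    var retryPolicy = new RetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>(
        new FixedInterval(5, TimeSpan.FromSeconds(10)));

    var processor = Config.Container.GetInstance<IMessageProcessor>();
    return retryPolicy.ExecuteAsync(
        () => processor.HandleJobAsync(msg, dequeueCount, cancellationToken));
}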
Or you could enable retry in EF with a SqlAzureExecutionStrategy
https://msdn.microsoft.com/en-us/data/dn456835.aspx
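A minimal sketch of the EF6 route; the class name is illustrative, and EF picks the configuration up automatically as long as it lives in the same assembly as your DbContext:
using System;
using System.Data.Entity;
using System.Data.Entity.SqlServer;

public class MyDbConfiguration : DbConfiguration
{
    public MyDbConfiguration()
    {
        // Retry commands that fail with errors the strategy classifies as transient,
        // up to 5 times, with delays capped at 30 seconds.
        SetExecutionStrategy("System.Data.SqlClient",
            () => new SqlAzureExecutionStrategy(5, TimeSpan.FromSeconds(30)));
    }
}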

Related

Automatic retry to CosmosDb output binding

I'm using an Azure Function that sends an array of around 200 documents to Cosmos DB via the output binding. That function gets triggered by queue messages, about 1,000 of them at the same time.
In some cases I get the "Request rate is large" error and the function execution fails. The documentation says that when this error occurs I can retry the execution after some milliseconds, but I suspect the Azure Functions runtime is doing that for me. I couldn't find any documentation explicitly saying that when the output binding throws that exception it will retry automatically (like with the .NET LINQ library).
Can someone point me to something that confirms whether this is the case?
The output binding uses SDK 1.13.2, which already has the retry mechanism in place.
Assuming you are using Azure Functions v1: if you are using an IAsyncCollector, the Function will do an UpsertDocumentAsync for each AddAsync; if you are using a single-document output, then the UpsertDocumentAsync should happen once.
In any case, the SDK retries a throttled request 9 times by default; after that, the exception is bubbled up and your Function will error. The message should go back to the queue for retrying, as per the QueueTrigger design, and after a few iterations it goes to the dead-letter (poison) queue.
If you want more granular control of the flow, you could obtain the DocumentClient and do the UpsertDocumentAsync yourself with a try/catch. If it fails more than 9 times, you can opt to send the document to another queue or retry another set of times. Something like:
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

[FunctionName("CosmosDBSample")]
public static async Task Run(
    [QueueTrigger("my-queue")] MyPOCOClass myMessage,
    [DocumentDB("test", "test", ConnectionStringSetting = "CosmosDB")] DocumentClient client,
    TraceWriter log)
{
    try
    {
        // Upsert directly against the same database/collection the binding points at.
        await client.UpsertDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("test", "test"), myMessage);
    }
    catch (DocumentClientException ex)
    {
        // retry / queue somewhere else?
        log.Warning($"DocumentClientException {ex.Message} in document {myMessage.Id}.");
    }
}
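As a side note, if you create the DocumentClient yourself instead of taking the one the binding injects, the throttling retry behavior can be tuned through ConnectionPolicy.RetryOptions; a sketch with placeholder endpoint and key values:
using System;
using Microsoft.Azure.Documents.Client;

// endpointUrl and authKey are placeholders for your own configuration values.
var connectionPolicy = new ConnectionPolicy
{
    RetryOptions = new RetryOptions
    {
        MaxRetryAttemptsOnThrottledRequests = 9, // the default mentioned above
        MaxRetryWaitTimeInSeconds = 30
    }
};
var client = new DocumentClient(new Uri(endpointUrl), authKey, connectionPolicy);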

Azure Function Execution Timeout Expired

I'm using a Logic App whose workflow, at a certain point, calls an Azure Function using its webhook URL (as a workaround for Durable Azure Functions).
The goal of this function is to insert/update data into an Azure SQL Database with a SQL request
"MERGE INTO...USING...WHEN NOT MATCHED...WHEN MATCHED AND...".
In the logs of the Azure Function, I could see it failed and it seems to have run 4 times (maybe due to the supposed timeout, I don't know). I don't understand this, since I increased the CommandTimeout to 50 minutes and set the timeout of the "Launch Webhook" action in the Logic App to 1 hour. Here's a sample of the exception logged by the Azure Function:
Exception while executing function: XmlImport_DoWork
Microsoft.Azure.WebJobs.Host.FunctionInvocationException : Exception while executing function: XmlImport_DoWork ---> System.Data.SqlClient.SqlException : Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated. ---> System.ComponentModel.Win32Exception : The wait operation timed out
The table actually has around 250,000 rows, and everything works fine when I run the Logic App (and therefore the Azure Function) against a table that is almost empty!
Any ideas about what's going on and how to fix it? I tried looking at "Query Performance Insight" in the Azure SQL Database component, but there is nothing in the "Recommendations" section.
The Function App where my Azure Functions are hosted uses an App Service plan.
By the way, the XML file I was trying to import into the DB is 20 MB; I also tried a lighter XML file (9 MB), but it didn't work either.
Azure Durable Function: v2 and .NET Core 2.2 - Timeout expired issue RESOLVED
The activity function 'A_ValidateAndImportData' failed: "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.". See the function execution logs for additional details.
Using Dapper to call a SQL Server stored procedure: Dapper does not honor the "Connection Timeout" property in the connection string.
Solution: pass the timeout explicitly as a parameter: 0 (no timeout), or increase it according to your needs.
Example Code:
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;
using Dapper;

public async Task<int> ValidateAndImportData(string connectionString, int param1,
    int databaseTimeOut = 0) // 0 = no command timeout
{
    using (var connection = new SqlConnection(connectionString))
    {
        var param = new DynamicParameters();
        param.Add("@param1", param1);
        param.Add("@returnStatus", dbType: DbType.Int32, direction: ParameterDirection.Output);
        await connection.ExecuteAsync("[dbo].[ValidateAndImportData]", param,
            commandType: CommandType.StoredProcedure, commandTimeout: databaseTimeOut).ConfigureAwait(false);
        return param.Get<int>("returnStatus");
    }
}

Azure Servicebus: Transient Fault Handling

I have a queue receiver which reads messages from the queue and processes them (does some processing and either inserts data into an Azure table or retrieves data).
What I observed is that any exception thrown by my processing method (SendResponseAsync()) results in a retry, i.e. redelivery of the message, up to the default of 10 times.
Can this behavior be customized, i.e. retry only for certain exceptions and ignore others? If there is some network issue it makes sense to retry, but if it is a BadArgumentException (a poison message) I may not want to retry.
Since retry is taken care of by the Service Bus client library, can we customize this behavior?
This is the code at the receiver end
public MessagingServer(QueueConfiguration config)
{
    this.requestQueueClient = QueueClient.CreateFromConnectionString(config.ConnectionString, config.QueueName);
    this.requestQueueClient.OnMessageAsync(this.DispatchReplyAsync);
}

private async Task DispatchReplyAsync(BrokeredMessage message)
{
    await this.SendResponseAsync(message);
}
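One way to get per-exception control, sketched against the classic Microsoft.ServiceBus.Messaging client, is to register the handler with AutoComplete disabled and then complete, dead-letter, or abandon the message yourself depending on the exception type (ArgumentException below stands in for whatever "poison" exception your processing throws):
using System;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public MessagingServer(QueueConfiguration config)
{
    this.requestQueueClient = QueueClient.CreateFromConnectionString(config.ConnectionString, config.QueueName);
    // AutoComplete = false so we decide the fate of each message ourselves.
    this.requestQueueClient.OnMessageAsync(this.DispatchReplyAsync, new OnMessageOptions { AutoComplete = false });
}

private async Task DispatchReplyAsync(BrokeredMessage message)
{
    try
    {
        await this.SendResponseAsync(message);
        await message.CompleteAsync(); // success: remove the message from the queue
    }
    catch (ArgumentException ex) // stand-in for a non-transient "poison message" failure
    {
        // Don't let the broker redeliver a message that can never succeed.
        await message.DeadLetterAsync("PoisonMessage", ex.Message);
    }
    catch (Exception)
    {
        // Transient failure (e.g. network): make the message visible again so it is
        // retried, counting toward the queue's MaxDeliveryCount.
        await message.AbandonAsync();
    }
}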

How to stop an Azure WebJobs queue message from being deleted from an Azure Queue?

I'm using Azure WebJobs to poll a queue and then process the message.
Part of the message processing includes a hit to a 3rd-party HTTP endpoint (e.g. a weather API or some stock market API).
Now, if the hit to the api fails (network error, 500 error, whatever) I try/catch this in my code, log whatever and then ... what??
If I continue .. then I assume the message will be deleted by the WebJobs SDK.
How can I:
1) Say to the SDK: please don't delete this message (so it will be retried automatically at the next queue poll, once the message is visible again).
2) Set the invisibility time value when the SDK pops a message off the queue for processing.
Thanks!
Now, if the hit to the api fails (network error, 500 error, whatever) I try/catch this in my code, log whatever and then ... what??
The WebJobs SDK behaves like this: if your method throws an uncaught exception, the message is returned to the queue with its dequeueCount property incremented by 1. Otherwise, if all is well, the message is considered successfully processed and is deleted from the queue, i.e. queue.DeleteMessage(retrievedMessage);
So don't gracefully catch and swallow the HTTP 500; throw an exception so the SDK gets the hint.
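In code that looks something like the sketch below; the queue name, the API-call helper, and the logger binding are all illustrative:
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static async Task ProcessQueueMessage([QueueTrigger("work-items")] string message, TextWriter log)
{
    try
    {
        await CallThirdPartyApiAsync(message); // hypothetical call to the 3rd-party API
    }
    catch (HttpRequestException ex)
    {
        await log.WriteLineAsync($"API call failed: {ex.Message}");
        throw; // rethrow so the SDK leaves the message on the queue and dequeueCount increments
    }
}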
If I continue .. then I assume the message will be deleted by the WebJobs SDK.
From https://github.com/Azure/azure-content/blob/master/articles/app-service-web/websites-dotnet-webjobs-sdk-get-started.md#contosoadswebjob---functionscs---generatethumbnail-method:
If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to a queue named {queuename}-poison. The maximum number of attempts is configurable.
If you really dislike the hardcoded 10-minute visibility timeout (the time the message stays hidden from consumers), you can change it. See this answer by @mathewc:
From https://stackoverflow.com/a/34093943/4148708:
In the latest v1.1.0 release, you can now control the visibility timeout by registering your own custom QueueProcessor instances via JobHostConfiguration.Queues.QueueProcessorFactory. This allows you to control advanced message processing behavior globally or per queue/function.
https://github.com/Azure/azure-webjobs-sdk-samples/blob/master/BasicSamples/MiscOperations/CustomQueueProcessorFactory.cs#L63
protected override async Task ReleaseMessageAsync(CloudQueueMessage message, FunctionResult result, TimeSpan visibilityTimeout, CancellationToken cancellationToken)
{
    // demonstrates how visibility timeout for failed messages can be customized
    // the logic here could implement exponential backoff, etc.
    visibilityTimeout = TimeSpan.FromSeconds(message.DequeueCount);
    await base.ReleaseMessageAsync(message, result, visibilityTimeout, cancellationToken);
}
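For completeness, a sketch of how such a processor gets wired into the host; the class names are illustrative, and the ReleaseMessageAsync override above would live inside CustomQueueProcessor:
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host.Queues;

public class CustomQueueProcessorFactory : IQueueProcessorFactory
{
    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        // CustomQueueProcessor derives from QueueProcessor and contains the override above.
        return new CustomQueueProcessor(context);
    }
}

// At host startup:
var config = new JobHostConfiguration();
config.Queues.QueueProcessorFactory = new CustomQueueProcessorFactory();
new JobHost(config).RunAndBlock();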

Azure WebJob QueueTrigger Retry Policy

I would like my queue to retry failed WebJobs every 90 minutes, and only for 3 attempts.
When creating the queue I use the following code:
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
IRetryPolicy linearRetryPolicy = new LinearRetry(TimeSpan.FromSeconds(5400), 3);
queueClient.DefaultRequestOptions.RetryPolicy = linearRetryPolicy;
triggerformqueue = queueClient.GetQueueReference("triggerformqueue");
triggerformqueue.CreateIfNotExists();
However, when simulating a failed WebJob attempt, the queue uses the default retry policy.
Am I missing something?
I think you might be thinking about this backwards. Queues don't actually perform behavior. What I'm guessing you want is a WebJob that is configured to pull messages from a queue and, if it fails to process a message for some reason, retries 90 minutes later. In that case you just need to set the invisibility timeout to 90 minutes (the default is 30 seconds). That ensures that if a message isn't fully processed (i.e. GetMessage was called but DeleteMessage was not), it will reappear on the queue 90 minutes later.
Take a look at this Getting Started with Queue Storage document for more information.
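If you drive the queue yourself with the storage SDK rather than the WebJobs SDK, that invisibility window is just a parameter on GetMessage; a rough sketch (ProcessMessage is a placeholder for your own logic):
using System;
using Microsoft.WindowsAzure.Storage.Queue;

// Ask for a 90-minute visibility timeout when dequeuing. If the message is not
// deleted (i.e. processing failed), it reappears on the queue 90 minutes later.
CloudQueueMessage message = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(90));
if (message != null)
{
    ProcessMessage(message);      // placeholder for your processing logic
    queue.DeleteMessage(message); // delete only after successful processing
}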
There is also the Azure WebJobs SDK Extensions package and its ErrorTriggerAttribute (it isn't available in the 1.0.0-beta1 NuGet package yet, but you have access to the public repository):
public static void ErrorMonitor(
    [ErrorTrigger("0:30:00", 10, Throttle = "1:00:00")] TraceFilter filter,
    TextWriter log)
https://github.com/Azure/azure-webjobs-sdk-extensions#errortrigger
You need to apply your RetryPolicy when you add an item to the queue, not on the queue itself, e.g.:
var queue = queueClient.GetQueueReference("myQueue");
queue.CreateIfNotExists();
var options = new QueueRequestOptions { RetryPolicy = linearRetryPolicy };
await queue.AddMessageAsync(yourMessage, null, new TimeSpan(0, delayMinutes, 0), options, null);
