Azure Servicebus: Transient Fault Handling

I have a queue receiver that reads messages from the queue and processes them (it does some work and either inserts data into an Azure table or retrieves data from it).
I observed that any exception thrown by my processing method (SendResponseAsync()) results in a retry, i.e. the message is redelivered up to the default limit of 10 times.
Can this behavior be customized so that I retry only for certain exceptions and not for others? For example, retrying makes sense for a network issue, but if it is a BadArgumentException (a poison message), I may not want to retry.
Since retries are handled by the Service Bus client library, can we customize this behavior?
This is the code at the receiver end:
public MessagingServer(QueueConfiguration config)
{
    this.requestQueueClient = QueueClient.CreateFromConnectionString(config.ConnectionString, config.QueueName);
    this.requestQueueClient.OnMessageAsync(this.DispatchReplyAsync);
}
private async Task DispatchReplyAsync(BrokeredMessage message)
{
    await this.SendResponseAsync(message);
}
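One possible direction for the question above is to decide per exception what should happen to the message, instead of letting every exception trigger a redelivery. The sketch below only shows that decision logic. MessageOutcome, FailurePolicy, and the choice of exception types are hypothetical and not part of the Service Bus library. With the client used in the question, the three outcomes would presumably map to BrokeredMessage.CompleteAsync, AbandonAsync, and DeadLetterAsync, with OnMessageOptions.AutoComplete set to false.

```csharp
using System;
using System.Threading.Tasks;

// What the receive callback should do with the message after processing.
public enum MessageOutcome { Complete, Abandon, DeadLetter }

public static class FailurePolicy
{
    // Hypothetical classification: bad input is treated as a poison message,
    // everything else is treated as possibly transient and left for redelivery.
    public static MessageOutcome Classify(Exception error)
    {
        if (error is ArgumentException || error is FormatException)
            return MessageOutcome.DeadLetter;
        return MessageOutcome.Abandon;
    }

    // Runs the processing delegate and reports the outcome instead of
    // letting the exception escape to the message pump.
    public static async Task<MessageOutcome> RunAsync(Func<Task> process)
    {
        try
        {
            await process();
            return MessageOutcome.Complete;
        }
        catch (Exception error)
        {
            return Classify(error);
        }
    }
}
```

In DispatchReplyAsync, the result of FailurePolicy.RunAsync(() => this.SendResponseAsync(message)) could then drive which settlement call is made on the message.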

Related

Will Azure Service Bus continue retry after the Send call is complete with a fault/exception?

The first code block below sends batched messages, and the second sets the retry policy used when creating the Service Bus client.
While testing a scenario involving a transient network fault, our log shows that the send task faulted, yet the messages appeared in Service Bus with the message IDs that had been assigned to the messages of that batch. Is it possible that retries kept happening in the background even after the task faulted? My understanding is that the fault is raised only AFTER the SDK has completed its retry procedure. Is that correct?
Secondary query - How do I log the internal retry attempts?
//Code for sending. Uses Azure.Messaging.ServiceBus
await tempSender.SendMessagesAsync(batch).ContinueWith(async (tStat) =>
{
    if (tStat.IsFaulted || tStat.IsCanceled)
    {
        _logger.Log($"Service Bus Send Task Faulted for batch {batchId}", EventLevel.Error);
    }
    else
    {
        _logger.Log($"Sent batch of size {batch.Count}", EventLevel.LogAlways);
        onCompletionCallBack(messages);
    }
});
//Retry policy
protected ServiceBusRetryOptions GetRetryPolicy() => new ServiceBusRetryOptions()
{
    Delay = TimeSpan.FromSeconds(0.1),
    MaxDelay = TimeSpan.FromSeconds(30),
    MaxRetries = 5,
    Mode = ServiceBusRetryMode.Exponential,
};

Your code could be simplified to the following:
try
{
    await tempSender.SendMessagesAsync(batch);
    _logger.Log($"Sent batch of size {batch.Count}", EventLevel.LogAlways);
    onCompletionCallBack(messages);
}
catch (Exception exception)
{
    _logger.Log($"Service Bus Send Task Faulted for batch {batchId}", EventLevel.Error);
}
The Service Bus SDK will not throw an exception until all retries are exhausted. However, a send attempt may reach the broker while its acknowledgement is lost on the way back; the client then treats the attempt as failed even though the messages were accepted, which would explain what you're observing.
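To get a feel for how long a send can run before it finally faults, here is a rough sketch of the retry budget implied by options like those in the question. It uses a simplified model that is an assumption on my part and not the SDK's exact algorithm: the pause doubles after each retry, is capped at MaxDelay, and has no jitter. RetryBudget is a hypothetical helper, and the tryTimeout argument stands in for the per-attempt timeout.

```csharp
using System;

public static class RetryBudget
{
    // Upper bound on the time before a send faults under the simplified model:
    // one initial try plus maxRetries retries, each allowed up to tryTimeout,
    // plus the backoff pauses between attempts.
    public static TimeSpan WorstCaseTimeToFault(
        TimeSpan delay, TimeSpan maxDelay, int maxRetries, TimeSpan tryTimeout)
    {
        long total = tryTimeout.Ticks * (maxRetries + 1L);
        long pause = Math.Min(delay.Ticks, maxDelay.Ticks);
        for (int retry = 0; retry < maxRetries; retry++)
        {
            total += pause;
            // Double the pause for the next retry, never exceeding maxDelay.
            pause = Math.Min(pause * 2, maxDelay.Ticks);
        }
        return TimeSpan.FromTicks(total);
    }
}
```

With Delay = 0.1 s, MaxDelay = 30 s and MaxRetries = 5, the pauses in this model add up to 3.1 seconds, so most of the worst-case time comes from the individual attempts timing out rather than from the backoff itself.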

Secondary query - How do I log the internal retry attempts?
Transient failures and retry attempts are not directly surfaced to your application. To observe them, you would need to enable SDK logging.
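As a minimal sketch of what enabling SDK logging can look like, the listener below uses only the base class library's EventListener to print events from Azure SDK event sources to the console. The "Azure-" name prefix is an assumption about how the SDK names its event sources, and AzureSdkConsoleListener is a hypothetical class name. Azure.Core also ships its own AzureEventSourceListener helper, which is probably the more convenient option if you can take that dependency.

```csharp
using System;
using System.Diagnostics.Tracing;

public sealed class AzureSdkConsoleListener : EventListener
{
    // Assumption: Azure SDK event sources have names that start with "Azure-".
    private const string SourcePrefix = "Azure-";

    public static bool IsAzureSource(string name) =>
        name != null && name.StartsWith(SourcePrefix, StringComparison.Ordinal);

    // Called for every event source in the process; subscribe only to the SDK's.
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (IsAzureSource(source.Name))
            EnableEvents(source, EventLevel.Verbose);
    }

    // Writes each received event, including its payload values, to the console.
    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        string payload = e.Payload == null ? "" : string.Join(", ", e.Payload);
        Console.WriteLine($"[{e.Level}] {e.EventSource.Name}/{e.EventName}: {payload}");
    }
}
```

Create one instance at startup and keep it alive (for example in a static field or a using block around the send) for as long as you want the events, including any retry-related ones the SDK emits, to be printed.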

With the retry options in durable functions, what happens after the last attempt?

I'm using a durable function that's triggered off a queue. I'm sending messages off the queue to a service that is pretty flaky, so I set up the RetryPolicy. Even so, I'd like to be able to see the failed messages after the maximum number of retries has been exhausted.
Do I need to manually throw those to a dead-letter queue (and if so, it's not clear to me how I know when a message has been retried any number of times), or will the function naturally throw those to some kind of dead-letter/poison queue?

When an activity fails in Durable Functions, an exception is marshalled back to the orchestration with FunctionFailedException thrown. It doesn't matter whether you used automatic retry or not - at the very end, the whole activity fails and it's up to you to handle the situation. As per documentation:
try
{
    await context.CallActivityAsync("CreditAccount",
        new
        {
            Account = transferDetails.DestinationAccount,
            Amount = transferDetails.Amount
        });
}
catch (Exception)
{
    // Refund the source account.
    // Another try/catch could be used here based on the needs of the application.
    await context.CallActivityAsync("CreditAccount",
        new
        {
            Account = transferDetails.SourceAccount,
            Amount = transferDetails.Amount
        });
}
The only thing automatic retry changes is how transient errors are handled: you do not have to fall back to the compensation path every time you hit, for example, a network issue.

how to exclude queue messages like azure storage explorer?

Our code for getting the message count from an Azure storage queue is just copied and pasted from an online tutorial.
public int? GetQueueMessageCount(CloudQueue queue, TextWriter textWriter)
{
    int? messageCount;
    try
    {
        queue.FetchAttributes();
        // Retrieve the cached approximate message count.
        messageCount = queue.ApproximateMessageCount;
    }
    catch (Exception exception)
    {
        LogHelper.LogInfo(logger, textWriter, $"GetQueueMessageCount failed for {queue.Name}." + exception);
        throw;
    }
    return messageCount;
}
However, we found that randomly some messages may get stuck in the queue and our queue trigger never got fired.
public static void ProcessUnitsForCacheItem(
    [QueueTrigger(QueueClient.RefreshUnitsQueue)] string projectUnitsMessage, TextWriter textWriter)
When I open the queue in Storage Explorer, it does not show any messages and just displays the status text "displaying 0 of 199 messages". So Storage Explorer must somehow know that these messages are not right (expired or something).
Is there a status I can retrieve for a message, or does anyone know how Storage Explorer decides whether to show a message?

Storage Explorer shows exactly what it retrieves from the Storage account/emulator.
"displaying 0 of 199 messages" means the messages are currently invisible because they have been dequeued and are being processed. This is a feature of queue messages that the Storage service handles automatically once your queue trigger gets messages from the queue. See the Storage queue doc:
Typically, when a consumer retrieves a message via Get Messages, that message is usually reserved for deletion until the visibilitytimeout interval expires, but this behavior is not guaranteed. After the visibilitytimeout interval expires, the message again becomes visible to other consumers.
As for the problem
get stuck in the queue and our queue trigger never got fired
If I understand correctly, the code you took from a tutorial is a custom queue trigger, whose behavior may not be guaranteed. Have a look at the Azure Functions queue trigger example.

How to stop an Azure WebJobs queue message from being deleted from an Azure Queue?

I'm using Azure WebJobs to poll a queue and then process the message.
Part of the message processing includes a hit to 3rd party HTTP endpoint. (e.g. a Weather api or some Stock market api).
Now, if the hit to the api fails (network error, 500 error, whatever) I try/catch this in my code, log whatever and then ... what??
If I continue .. then I assume the message will be deleted by the WebJobs SDK.
How can I:
1) Say to the SDK - please don't delete this message (so it will be retried automatically at the next queue poll and when the message is visible again).
2) Set the invisibility time value, when the SDK pops a message off the queue for processing.
Thanks!

Now, if the hit to the api fails (network error, 500 error, whatever) I try/catch this in my code, log whatever and then ... what??
The Webjobs SDK behaves like this: If your method throws an uncaught exception, the message is returned to the Queue with its dequeueCount property +1. Else, if all is well, the message is considered successfully processed and is deleted from the Queue - i.e. queue.DeleteMessage(retrievedMessage);
So don't gracefully catch the HTTP 500; throw an exception so the SDK gets the hint.
If I continue .. then I assume the message will be deleted by the WebJobs SDK.
From https://github.com/Azure/azure-content/blob/master/articles/app-service-web/websites-dotnet-webjobs-sdk-get-started.md#contosoadswebjob---functionscs---generatethumbnail-method:
If the method fails before completing, the queue message is not deleted; after a 10-minute lease expires, the message is released to be picked up again and processed. This sequence won't be repeated indefinitely if a message always causes an exception. After 5 unsuccessful attempts to process a message, the message is moved to a queue named {queuename}-poison. The maximum number of attempts is configurable.
If you really dislike the hardcoded 10-minute visibility timeout (the time the message stays hidden from consumers), you can change it. See this answer by @mathewc:
From https://stackoverflow.com/a/34093943/4148708:
In the latest v1.1.0 release, you can now control the visibility timeout by registering your own custom QueueProcessor instances via JobHostConfiguration.Queues.QueueProcessorFactory. This allows you to control advanced message processing behavior globally or per queue/function.
https://github.com/Azure/azure-webjobs-sdk-samples/blob/master/BasicSamples/MiscOperations/CustomQueueProcessorFactory.cs#L63
protected override async Task ReleaseMessageAsync(CloudQueueMessage message, FunctionResult result, TimeSpan visibilityTimeout, CancellationToken cancellationToken)
{
    // demonstrates how visibility timeout for failed messages can be customized
    // the logic here could implement exponential backoff, etc.
    visibilityTimeout = TimeSpan.FromSeconds(message.DequeueCount);
    await base.ReleaseMessageAsync(message, result, visibilityTimeout, cancellationToken);
}
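The comment in the sample above mentions exponential backoff. A minimal sketch of that idea, assuming you want the timeout to double with each dequeue and stop growing at a ceiling, could look like this. VisibilityBackoff, baseDelay and maxDelay are hypothetical names and not part of the WebJobs SDK.

```csharp
using System;

public static class VisibilityBackoff
{
    // Returns baseDelay for the first dequeue, then doubles it for each
    // further dequeue, never exceeding maxDelay. Assumes non-negative delays.
    public static TimeSpan ForAttempt(int dequeueCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        // Clamp the exponent so the shift below cannot overflow.
        int exponent = Math.Min(Math.Max(dequeueCount - 1, 0), 30);
        long limit = maxDelay.Ticks >> exponent;
        return baseDelay.Ticks > limit
            ? maxDelay
            : TimeSpan.FromTicks(baseDelay.Ticks << exponent);
    }
}
```

Inside ReleaseMessageAsync, the assignment could then become something like visibilityTimeout = VisibilityBackoff.ForAttempt(message.DequeueCount, TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(10)), with both delay values chosen to suit your workload.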

Weird behaviour with Task Parallel Library Framework and Azure Instances

I need some help solving a problem involving the Task Parallel Library with Azure instances. Below is the code for my Worker Role.
Whenever I upload multiple files, a request is inserted into the queue, and the worker process continuously polls the queue and gets the messages. Once a message is retrieved, I run a long-running process. I use the task scheduler so that multiple requests are served by multiple tasks on multiple instances.
Now the question is: one instance takes a message from the queue, assigns it to a task and processes it, but I then see another instance retrieve the same message from the queue and process it too. Because of that, my tasks are executed multiple times.
Please help me with this problem. My requirement is that each task operation is handled by only one core on one Azure instance, not by multiple tasks.
public override void Run()
{
    // Step 1: Get the message from the queue
    // Step 2:
    try
    {
        Task<string>.Factory.StartNew(() =>
        {
            try
            {
                // Message delete from queue
                PopulateBlobtoTable(uri, localStoragePath);
            }
            catch (Exception ex)
            {
                Trace.WriteLine(ex.Message);
                throw;
            }
            finally
            {
            }
            return "Finished!";
        });
    }
    catch (AggregateException ae)
    {
        foreach (var exception in ae.InnerExceptions)
        {
            Trace.WriteLine(exception.Message);
        }
    }
}

I'm assuming you are using Windows Azure Storage queues, which have a default invisibility timeout of 90 seconds, when using the storage client APIs. If your message is not completely processed and explicitly deleted within that time period, it will reappear on the queue.
While you can increase this invisibility timeout to up to seven days when you add the message to the queue, you should be using operations that are idempotent, meaning it doesn't matter if the message is processed multiple times. It's your job to ensure idempotence, perhaps by recording a unique id (in table storage, SQL database, etc.) associated with each message and ignoring the message if you see it a second time and you find it's already been marked complete.
You might also look at Windows Azure Queues and Windows Azure Service Bus Queues - Compared and Contrasted. You'll note Service Bus queues have some additional constructs you can use to guarantee at-most-once (and at-least-once) delivery.
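To illustrate the idempotence idea, here is a minimal sketch that records the id of each completed message and skips the work if the same id shows up again. IdempotentProcessor is a hypothetical class, and the in-memory dictionary is only a stand-in for the durable store (table storage, SQL database, etc.) that all instances would need to share in practice.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class IdempotentProcessor
{
    // In-memory stand-in for a durable store shared by all worker instances.
    private readonly ConcurrentDictionary<string, bool> _completed =
        new ConcurrentDictionary<string, bool>();

    // Runs the handler unless this message id is already marked complete.
    // Returns true if the handler ran, false if the message was a duplicate.
    public bool ProcessOnce(string messageId, Action handler)
    {
        if (_completed.ContainsKey(messageId))
            return false;

        handler();

        // Mark complete only after success, so a failed attempt is retried
        // when the message reappears on the queue.
        _completed[messageId] = true;
        return true;
    }
}
```

Two deliveries that overlap in time can still both run the handler with this check, so the work itself should also tolerate being repeated, or the store should offer an atomic "insert if absent" operation.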

Now the question is: one instance takes a message from the queue, assigns it to a task and processes it, but I then see another instance retrieve the same message from the queue and process it too. Because of that, my tasks are executed multiple times.
Are you getting the messages via "GET" semantics? If so, what visibility timeout have you set for your messages? When you "GET" a message, it should become invisible to other callers (read "instances" in your case) for a period of time that you can specify using the visibility timeout. Check out the documentation here: http://msdn.microsoft.com/en-us/library/windowsazure/ee758454.aspx
