Batch insert to Table Storage via Azure Function

I have the following Azure Storage queue-triggered Azure Function, which is bound to an Azure table for its output.
[FunctionName("TestFunction")]
public static async Task<IActionResult> Run(
    [QueueTrigger("myqueue", Connection = "connection")] string myQueueItem,
    [Table("TableXyzObject"), StorageAccount("connection")] IAsyncCollector<TableXyzObject> tableXyzObjectRecords)
{
    var tableAbcObject = new TableXyzObject();
    try
    {
        tableAbcObject.PartitionKey = DateTime.UtcNow.ToString("MMddyyyy");
        // RowKey is a string, so the GUID must be converted explicitly.
        tableAbcObject.RowKey = Guid.NewGuid().ToString();
        tableAbcObject.RandomString = myQueueItem;
        await tableXyzObjectRecords.AddAsync(tableAbcObject);
    }
    catch (Exception ex)
    {
        // Errors are swallowed here; log or rethrow in real code.
    }
    return new OkObjectResult(tableAbcObject);
}
public class TableXyzObject : TableEntity
{
    public string RandomString { get; set; }
}
I am looking for a way to read 15 messages from a poison queue, which is different from myqueue (the queue trigger on the function above), and batch insert them into a dynamically chosen table (tableXyz, tableAbc, etc.) based on a few conditions in the queue message. Since we have several poison queues, we want to pick up messages from multiple poison queues (the name of the poison queue will be provided in the myqueue message). This avoids spinning up a new Azure Function every time we add a poison queue.
This is the approach I have in mind:
--> Get 15 queue messages with a new QueueClient, using the ReceiveMessages(15) method of the Azure.Storage.Queues package
--> Do a batch insert using the TableBatchOperation class (the output binding cannot be used)
Is there a better approach than this?

Unfortunately, storage queues don't have a great solution for this. If you want it to be dynamic, implementing your own queue clients and table outputs is probably your best option. The one thing I would suggest changing is to use a timer trigger instead of a queue trigger. If you put a message on your trigger queue every time something lands in a poison queue, it works as is; if not, a timer trigger ensures that poisoned messages are handled in a timely fashion.
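If you do write the table output yourself, keep in mind that a Table Storage batch only accepts entities that share one PartitionKey, and at most 100 operations per batch. The sketch below shows one way to group received messages into valid batches before handing each one to a TableBatchOperation. PoisonRecord and BatchPlanner are hypothetical names introduced for illustration, not SDK types, and the code deliberately leaves out the queue and table calls themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record standing in for an entity built from a poison-queue message.
public sealed class PoisonRecord
{
    public string TableName { get; set; }
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Body { get; set; }
}

public static class BatchPlanner
{
    // Table Storage limit: at most 100 operations per batch.
    public const int MaxBatchSize = 100;

    // Splits records into batches that share one target table and one
    // PartitionKey and hold no more than MaxBatchSize records each.
    public static List<List<PoisonRecord>> Plan(IEnumerable<PoisonRecord> records)
    {
        var batches = new List<List<PoisonRecord>>();
        foreach (var group in records.GroupBy(r => (r.TableName, r.PartitionKey)))
        {
            var items = group.ToList();
            for (int i = 0; i < items.Count; i += MaxBatchSize)
            {
                batches.Add(items.GetRange(i, Math.Min(MaxBatchSize, items.Count - i)));
            }
        }
        return batches;
    }
}
```

Each returned batch would map to one batch operation against the table named by TableName. Deleting the queue messages only after their batch has succeeded keeps a failed insert from losing data.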
Original Answer (incorrectly relating to Service Bus queues)
Bryan is correct that creating a new queue client inside your function isn't the best way to go about this. Fortunately, the Service Bus extension does allow batching. Unfortunately the docs haven't quite caught up yet.
Just make your trigger receive an array:
[ServiceBusTrigger("myqueue", Connection = "connection")] string[] myQueueItems
You can set your max batch size in the host.json:
"extensions": {
    "serviceBus": {
        "batchOptions": {
            "maxMessageCount": 15
        }
    }
}

Related

How to handle cancellation token in azure service bus topic receiver?

I have a scenario in which I am calling RegisterMessageHandler of the SubscriptionClient class from the Azure Service Bus library.
Basically, I am using a trigger-based approach to receive messages from Service Bus in one of my services, which runs as a stateless service in a Service Fabric environment.
So I am not closing the subscriptionClient object immediately; rather, I keep it open for the lifetime of the service so that it keeps receiving messages from Azure Service Bus topics.
When the service needs to shut down (for whatever reason), I want to handle the cancellation token that Service Fabric passes into the service.
My question is: how can I handle the cancellation token in the handler registered with RegisterMessageHandler, which gets called whenever a new message is received?
I also want to close the subscription client gracefully, i.e. if a message is already being processed, I want it to be processed completely before the connection is closed.
Below is the code I am using.
Currently we follow this approach:
1. Guarding message processing with a semaphore and releasing it in a finally block.
2. Calling cancellationToken.Register to react when cancellation is requested, and releasing the semaphore inside the registered callback.
public class AzureServiceBusReceiver
{
    private SubscriptionClient subscriptionClient;
    private static Semaphore semaphoreLock;
    public AzureServiceBusReceiver(ServiceBusReceiverSettings settings)
    {
        semaphoreLock = new Semaphore(1, 1);
        subscriptionClient = new SubscriptionClient(
            settings.ConnectionString, settings.TopicName, settings.SubscriptionName, ReceiveMode.PeekLock);
    }
    public void Receive(CancellationToken cancellationToken)
    {
        var options = new MessageHandlerOptions(e =>
        {
            return Task.CompletedTask;
        })
        {
            AutoComplete = false,
        };
        subscriptionClient.RegisterMessageHandler(
            async (message, token) =>
            {
                semaphoreLock.WaitOne();
                try
                {
                    // Checked inside the try block so the finally below always releases the semaphore.
                    if (subscriptionClient.IsClosedOrClosing)
                        return;
                    CancellationToken combinedToken = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, token).Token;
                    // message processing logic
                }
                catch (Exception ex)
                {
                    await subscriptionClient.DeadLetterAsync(message.SystemProperties.LockToken);
                }
                finally
                {
                    semaphoreLock.Release();
                }
            }, options);
        cancellationToken.Register(() =>
        {
            semaphoreLock.WaitOne();
            if (!subscriptionClient.IsClosedOrClosing)
                subscriptionClient.CloseAsync().GetAwaiter().GetResult();
            semaphoreLock.Release();
            return;
        });
    }
}
Implement the message client as ICommunicationListener, so when the service is closed, you can block the call until message processing is complete.
Don't use a static Semaphore, so you can safely reuse the code within your projects.
Here is an example of how you can do this.
And here's the NuGet package created by that code.
And feel free to contribute!
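The idea of blocking the close until message processing is complete can be sketched without any Service Bus or Service Fabric types. InFlightTracker below is a hypothetical helper, not part of either SDK: the message handler would call TryEnter at the start and Exit in its finally block, and the listener's CloseAsync would await DrainAsync before closing the SubscriptionClient. It is instance-based, so nothing is shared through static state.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts handlers that are currently running and lets a
// shutdown path wait for them to finish.
public sealed class InFlightTracker
{
    private readonly object _gate = new object();
    private readonly TaskCompletionSource<bool> _drained =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private int _inFlight;
    private bool _closing;

    // Called at the start of a message handler. Returns false once shutdown
    // has begun, so the handler can skip (or abandon) the message.
    public bool TryEnter()
    {
        lock (_gate)
        {
            if (_closing) return false;
            _inFlight++;
            return true;
        }
    }

    // Called from the handler's finally block.
    public void Exit()
    {
        lock (_gate)
        {
            _inFlight--;
            if (_closing && _inFlight == 0) _drained.TrySetResult(true);
        }
    }

    // Called from the shutdown path: rejects new work and completes when
    // every handler that already entered has exited.
    public Task DrainAsync()
    {
        lock (_gate)
        {
            _closing = true;
            if (_inFlight == 0) _drained.TrySetResult(true);
        }
        return _drained.Task;
    }
}
```

With this shape, a message that is already being processed runs to completion, while messages arriving after shutdown has started are turned away and stay on the subscription for the next instance.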

Azure Webjob/function Retry count

In a WebJob with a blob trigger, I want to manage my retry count, because I need to do something on the last retry. If I just use a try-catch, I lose the retry mechanism, and I can't wait for the message to go to the poison queue, because with blob triggers the poison messages of all jobs listening to blobs go to the same queue, so I can't tell where a poison message came from until I read the whole queue.
So, is there any way to get this retry count?
You can control the maximum number of retries via the maxDequeueCount setting in the "queues" config. This affects blob functions because, behind the scenes, a control queue is used to dispatch blobs to your functions.
Below is sample code for configuring the queue settings, including maxDequeueCount, which defaults to 5. Here is the doc link: Queue storage trigger configuration.
static void Main()
{
    var builder = new HostBuilder();
    builder.ConfigureWebJobs(b =>
    {
        b.AddAzureStorageCoreServices();
        b.AddAzureStorage(a =>
        {
            a.BatchSize = 8;
            a.NewBatchThreshold = 4;
            a.MaxDequeueCount = 4;
            a.MaxPollingInterval = TimeSpan.FromSeconds(15);
        });
    });
    var host = builder.Build();
    using (host)
    {
        host.Run();
    }
}
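For an Azure Functions app rather than a WebJobs host, the same settings live in host.json. The fragment below is a sketch that assumes the version 2.0 host.json schema; the values simply mirror the WebJobs sample above and are illustrative, not recommendations.

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 8,
      "newBatchThreshold": 4,
      "maxDequeueCount": 4,
      "maxPollingInterval": "00:00:15"
    }
  }
}
```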

Masstransit not creating Error queue for Azure Function event subscriber

We followed this example (http://masstransit-project.com/MassTransit/usage/azure-functions.html) to try to set up Azure Functions as Azure Service Bus event (topic) subscribers using MassTransit (for .Net CORE 2.1, Azure Functions 2.0).
When using Azure Webjobs this is as simple as using RabbitMQ, configure the publisher, let the subscriber configure and set up its queue, and have Masstransit automatically create one topic per event, redirect to queue and to "queue_error" after all retries have failed. You do not have to setup anything manually.
But with Azure Functions we seem to have to add the subscribers to the topic manually (through Service Bus Explorer or ARM templates); the topic itself is created by the publisher on the first event it publishes. The same goes for the queues, though these don't even seem to be necessary, since the events are handled directly by the consuming Azure Function topic subscribers.
Maybe we are doing something wrong; I cannot see from the docs that MT will not, as it normally does, set up the subscriber and create the queues when using Azure Functions. It works, except when the consumer throws an exception and all configured retries have been exhausted. The event simply does not end up in the dead-letter queue, and the error queue that MT normally generates is never created.
So how do we get MT to create the error queues and move the failed events there?
Our code:
[FunctionName("OrderShippedConsumer")]
public static Task OrderShippedConsumer(
    [ServiceBusTrigger("xyz.events.order/iordershipped", "ordershippedconsumer-queue", Connection = "AzureServiceBus")] Message message,
    IBinder binder,
    ILogger logger,
    CancellationToken cancellationToken,
    ExecutionContext context)
{
    var config = CreateConfig(context);
    var handler = Bus.Factory.CreateBrokeredMessageReceiver(binder, cfg =>
    {
        var serviceBusEndpoint = Parse.ConnectionString(config["AzureServiceBus"])["Endpoint"];
        cfg.CancellationToken = cancellationToken;
        cfg.SetLog(logger);
        cfg.InputAddress = new Uri($"{serviceBusEndpoint}{QueueName}");
        cfg.UseRetry(x => x.Intervals(TimeSpan.FromSeconds(5)));
        cfg.Consumer(() => new OrderShippedConsumer(cfg.Log, config));
    });
    return handler.Handle(message);
}
And the Consumer code:
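// The class declaration and fields are inferred from the constructor and Consume signature.
public class OrderShippedConsumer : IConsumer<IOrderShipped>
{
    private readonly IConfigurationRoot config;
    private readonly ILog log;
    public OrderShippedConsumer(ILog log, IConfigurationRoot config)
    {
        this.config = config;
        this.log = log;
    }
    public async Task Consume(ConsumeContext<IOrderShipped> context)
    {
        // Handle the event
    }
}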

Queue messages that are moved to Poison Queue still show as queue count, but stay hidden

I am testing the Poison message handling of the Webjob that I am building.
Everything seems to be working as expected except, one strange thing:
When a message is moved to the “-poison” queue, its ghost seems to remain hidden (invisible) in the main job queue. That means if I have 6 poison messages moved to the “-poison” queue, storage explorer shows “Showing 0 of 6 messages in queue”. I can not see the 6 hidden messages in the Storage Explorer.
I tried deleting the job queue and recreating it, but the strange issue still happens after I run my tests. Storage Explorer shows “Showing 0 of 6 messages in queue”.
What is happening behind the scene?
Update 1
I did some investigation and I think WebJob SDK does not delete the poison message.
I went through WebJob SDK source code and I think this line of code is not being executed for some reason:
https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L119
Here is my Function that can help reproducing the issue:
public class Functions
{
public static void ProcessQueueMessage([QueueTrigger("%QueueName%")] string message, TextWriter log)
{
if (message.Contains("Break"))
{
throw new Exception($"Error while processing message {message}");
}
log.WriteLine($"Processed message {message}");
}
}
Update 2
Here is the WebJob SDK I am using:
As far as I know, Azure Storage SDK 8.x does not work well with Azure WebJobs SDK 2.0 (related issue).
If you use Storage SDK 8.x, the poison messages stay undeleted but invisible.
A workaround is to use the older Azure Storage SDK 7.2.1.
It works well with that version.
This issue should be solved in a future SDK version.
I have the same problem.
The problem is that the message is passed by reference, without a visibility time, when it is copied to the poison queue (https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L145), so when the SDK then tries to delete the message from the original queue, the service returns 404 Not Found. This is a problem in azure-webjobs-sdk, and the solution is to make this change
await AddMessageAndCreateIfNotExistsAsync(poisonQueue, new CloudQueueMessage(message.AsString), cancellationToken);
in https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs#L145
We are waiting for a new version with this fix.
Custom solution
To work around this, create your own CustomQueueProcessor and, in its CopyMessageToPoisonQueueAsync override, create a new CloudQueueMessage from the original one and pass that to the poison queue, as in the example below.
var config = new JobHostConfiguration();
config.Queues.QueueProcessorFactory = new CustomQueueProcessorFactory();
// The factory class declaration is inferred from the assignment above.
public class CustomQueueProcessorFactory : IQueueProcessorFactory
{
    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        // demonstrates how the Queue.ServiceClient options can be configured
        context.Queue.ServiceClient.DefaultRequestOptions.ServerTimeout = TimeSpan.FromSeconds(30);
        // demonstrates how queue options can be customized
        context.Queue.EncodeMessage = true;
        // return the custom queue processor
        return new CustomQueueProcessor(context);
    }
    /// <summary>
    /// Custom QueueProcessor demonstrating some of the virtuals that can be overridden
    /// to customize queue processing.
    /// </summary>
    private class CustomQueueProcessor : QueueProcessor
    {
        private QueueProcessorFactoryContext _context;
        public CustomQueueProcessor(QueueProcessorFactoryContext context)
            : base(context)
        {
            _context = context;
        }
        public override async Task CompleteProcessingMessageAsync(CloudQueueMessage message, FunctionResult result, CancellationToken cancellationToken)
        {
            await base.CompleteProcessingMessageAsync(message, result, cancellationToken);
        }
        protected override async Task CopyMessageToPoisonQueueAsync(CloudQueueMessage message, CloudQueue poisonQueue, CancellationToken cancellationToken)
        {
            // Copy the body into a fresh message so the original can still be deleted from the source queue.
            var msg = new CloudQueueMessage(message.AsString);
            await base.CopyMessageToPoisonQueueAsync(msg, poisonQueue, cancellationToken);
        }
        protected override void OnMessageAddedToPoisonQueue(PoisonMessageEventArgs e)
        {
            base.OnMessageAddedToPoisonQueue(e);
        }
    }
}
For anyone out there still having this issue: it should be fixed as of 2.1.0-beta1-10851. The downside is that there is no stable release of 2.1.0 yet.

Triggering WebJob Method Based on Message Property

I have an Azure WebJobs project which handles a number of time-consuming tasks triggered by website actions. It works fine.
But the mapping from message to method call uses magic strings:
public class SomeClass
{
    public async Task ProcessMessage(
        [ QueueTrigger( "%" + nameof( ContainerQueueConstants.FilteredVoterFiles ) + "%" ) ] AgencyOutreachMessage msg,
        TextWriter azureLogWriter
    )
    {
        PhaseNames.SetNames( "Exporting Data", "Job Completed" );
        await ExecuteFromMessage( msg, azureLogWriter, Launch );
    }
}
public class ContainerQueueConstants
{
    public const string ImportFile = "import-file";
    public const string VoterTraits = "voter-traits";
    public const string Voter = "voter";
    public const string FilteredVoterFiles = "filtered-voter-files";
}
I'd like to get away from using hard-coded strings for queue names. Ideally, I'd like to be able to route a message to a particular method based on the value of a property contained in the message.
But I'm not sure if that's even possible, at least in the 1.1.x version of the WebJobs SDK.
Suggestions or advice appreciated.
I suggest using N CloudQueue instances to monitor N different Storage Queues. Since you're doing this in a WebJob, you will probably do this as a continuous webjob and have to perform the polling for each queue yourself. You will also have to take responsibility for removing successfully processed messages.
The QueueTriggerAttribute has built-in support for deadlettering. I do not believe that there is automatic deadlettering support if you do not use the QueueTriggerAttribute.
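If you do poll the queues yourself, routing on a message property can be a plain lookup from the property value to a handler. The sketch below illustrates that idea only; MessageRouter, RoutedMessage and its Kind property are hypothetical names, not WebJobs SDK types, and the polling and message deletion are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical message shape; Kind is the property used for routing.
public sealed class RoutedMessage
{
    public string Kind { get; set; }
    public string Payload { get; set; }
}

public sealed class MessageRouter
{
    private readonly Dictionary<string, Func<RoutedMessage, Task>> _handlers =
        new Dictionary<string, Func<RoutedMessage, Task>>(StringComparer.OrdinalIgnoreCase);

    // Associates a Kind value with the method that should process it.
    public void Register(string kind, Func<RoutedMessage, Task> handler)
    {
        _handlers[kind] = handler;
    }

    // Returns false when no handler is registered, so the caller can decide
    // whether to dead-letter the message itself.
    public async Task<bool> DispatchAsync(RoutedMessage message)
    {
        if (message?.Kind == null || !_handlers.TryGetValue(message.Kind, out var handler))
        {
            return false;
        }
        await handler(message);
        return true;
    }
}
```

The polling loop would delete a message only when DispatchAsync returns true and the handler did not throw, which is the part the QueueTriggerAttribute otherwise handles for you.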

Resources