Azure Storage Queue performance

We are migrating a transaction-processing service that used to read messages from MSMQ and store transactions in a SQL Server database. It now uses an Azure Storage queue (which holds the IDs of the messages, with the actual message bodies stored in Azure Blob Storage).
We need to process at least 200,000 messages per hour, but at the moment we barely reach 50,000 messages per hour.
Our application requests batches of 250 messages from the queue (which currently takes about 2 seconds to get the IDs from the Azure queue and about 5 seconds to get the actual data from Azure Blob Storage), and we store this data into the database in a single call using a stored procedure that accepts a DataTable.
Our service also runs in Azure on a virtual machine, and we use the NuGet libraries Azure.Storage.Queues and Azure.Storage.Blobs suggested by Microsoft to access the Azure Storage queue and blob storage.
Does anyone have suggestions on how to improve the speed of reading messages from the Azure queue and then retrieving the data from Azure Blob Storage?
var managedIdentity = new ManagedIdentityCredential();
UriBuilder fullUri = new UriBuilder()
{
    Scheme = "https",
    Host = string.Format("{0}.queue.core.windows.net", appSettings.StorageAccount),
    Path = string.Format("{0}", appSettings.QueueName),
};
queue = new QueueClient(fullUri.Uri, managedIdentity);
queue.CreateIfNotExists();
...
var result = await queue.ReceiveMessagesAsync(1);
...
UriBuilder fullUri = new UriBuilder()
{
    Scheme = "https",
    Host = string.Format("{0}.blob.core.windows.net", storageAccount),
    Path = string.Format("{0}", containerName),
};
_blobContainerClient = new BlobContainerClient(fullUri.Uri, managedIdentity);
_blobContainerClient.CreateIfNotExists();
...
public async Task<BlobMessage> GetBlobByNameAsync(string blobName)
{
    Ensure.That(blobName).IsNotNullOrEmpty();
    var blobClient = _blobContainerClient.GetBlobClient(blobName);
    if (!blobClient.Exists())
    {
        _log.Error($"Blob {blobName} not found.");
        throw new InfrastructureException($"Blob {blobName} not found.");
    }
    BlobDownloadInfo download = await blobClient.DownloadAsync();
    return new BlobMessage
    {
        BlobName = blobClient.Name,
        BaseStream = download.Content,
        Content = await GetBlobContentAsync(download)
    };
}
Thanks,
Vincent.

Based on the code you posted, I can suggest two improvements:
Receive 32 messages at a time instead of 1: currently you're getting just one message at a time (var result = await queue.ReceiveMessagesAsync(1);). You can receive a maximum of 32 messages from the top of the queue in a single call, so change the code to var result = await queue.ReceiveMessagesAsync(32);. This saves you 31 round trips to the storage service, which should give a noticeable improvement.
Don't try to create the blob container every time: currently you're trying to create the blob container every time you process a message (_blobContainerClient.CreateIfNotExists();). That is unnecessary. Fetching 32 messages at a time already reduces the number of these calls, but you can go further and move this code to your application startup so it runs only once during the application's lifetime.
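To make both changes concrete, here's a rough sketch (not your actual service): it assumes the queue message body carries the blob name, that both clients are built once at startup, and ProcessAsync stands in for your existing DataTable/stored-procedure step.
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;

public class QueueBatchProcessor
{
    private readonly QueueClient _queue;
    private readonly BlobContainerClient _container;

    public QueueBatchProcessor(QueueClient queue, BlobContainerClient container)
    {
        _queue = queue;
        _container = container;
        // Create the queue and container once, at startup, instead of per message.
        _queue.CreateIfNotExists();
        _container.CreateIfNotExists();
    }

    public async Task ProcessBatchAsync()
    {
        // 32 is the maximum number of messages the queue service returns per call.
        QueueMessage[] messages = (await _queue.ReceiveMessagesAsync(maxMessages: 32)).Value;

        foreach (QueueMessage message in messages)
        {
            // Assumption: the message text is the name of the blob holding the payload.
            BlobClient blob = _container.GetBlobClient(message.MessageText);
            BlobDownloadInfo download = await blob.DownloadAsync();

            await ProcessAsync(download.Content); // placeholder for the DataTable/stored-procedure step

            await _queue.DeleteMessageAsync(message.MessageId, message.PopReceipt);
        }
    }

    private Task ProcessAsync(System.IO.Stream content) => Task.CompletedTask; // placeholder
}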

Related

Azure Functions + Event Hub: why does batch latency keep growing?

I have the following chart:
As you can see, my batch latency keeps growing while the count of outgoing messages goes down.
Inside the function I append to blob storage, but the blob metrics say everything is fine.
What could be causing the ever-increasing latency?
Function implementation:
const parsedEvents = eventHubMessages.map((event) => {
    try {
        return JSON.parse(event);
    } catch (error) {
        context.log(`Error: cannot parse next event: ${event}`);
        return {};
    }
});

for (const event of parsedEvents) {
    const { id } = event;
    const data = {
        data: 'data',
    };
    const filename = `${id}.log`;
    await blob.append(filename, JSON.stringify(data));
}
The blob object is an instance of a class that looks like this:
class AzureStorage {
    constructor(config) {
        this.config = config;
        this.blobServiceClient = BlobServiceClient.fromConnectionString(this.config.storageConnectionString);
        this.containerClient = this.blobServiceClient.getContainerClient(this.config.containerName);
    }

    async append(filename, data) {
        const client = this.containerClient.getAppendBlobClient(filename);
        await client.createIfNotExists();
        await client.appendBlock(data, data.length);
    }
}
Another chart:
Update:
So, my problem was in the blob storage: the client.createIfNotExists(); call was the root of the problem. I rewrote my code in the following way (see the sketch below):
I call client.appendBlock directly.
If it fails, I catch the error, call client.create();, and then call client.appendBlock one more time.
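My original code is JavaScript, but the same create-on-first-failure pattern, sketched with the .NET Azure.Storage.Blobs SDK used elsewhere on this page, looks roughly like this (AppendOptimisticallyAsync is just an illustrative name):
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs.Specialized;

public static async Task AppendOptimisticallyAsync(AppendBlobClient client, string data)
{
    byte[] bytes = Encoding.UTF8.GetBytes(data);

    try
    {
        // Optimistic path: assume the blob already exists and append immediately,
        // avoiding an extra round trip on every call.
        using (var stream = new MemoryStream(bytes))
        {
            await client.AppendBlockAsync(stream);
        }
    }
    catch (RequestFailedException ex) when (ex.ErrorCode == "BlobNotFound")
    {
        // Pay for the create call only on the very first write to this blob.
        await client.CreateAsync();
        using (var stream = new MemoryStream(bytes))
        {
            await client.AppendBlockAsync(stream);
        }
    }
}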
Thanks #JesseSquire for your helpful suggestion. Adding a few more troubleshooting points that help find the root cause of latency issues in Azure Functions integrated with Event Hubs.
Also check whether versioning is enabled on the storage account; it can slow appends down considerably.
Make sure you can scale out to at least the number of partitions your Event Hub has by checking your Function scaling setup.
Use logging/Application Insights to measure the execution time of the blob append and find bottlenecks in your code.
Telemetry logs help you find your function's performance data and metrics, such as batch latency for integrated services like Event Hubs, runtime exceptions, and so on.
A dedicated storage account is better: because of checkpointing, Event Hub-triggered functions can generate a large volume of storage transactions.
Refer to the Microsoft documentation on Azure Functions performance with Event Hubs.

Unable to configure Azure Event Hub Producer

I am trying out sample code for an Azure Event Hub producer and sending some messages to an Event Hub.
The Event Hub and its policy are correctly configured for sending and listening to messages. I am using a .NET Core 3.1 console application. However, the code doesn't move beyond the CreateBatchAsync() call. I tried debugging and the breakpoint never reaches the next line. I tried try-catch-finally and still no progress. Please advise what I am doing wrong here. The Event Hub in Azure does show some successful incoming requests.
class Program
{
    private const string connectionString = "<event_hub_connection_string>";
    private const string eventHubName = "<event_hub_name>";

    static async Task Main()
    {
        // Create a producer client that you can use to send events to an event hub
        await using (var producerClient = new EventHubProducerClient(connectionString, eventHubName))
        {
            // Create a batch of events
            using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

            // Add events to the batch. An event is represented by a collection of bytes and metadata.
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("First event")));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Second event")));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Third event")));

            // Use the producer client to send the batch of events to the event hub
            await producerClient.SendAsync(eventBatch);
            Console.WriteLine("A batch of 3 events has been published.");
        }
    }
}
The call to CreateBatchAsync is the first operation that needs to create a connection to Event Hubs. This indicates that you're likely experiencing a connectivity or authorization issue.
In the default configuration you're using, the default network timeout is 60 seconds and up to 3 retries are possible, with some back-off between them.
Because of this, a failure to connect or authorize may take up to roughly 5 minutes before it manifests. That said, the majority of connection errors are not eligible for retries, so the failure would normally surface after roughly 1 minute.
To aid in your debugging, I'd suggest tweaking the default retry policy to speed things up and surface an exception more quickly so that you have the information needed to troubleshoot and make adjustments. The options to do so are discussed in this sample and would look something like:
var connectionString = "<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>";
var eventHubName = "<< NAME OF THE EVENT HUB >>";

var options = new EventHubProducerClientOptions
{
    RetryOptions = new EventHubsRetryOptions
    {
        // Allow the network operation only 15 seconds to complete.
        TryTimeout = TimeSpan.FromSeconds(15),

        // Turn off retries
        MaximumRetries = 0,
        Mode = EventHubsRetryMode.Fixed,
        Delay = TimeSpan.FromMilliseconds(10),
        MaximumDelay = TimeSpan.FromSeconds(1)
    }
};

await using var producer = new EventHubProducerClient(
    connectionString,
    eventHubName,
    options);
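With those options in place, wrapping the first call in a try/catch (a sketch, not part of the sample above; it reuses the producer variable from the previous snippet) lets the underlying failure surface quickly instead of appearing to hang:
try
{
    // With retries disabled and a 15-second try timeout, a connectivity or
    // authorization problem surfaces here within seconds.
    using EventDataBatch eventBatch = await producer.CreateBatchAsync();
}
catch (Exception ex)
{
    Console.WriteLine($"CreateBatchAsync failed: {ex}");
}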

How to dispose of connections to services such as Azure Storage

My function stores data in Azure Data Lake Storage Gen1.
But I get the error "An error occurred while sending the request."
When I investigated, I found that once the number of connections from my Azure Function exceeds 8k, it breaks.
Here is my code (appending to a file in Azure Data Lake Storage Gen1):
// This is for authorizing Azure Data Lake Storage Gen1
await InitADLInfo(adlsAccountName);
DataLakeStoreFileSystemManagementClient _adlsFileSystemClient;

// Here is my code to append to Data Lake Storage Gen1
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(buffer)))
{
    await _adlsFileSystemClient.FileSystem.AppendAsync(_adlsAccountName, path, stream);
}
How do I dispose of the connection after each append ends?
I tried to dispose of it with
_adlsFileSystemClient.Dispose();
but it didn't dispose of anything; the connection count keeps going up.
I read this:
https://www.troyhunt.com/breaking-azure-functions-with-too-many-connections/1
and I have managed to bring the connection count down. The key is simply: do NOT create a new client with every function invocation.
Example code:
// Create a single, static HttpClient
private static HttpClient httpClient = new HttpClient();

public static async Task Run(string input)
{
    var response = await httpClient.GetAsync("http://example.com");
    // Rest of function
}
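Applied to the Data Lake client above, the same create-once-and-reuse idea might look roughly like this sketch; AdlsAppender and CreateClient are illustrative names, and the credential setup is deliberately left as a placeholder for the existing InitADLInfo logic:
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Management.DataLake.Store;

public static class AdlsAppender
{
    // Created once per process and reused by every invocation, like the static HttpClient above.
    private static readonly Lazy<DataLakeStoreFileSystemManagementClient> _client =
        new Lazy<DataLakeStoreFileSystemManagementClient>(CreateClient);

    private static DataLakeStoreFileSystemManagementClient CreateClient()
    {
        // Placeholder: reuse the authorization logic that InitADLInfo performs today.
        throw new NotImplementedException("build the client with the existing InitADLInfo credentials");
    }

    public static async Task AppendAsync(string accountName, string path, string buffer)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(buffer)))
        {
            // The client is created on first use only; later invocations reuse it.
            await _client.Value.FileSystem.AppendAsync(accountName, path, stream);
        }
    }
}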

Unable to find entities from table storage after inserting batches of 100

Issue:
We currently have two Azure consumption-plan functions, each receiving service bus queue messages as input.
The first function calls SQL Azure with a stored proc, gets 500k+ records back, and saves those records in batches of 100 to Azure Table Storage, with each batch having a unique partition key. After that's done it creates a new queue message for the next function, which reads the batch and processes it.
Everything works fine when the second function is cold and still needs to warm up. If the second function is already warm in memory when it receives the queue message, we do a partition-key lookup against table storage, and sometimes the data coming back is empty.
Code that inserts batches into table storage:
foreach (var entry in partitionKeyGroupinng)
{
    var operation = new TableBatchOperation();
    entry.ToList().ForEach(operation.Insert);
    if (operation.Any())
    {
        await CloudTable.ExecuteBatchAsync(operation);
    }
}
This is within an async task function in a shared assembly referenced by all functions.
Code to read out from table storage as partition key lookup:
TableContinuationToken continuationToken = null;
var query = BuildQuery(partitionKey);
var allItems = new List<T>();

do
{
    var items = await CloudTable.ExecuteQuerySegmentedAsync(query, continuationToken);
    continuationToken = items.ContinuationToken;
    allItems.AddRange(items);
} while (continuationToken != null);

return allItems;
Code that calls that to lookup by partition key:
var batchedNotifications = await _tableStorageOperations.GetByPartitionKeyAsync($"{trackingId.ToString()}_{batchNumber}");
I reckon it's because the batch is still being written and not yet available to other clients, but I don't know if that's the case. What would be the best way to handle this, given the function processing and eventual consistency?
I have configured the following on the table client's service point:
tableServicePoint.UseNagleAlgorithm = false;
tableServicePoint.Expect100Continue = false;
tableServicePoint.ConnectionLimit = 300;
If I look up the same partition key in Storage Explorer as the event happens, I can see the batch, so it does return values. I thought using an EGT (entity group transaction) for the batch would ensure it is written and available as soon as possible, because the async WriteBatch method shouldn't complete before the batch has finished writing; however, I don't know how long table storage takes to write the batch to a physical partition and make it available. I have also batched up all the service bus queue messages before sending them, to add some delay before the second function runs.
Question:
How do we deal with this delay in accessing these records out of table storage between two functions using service bus queues?

Same Azure Service Bus topic message is processed multiple times

We have a job hosted in an Azure website; the job reads entries from a topic subscription. Everything works fine when we only have one instance hosting the website. Once we scale out to more than one instance, we observe that each message is processed as many times as there are instances. Each instance points to the same subscription. From what we have read, once an item is read it should not be available to any other process. The duplicated processing happens within the same instance: if we have two instances, the item is processed twice in one of the instances rather than being split between them.
What could possibly be wrong in the way we are doing things?
This is how we configure the connection to the topic; if the subscription does not exist, it is created:
var serviceBusConfig = new ServiceBusConfiguration
{
    ConnectionString = transactionsBusConnectionString
};
config.UseServiceBus(serviceBusConfig);

var allRule1 = new RuleDescription
{
    Name = "All",
    Filter = new TrueFilter()
};

SetupSubscription(transactionsBusConnectionString, "topic1", "subscription1", allRule1);

private static void SetupSubscription(string busConnectionString, string topicNameKey, string subscriptionNameKey, RuleDescription newRule)
{
    var namespaceManager =
        NamespaceManager.CreateFromConnectionString(busConnectionString);
    var topicName = ConfigurationManager.AppSettings[topicNameKey];
    var subscriptionName = ConfigurationManager.AppSettings[subscriptionNameKey];

    if (!namespaceManager.SubscriptionExists(topicName, subscriptionName))
    {
        namespaceManager.CreateSubscription(topicName, subscriptionName);
    }

    var subscriptionClient = SubscriptionClient.CreateFromConnectionString(busConnectionString, topicName, subscriptionName);
    var rules = namespaceManager.GetRules(topicName, subscriptionName);
    foreach (var rule in rules)
    {
        subscriptionClient.RemoveRule(rule.Name);
    }
    subscriptionClient.AddRule(newRule);

    rules = namespaceManager.GetRules(topicName, subscriptionName);
    rules.ToString();
}
Example of the code that processes the topic item:
public void SendInAppNotification(
    [ServiceBusTrigger("%eventsTopicName%", "%SubsInAppNotifications%"), ServiceBusAccount("OutputServiceBus")] Notification message)
{
    this.valueCalculator.AddInAppNotification(message);
}
This method is inside a static Functions class; I'm using the Azure WebJobs SDK.
Whenever the Azure website is scaled out to more than one instance, all instances share the same configuration.
It sounds like you're creating a new subscription each time a new instance runs, rather than hooking into an existing one. Topics are designed to allow multiple subscribers to attach in that way as well - usually, though, each subscriber has a different purpose, so each one sees a copy of the message.
I can't verify this from your code snippet, but that's my guess - are the config files identical? You should add some trace output to see whether your processes are calling CreateSubscription() each time they run; a small sketch follows below.
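For example, a rough sketch of that trace output using the NamespaceManager API from your snippet (EnsureSubscription is just an illustrative helper; WEBSITE_INSTANCE_ID is an environment variable App Service sets per instance):
using System;
using System.Diagnostics;
using Microsoft.ServiceBus;

private static void EnsureSubscription(NamespaceManager namespaceManager, string topicName, string subscriptionName)
{
    // Identify which instance this code is running on.
    var instanceId = Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID");

    bool exists = namespaceManager.SubscriptionExists(topicName, subscriptionName);
    Trace.TraceInformation("Instance {0}: subscription {1}/{2} exists = {3}",
        instanceId, topicName, subscriptionName, exists);

    if (!exists)
    {
        // If this line shows up on every instance start, the subscription is being
        // (re)created each time rather than reused.
        Trace.TraceInformation("Instance {0}: creating subscription {1}/{2}",
            instanceId, topicName, subscriptionName);
        namespaceManager.CreateSubscription(topicName, subscriptionName);
    }
}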
I think I can access the message ID; I'm using the Azure WebJobs SDK, but I think I can find a way to get it. Let me check and I will let you know.
