How to manage centralized values in a sharded environment - Azure

I have an ASP.NET app being developed for Windows Azure. It's been deemed necessary that we use sharding for the DB to improve write times since the app is very write heavy but the data is easily isolated. However, I need to keep track of a few central variables across all instances, and I'm not sure the best place to store that info. What are my options?
Requirements:
Must be durable, can survive instance reboots
Must be synchronized. It's incredibly important to avoid conflicting updates or at least throw an exception in such cases, rather than overwriting values or failing silently.
Must be reasonably fast (2,000+ reads/writes per second)
I thought about writing a separate component to run on a worker role that simply reads/writes the values in memory and flushes them to disk every so often, but I figure there's got to be something already written for that purpose that I can appropriate in Windows Azure.
I think what I'm looking for is a system like Apache ZooKeeper, but I don't want to have to deal with installing the JRE during the worker role startup and all that jazz.
Edit: Based on the suggestion below, I'm trying to use Azure Table Storage using the following code:
var context = table.ServiceClient.GetTableServiceContext();
var item = context.CreateQuery<OfferDataItemTableEntity>(table.Name)
    .Where(x => x.PartitionKey == Name).FirstOrDefault();
if (item == null)
{
    item = new OfferDataItemTableEntity(Name);
    context.AddObject(table.Name, item);
}
if (item.Allocated < Quantity)
{
    allocated = ++item.Allocated;
    context.UpdateObject(item);
    context.SaveChanges();
    return true;
}
However, the context.UpdateObject(item) call fails with "The context is not currently tracking the entity." Doesn't querying the context for the item initially add it to the context's tracking mechanism?

Have you looked into SQL Azure Federations? It seems like exactly what you're looking for:
sharding for SQL Azure.
Here are a few links to read:
http://msdn.microsoft.com/en-us/library/windowsazure/hh597452.aspx
http://convective.wordpress.com/2012/03/05/introduction-to-sql-azure-federations/
http://searchcloudapplications.techtarget.com/tip/Tips-for-deploying-SQL-Azure-Federations

What you need is Table Storage since it matches all your requirements:
Durable: Yes. Table Storage lives in a Storage Account, which isn't tied to a specific Cloud Service or instance, so the data survives reboots and redeployments.
Synchronized: Yes. Every instance reads and writes the same centrally stored entities in that account.
It's incredibly important to avoid conflicting updates: Yes, this is possible with the use of ETags (optimistic concurrency).
Reasonably fast? Very fast, up to 20,000 entities/messages/blobs per second per storage account.
Update:
Here is some sample code that uses the new storage SDK (2.0):
var storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
var table = storageAccount.CreateCloudTableClient()
    .GetTableReference("Records");
table.CreateIfNotExists();

// Add item.
table.Execute(TableOperation.Insert(new MyEntity() { PartitionKey = "", RowKey = "123456", Customer = "Sandrino" }));

// Two "users" retrieve the same record.
var user1record = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;
var user2record = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;

// User 1 updates first; this succeeds.
user1record.Customer = "Steve";
table.Execute(TableOperation.Replace(user1record));

// User 2 updates the stale copy; this Replace fails.
user2record.Customer = "John";
table.Execute(TableOperation.Replace(user2record));
First it adds the item 123456.
Then I'm simulating 2 users getting that same record (imagine they both opened a page displaying the record).
User 1 is fast and updates the item. This works.
User 2 still had the window open, meaning he's working on an old version of the item. He updates that stale copy and tries to save it. This causes the following exception, because the SDK sends the record's ETag with the update and the service detects that it no longer matches:
The remote server returned an error: (412) Precondition Failed.
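If a writer does hit the 412, the usual pattern is to re-read the entity and retry the update. A minimal sketch of such a retry loop against the same MyEntity record (the retry count and the field being changed are just illustrative):
for (var attempt = 0; attempt < 3; attempt++)
{
    // Re-read to pick up the latest ETag.
    var current = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;
    current.Customer = "John";

    try
    {
        // Replace sends the entity's ETag as an If-Match header.
        table.Execute(TableOperation.Replace(current));
        break; // success
    }
    catch (StorageException ex)
    {
        // 412 means someone else updated the entity in the meantime; loop and try again.
        if (ex.RequestInformation.HttpStatusCode != 412) throw;
    }
}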

I ended up with a hybrid cache / table storage solution. All instances track the variable via Azure caching, while the first instance spins up a timer that saves the value to table storage once per second. On startup, the cache variable is initialized with the value saved to table storage, if available.
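For reference, a rough sketch of the persistence half of that setup, using the same 2.0 storage SDK as above. CounterEntity, the key names, the currentValue stand-in for the cached variable, and the one-second interval are just illustrative; the distributed-cache side is omitted:
// Hypothetical entity used to persist the central value.
public class CounterEntity : TableEntity
{
    public long Value { get; set; }
}

// On the instance that owns persistence: flush the cached value once per second.
var flushTimer = new System.Threading.Timer(_ =>
{
    var entity = new CounterEntity { PartitionKey = "counters", RowKey = "central", Value = currentValue };
    table.Execute(TableOperation.InsertOrReplace(entity));
}, null, TimeSpan.Zero, TimeSpan.FromSeconds(1));

// On startup: seed the cached value from Table Storage if a saved value exists.
var saved = table.Execute(TableOperation.Retrieve<CounterEntity>("counters", "central")).Result as CounterEntity;
if (saved != null)
    currentValue = saved.Value;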

Related

Cosmos DB Query Intermittent latency

I have a Cosmos DB client registered as a singleton with default options. I'm using a .NET 6.0 WebAPI project, running in an Azure App Service with "Always On" enabled. The App Service and Cosmos account are in the same region, UE2. The API queries a Cosmos container and returns the result.
I've noticed that the latency of the first query is always high (4-6 seconds); subsequent queries are much faster (~100 ms) but also sometimes show random latency spikes. This is not a cold start scenario, the client has already been initialized by the DI pipeline. I'm also not being rate limited.
Here is my singleton client:
public CosmosDbService(IConfiguration configuration)
{
    var account = configuration.GetSection("CosmosDb")["Account"];
    var key = configuration.GetSection("CosmosDb")["Key"];
    var databaseName = configuration.GetSection("CosmosDb")["DatabaseName"];
    var containerName = configuration.GetSection("CosmosDb")["Container"];

    CosmosClient client = new (account, key);
    _myContainer = client.GetContainer(databaseName, containerName);
}
Here is the meat of the query, where a LINQ query definition is passed in:
public class RetrieveCarRepository : IRetrieveCarRepository
{
    public async Task<List<CarModel>> RetrieveCars(IQueryable<CarModel> querydef)
    {
        var query = querydef.ToFeedIterator();
        List<CarModel> cars = new ();

        while (query.HasMoreResults)
        {
            var response = await query.ReadNextAsync();
            foreach (var car in response)
                ...do a thing
I've been through several Cosmos DB training videos and courses but still haven't been able to figure out what is happening.
From the comments.
For query performance using the .NET SDK please see: https://learn.microsoft.com/en-us/azure/cosmos-db/performance-tips-query-sdk?tabs=v3&pivots=programming-language-csharp#use-local-query-plan-generation
Query plan generation can affect latency, and it can be avoided if:
The query is reworked to target a single partition (instead of a cross-partition query), as sketched below.
The workload runs on Windows, compiled as x64, with the NuGet DLLs co-located, which lets the SDK do local query plan generation through ServiceInterop.dll.
In both cases the query plan request is removed and latency improves.
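For the first point, here is a sketch (not from the original post) of what a single-partition version of the LINQ query above could look like, assuming the CarModel container is partitioned by a dealerId property and that the container and repository are available as _myContainer and carRepository; the names and values are hypothetical:
// Pin the LINQ query definition to one partition so the SDK can skip the query plan request.
var querydef = _myContainer.GetItemLinqQueryable<CarModel>(
        requestOptions: new QueryRequestOptions
        {
            PartitionKey = new PartitionKey("dealer-42") // hypothetical partition key value
        })
    .Where(c => c.DealerId == "dealer-42");              // hypothetical filter on the same property

var cars = await carRepository.RetrieveCars(querydef);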
As a general rule, latency should be investigated at the P99 over a one-hour window to understand how it is impacted; a couple of higher-latency requests can always happen.
Also keep in mind that query latency will vary based on the type of query, the volume of data to transfer, and the number of pages. You can capture the Diagnostics and use: https://learn.microsoft.com/azure/cosmos-db/troubleshoot-dot-net-sdk-slow-request
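As a sketch, the diagnostics can be captured per page inside the existing RetrieveCars loop; the one-second threshold is arbitrary, and GetClientElapsedTime assumes a reasonably recent v3 SDK:
while (query.HasMoreResults)
{
    var response = await query.ReadNextAsync();

    // Log the full diagnostics string for slow pages and analyze it with the guide linked above.
    if (response.Diagnostics.GetClientElapsedTime() > TimeSpan.FromSeconds(1))
    {
        Console.WriteLine(response.Diagnostics.ToString());
    }

    cars.AddRange(response);
}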

How to have multiple instances of changefeed listeners get the same message: Java

We are using Cosmos change feed listeners to update the edge cache in ephemeral Java services. That means every instance, however many there happen to be, should receive every change feed event. We used a UUID as the "hostname", but not all instances are getting the change feed. I read somewhere there is a leasePrefix. Will that work? If so, how do I set it up on the Java side?
Yes, a lease prefix will help you in this case. Consider a scenario where you want to do multiple things whenever there is a new event in a particular Azure Cosmos container. If the actions you want to trigger are independent from one another, the ideal solution is to create one Cosmos DB listener per action, all listening for changes on the same Azure Cosmos container.
Given the requirements of the listeners for Cosmos DB, we need a second container to store state, also called the leases container. Does this mean that you need a separate leases container for each Azure Function?
Here, you have two options:
Create one leases container per listener: This approach can translate into additional costs, unless you're using a shared throughput database. Remember that the minimum throughput at the container level is 400 Request Units, and the leases container is only being used to checkpoint progress and maintain state.
Have one leases container and share it across all your listeners: This second option makes better use of the provisioned Request Units on the container, as it enables multiple listeners to share the same provisioned throughput.
Here is an example of Function App to implement this in Java Language: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-cosmosdb-v2-trigger?tabs=java
Code for quick reference:
@FunctionName("cosmosDBMonitor")
public void cosmosDbProcessor(
    @CosmosDBTrigger(name = "items",
                     databaseName = "ToDoList",
                     collectionName = "Items",
                     leaseCollectionName = "leases",
                     leaseCollectionPrefix = "prefix",
                     createLeaseCollectionIfNotExists = true,
                     connectionStringSetting = "AzureCosmosDBConnection") String[] items,
    final ExecutionContext context) {
    context.getLogger().info(items.length + " item(s) is/are changed.");
}

Checking connection to Azure Service Bus

I have some code that depends on Azure Service Bus. I've created an endpoint that checks the availability of my Azure Service Bus topic using the following code:
var connectionString = CloudConfigurationManager.GetSetting("servicebusconnectionstring");
var manager = NamespaceManager.CreateFromConnectionString(connectionString);
var sub = manager.GetSubscription("mytopic", "mysubscription");
var count = sub.MessageCount;
This actually works, but I have two questions (since I'm constantly experiencing timeouts using this code).
Question 1: Is there an easier/better way of checking Service Bus connectivity from C#?
Question 2: When using the code above, which instances should I configure as singletons in my IoC container? I suspect that creating all of these instances every time I ping this endpoint causes the timeouts, since I don't see problems in my other endpoints, where I reuse a TopicClient.
Getting MessageCount is potentially an expensive operation, especially if the value is high.
You could run a simpler operation instead, like checking whether the topic exists:
var ns = NamespaceManager.CreateFromConnectionString("...");
ns.TopicExists("mytopic");
which will throw an exception (probably MessagingCommunicationException) if communication to Service Bus fails.
It's OK to reuse the NamespaceManager between requests, so you can make it a singleton. I'm not sure that brings any measurable performance benefit, though.
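For example, a minimal health-check helper around that idea (the method name and the caught exception types are just what I'd start with):
public static bool CanReachServiceBus()
{
    try
    {
        var ns = NamespaceManager.CreateFromConnectionString(
            CloudConfigurationManager.GetSetting("servicebusconnectionstring"));
        return ns.TopicExists("mytopic");
    }
    catch (MessagingCommunicationException)
    {
        // The namespace could not be reached.
        return false;
    }
    catch (TimeoutException)
    {
        return false;
    }
}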

Setup webjob ServiceBusTriggers or queue names at runtime (without hard-coded attributes)?

Is there any way to configure triggers without attributes? I cannot know the queue names ahead of time.
Let me explain my scenario here.. I have one service bus queue, and for various reasons (complicated duplicate-suppression business logic), the queue messages have to be processed one at a time, so I have ServiceBusConfiguration.OnMessageOptions.MaxConcurrentCalls set to 1. So processing a message holds up the whole queue until it is finished. Needless to say, this is suboptimal.
This 'one at a time' policy isn't so simple. The messages could be processed in parallel, they just have to be divided into groups (based on a field in message), say A and B. Group A can process its messages one at a time, and group B can process its own one at a time, etc. A and B are processed in parallel, all is good.
So I can create a queue for each group, A, B, C, ... etc. There are about 50 groups, so 50 queues.
I can create a queue for each, but how do I make this work with the Azure WebJobs SDK? I don't want to copy-paste a method for each queue with a different ServiceBusTrigger for the SDK to discover, just to enforce one-at-a-time processing per queue/group, and then update the code with another copy-paste whenever a new group is needed. Fetching a list of queues at startup and tying them to the function is preferable.
I have looked around and I don't see any way to do what I want. The ITypeLocator interface is pretty hard-set to look for attributes. I could probably abuse the INameResolver, but it seems like I'd still have to have a bunch of near-duplicate methods around. Could I somehow create what the SDK is looking for at startup/runtime?
(To be clear, I know how to use INameResolver to get a queue name, as in "How to set Azure WebJob queue name at runtime?", but though similar, that isn't my problem. I want to set up triggers for multiple queues at startup for the same function, to get one-at-a-time processing per queue, without repeating the trigger attribute 50 times. I figured I'd ask again since the SDK repo is fairly active and it's been a year.)
Or am I going about this all wrong? Being dumb? Missing something? Any advice on this dilemma would be welcome.
The Azure WebJobs host discovers and indexes the functions decorated with the ServiceBusTrigger attribute when it starts, so there is no way to set up queue triggers at runtime.
The simpler solution for you is to create a long-running job and wire up the receivers manually:
public class Program
{
    private static void Main()
    {
        var host = new JobHost();
        host.CallAsync(typeof(Program).GetMethod("Process"));
        host.RunAndBlock();
    }

    [NoAutomaticTriggerAttribute]
    public static async Task Process(TextWriter log, CancellationToken token)
    {
        var connectionString = "myconnectionstring";
        // You could also read the queue names from app settings or an Azure table.
        var queueNames = new[] { "queueA", "queueB" };
        var messagingFactory = MessagingFactory.CreateFromConnectionString(connectionString);

        foreach (var queueName in queueNames)
        {
            var receiver = messagingFactory.CreateMessageReceiver(queueName);
            receiver.OnMessage(message =>
            {
                try
                {
                    // Do something with the message.
                    // ...

                    // Complete the message.
                    message.Complete();
                }
                catch (Exception ex)
                {
                    // Log the error.
                    log.WriteLine(ex.ToString());

                    // Abandon the message so that it can be retried.
                    message.Abandon();
                }
            }, new OnMessageOptions { MaxConcurrentCalls = 1 });
        }

        // Wait until the job stops or restarts.
        await Task.Delay(Timeout.InfiniteTimeSpan, token);
    }
}
Otherwise, if you don't want to deal with multiple queues, you can have a look at Azure Service Bus topics/subscriptions and create a SqlFilter to route each message to the right subscription.
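A sketch of that approach with the same (older) Service Bus SDK, assuming the messages carry a Group property and using illustrative topic/subscription names:
// One subscription per group; the SqlFilter matches against the message's custom properties.
var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
if (!namespaceManager.SubscriptionExists("mytopic", "groupA"))
{
    namespaceManager.CreateSubscription("mytopic", "groupA", new SqlFilter("Group = 'A'"));
}

// Each subscription is then processed one message at a time, just like a dedicated queue.
var client = SubscriptionClient.CreateFromConnectionString(connectionString, "mytopic", "groupA");
client.OnMessage(message =>
{
    // do something, then complete
    message.Complete();
}, new OnMessageOptions { MaxConcurrentCalls = 1 });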
Another option is to create your own trigger: the Azure WebJobs SDK provides extensibility points for custom trigger bindings:
Binding Extensions Overview
Good luck!
Based on my understanding, you are essentially building a parallel message batch system. @Thomas's solution is good, but I think the Azure Batch service with Table storage may be a better fit and could replace the more complex combination of a Service Bus queue + WebJobs with a trigger.
Using Azure Batch with Table storage, you can control task creation, execute the tasks in parallel and at scale, and even monitor them; please refer to the tutorial to see how.

Azure Table Storage Performance

How fast should I be expecting the performance of Azure Storage to be? I'm seeing ~100ms on basic operations like getEntity, updateEntity, etc.
This guy seems to be getting 4ms which makes it look like something is really wrong here!
http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
I'm using the azure-table-node npm plugin.
https://www.npmjs.org/package/azure-table-node
A simple getEntity call is taking ~90ms:
exports.get = function(table, pk, rk, callback) {
  var start = process.hrtime();
  client().getEntity(table, pk, rk, function(err, entity) {
    console.log(prettyhr(process.hrtime(start)));
    ...
The azure-storage module appears to be even slower:
https://www.npmjs.org/package/azure-storage
var start = process.hrtime();
azureClient.retrieveEntity(table, pk, rk, function(err, entity) {
  console.log('retrieveEntity', prettyhr(process.hrtime(start)));
  ...
retrieveEntity 174 ms
Well, it really depends on where you are accessing Azure Storage from.
Are you trying to access the storage from within the same data center, or just from somewhere on the Internet?
If your code is not running in the same data center, then you're simply paying the network latency of an HTTP request to the data center where your storage account lives. This can vary a lot depending on where you're accessing the DC from and which region the DC is located in. (To get an idea, you can check the latency from your PC to Azure Storage in every data center here: http://azurespeedtest.azurewebsites.net/)
If your code is running in the same DC, everything should be pretty fast for simple operations such as the ones you are trying out, probably just a few milliseconds.
