How to avoid concurrency issues when scaling writes horizontally?

Assume there is a worker service that receives messages from a queue, reads the product with the specified ID from a document database, applies some manipulation logic based on the message, and finally writes the updated product back to the database.
This work can safely be done in parallel when dealing with different products, so we can scale horizontally. However, if more than one service instance works on the same product, we might end up with concurrency issues or concurrency exceptions from the database, in which case we should apply some retry logic (and still the retry might fail again, and so on).
Question: How do we avoid this? Is there a way to ensure that two instances are not working on the same product?
Example/Use case: An online store has a great sale on productA, productB and productC that ends in an hour and hundreds of customers are buying. For each purchase, a message is enqueued (productId, numberOfItems, price). Goal: How can we run three instances of our worker service and make sure that all messages for productA will end up in instanceA, productB to instanceB and productC to instanceC (resulting in no concurrency issues)?
Notes: My service is written in C#, hosted on Azure as a Worker Role; I use Azure Queues for messaging, and I'm thinking of using Mongo for storage. Also, the entity IDs are GUIDs.
It's more about the technique/design, so if you use different tools to solve the problem I'm still interested.

Any solution attempting to divide the load between different items in the same collection (like orders) is doomed to fail. The reason is that if you have a high rate of transactions flowing, you'll have to start doing one of the following things:
let the nodes talk to each other ("hey guys, is anyone working on this one?")
divide the ID generation into segments (node A creates IDs 1-1000, node B IDs 1001-1999, etc.) and then let each node deal with its own segment
dynamically divide a collection into segments (and let each node handle a segment)
So what's wrong with those approaches?
The first approach is simply replicating what transactions in a database already do. Unless you can spend a large amount of time optimizing the strategy, it's better to rely on transactions.
The latter two options will decrease performance, as you have to route messages dynamically based on IDs and also change the strategy at run-time to include newly inserted messages. They will eventually fail.
Solutions
Here are two solutions that you can also combine.
Retry automatically
You presumably have an entry point somewhere that reads from the message queue.
In it you have something like this:
while (true)
{
    var message = queue.Read();
    Process(message);
}
What you could do instead to get very simple fault tolerance is to retry upon failure:
while (true)
{
    var message = queue.Read();
    for (var i = 0; i < 3; i++)
    {
        try
        {
            Process(message);
            break; // success: exit the retry loop
        }
        catch (Exception ex)
        {
            // log the failure; not rethrowing lets the for loop
            // run the next attempt on the same message
        }
    }
}
You could of course catch only database exceptions (or rather transaction failures), so that only those messages are replayed.
Micro services
I know, microservices is a buzzword. But in this case it's a great solution. Instead of having a monolithic core which processes all messages, divide the application into smaller parts. Or, in your case, just deactivate the processing of certain types of messages on certain nodes.
If you have five nodes running your application you can make sure that Node A receives messages related to orders, node B receives messages related to shipping etc.
By doing so you can still scale your application horizontally, you get no conflicts, and it requires little effort (a few more message queues and reconfiguring each node).
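As a rough illustration, here is a minimal sketch of what each node's read loop could look like with one Azure Storage queue per message category. The NODE_CATEGORY variable and the queue naming are assumptions for this sketch, and it uses the recent Azure.Storage.Queues library rather than whatever you run today:

using System;
using System.Threading;
using Azure.Storage.Queues;

static void RunWorker(string connectionString)
{
    // each node drains only the queue for its configured category,
    // e.g. NODE_CATEGORY=orders or NODE_CATEGORY=shipping
    var category = Environment.GetEnvironmentVariable("NODE_CATEGORY");
    var queue = new QueueClient(connectionString, $"{category}-messages");

    while (true)
    {
        var message = queue.ReceiveMessage(visibilityTimeout: TimeSpan.FromSeconds(30)).Value;
        if (message == null) { Thread.Sleep(500); continue; } // queue empty: back off
        Process(message.Body.ToString()); // your existing message handling
        queue.DeleteMessage(message.MessageId, message.PopReceipt);
    }
}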

For this kind of thing I use blob leases. Basically, I create a blob named after the entity's ID in some known storage account. When worker 1 picks up the entity, it tries to acquire a lease on the blob (and to create the blob itself, if it doesn't exist). If it is successful in doing both, then I allow the processing of the message to occur. I always release the lease afterwards.
If I am not successful, I dump the message back onto the queue.
I follow the approach originally described by Steve Marx here: http://blog.smarx.com/posts/managing-concurrency-in-windows-azure-with-leases, although tweaked to use the newer storage libraries.
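A minimal sketch of the lease pattern with the current Azure.Storage.Blobs library (the container name and the 30-second lease duration are assumptions, not part of the original answer):

using System;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

static bool TryProcessExclusively(BlobContainerClient container, Guid entityId, Action work)
{
    var blob = container.GetBlobClient(entityId.ToString());
    try { blob.Upload(BinaryData.FromString(string.Empty)); } // create the lock blob
    catch (RequestFailedException) { } // it already exists, which is fine

    var lease = blob.GetBlobLeaseClient();
    try { lease.Acquire(TimeSpan.FromSeconds(30)); } // 409 if another worker holds it
    catch (RequestFailedException) { return false; } // caller dumps the message back on the queue

    try { work(); } // we are the only worker on this entity
    finally { lease.Release(); } // always release the lease afterwards
    return true;
}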
Edit after comments:
If you have a potentially high rate of messages all talking to the same entity (as your comment implies), I would redesign your approach somewhere: either the entity structure or the messaging structure.
For example: consider the CQRS design pattern and store the changes from processing each message independently. The product entity then becomes an aggregate of all the changes made to it by the various workers, sequentially re-applied and rehydrated into a single object.

If you want the database to always be up to date and consistent with the already processed units, then you have several updates hitting the same mutable entity.
To comply with this you need to serialize the updates for the same entity. You can do this by partitioning your data at the producers, by accumulating the events for an entity on the same queue, or by locking the entity in the worker using a distributed lock or a lock at the database level.
You could use the actor model (in the Java/Scala world, Akka), which creates a message queue for each entity or group of entities and processes their messages serially.
UPDATED
You can try an Akka port to .NET, such as Akka.NET. There are also good tutorials with samples about using Akka in Scala.
But for the general principles you should read more about the actor model. It has its drawbacks nevertheless.
In the end it comes down to partitioning your data and the ability to create a unique, specialized worker (one that can be reused and/or restarted in case of failure) for a specific entity.
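To make the idea concrete, here is a minimal Akka.NET sketch using a consistent-hashing router, so that all messages for the same product ID always land on the same actor. Every type and name here is made up for illustration:

using System;
using Akka.Actor;
using Akka.Routing;

// hypothetical message: the ConsistentHashKey routes all messages
// for one product to the same actor instance
class UpdateProduct : IConsistentHashable
{
    public Guid ProductId;
    public int NumberOfItems;
    public object ConsistentHashKey => ProductId;
}

class ProductActor : ReceiveActor
{
    public ProductActor()
    {
        // an actor processes its mailbox one message at a time,
        // so there are never two concurrent writes for the same product
        Receive<UpdateProduct>(msg => { /* read, mutate, save the product */ });
    }
}

class Program
{
    static void Main()
    {
        var system = ActorSystem.Create("store");
        var router = system.ActorOf(
            Props.Create<ProductActor>().WithRouter(new ConsistentHashingPool(3)),
            "products");
        router.Tell(new UpdateProduct { ProductId = Guid.NewGuid(), NumberOfItems = 2 });
    }
}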

I assume you have a means to safely access the product queue across all worker services. Given that, one simple way to avoid conflicts could be to use a global queue per product alongside the main queue:
// Queue[X] is the queue for product X
// QueueMain is the main queue
DoWork(ProductType X)
{
    if (Queue[X].empty())
    {
        product = QueueMain.pop();
        if (product.type != X)
        {
            // not our product: route it to the owning product's queue
            Queue[product.type].push(product);
            return;
        }
    }
    else
    {
        product = Queue[X].pop();
    }
    // process product...
}
The access to the queues needs to be atomic.

You should use a session-enabled Service Bus queue for ordering and concurrency.
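For example, with the Azure.Messaging.ServiceBus library you could set the product ID as the session ID, so that all messages for one product are handled in order by whichever single receiver owns the session. This is just a sketch; the "purchases" queue name and the message body are assumptions:

using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

static async Task SendAndReceive(ServiceBusClient client, Guid productId)
{
    // sender side: every message for one product carries the same SessionId
    var sender = client.CreateSender("purchases");
    await sender.SendMessageAsync(new ServiceBusMessage("{ \"numberOfItems\": 2 }")
    {
        SessionId = productId.ToString() // the per-product ordering key
    });

    // receiver side: accepting a session locks it, so no other instance
    // can receive messages for this product until the session is released
    ServiceBusSessionReceiver receiver = await client.AcceptNextSessionAsync("purchases");
    ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync();
    // ... process the purchase ...
    await receiver.CompleteMessageAsync(message);
}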

1) Every high-scale data solution that I can think of has something built in to handle precisely this sort of conflict. The details will depend on your final choice of data storage. In the case of a traditional relational database, this comes baked in without any additional work on your part. Refer to your chosen technology's documentation for the appropriate detail.
2) Understand your data model and usage patterns. Design your datastore appropriately. Don't design for scale that you won't have. Optimize for your most common usage patterns.
3) Challenge your assumptions. Do you actually have to mutate the same entity very frequently from multiple roles? Sometimes the answer is yes, but often you can simply create a new, similar entity to reflect the update. I.e., take a journaling/logging approach instead of a single-entity approach. Ultimately, high volumes of updates on a single entity will never scale.
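For instance, with MongoDB (mentioned in the question) the built-in handling from point 1 typically boils down to a version field checked in the update filter. A minimal sketch with the C# driver, where the Version property is an assumed part of your document shape:

using MongoDB.Driver;

// replace the document only if nobody has bumped the version since we read it
static bool TrySave(IMongoCollection<Product> products, Product product, int originalVersion)
{
    var filter = Builders<Product>.Filter.Eq(p => p.Id, product.Id)
               & Builders<Product>.Filter.Eq(p => p.Version, originalVersion);
    product.Version = originalVersion + 1;
    var result = products.ReplaceOne(filter, product);
    return result.ModifiedCount == 1; // false: someone else won the race, reload and retry
}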

Related

What is the best way to rehydrate aggregate roots and their associated entities in an event sourced environment

I have seen information on rehydrating aggregate roots on SO, but I am posting this question because I did not find any information about doing so in the context of an event-sourced framework.
Has a best practice been discovered or developed for how to rehydrate aggregate roots when operating on the command side of an application using the event sourcing and CQRS pattern, or is this still more of a "preference" among architects?
I have read through a number of blogs and watched a number of conference presentations on YouTube, and I seem to get different guidance depending on whom I am listening to.
On the one hand, I have found information stating fairly clearly that developers should create aggregates that hydrate themselves using "apply" methods on events obtained directly from the event store.
On the other hand, I have also seen in several places where presenters and bloggers have recommended rehydrating aggregate roots by submitting a query to the read side of the application. Some have suggested creating specific validation "buckets" / projections on the read side to facilitate this.
Can anyone help point me in the right direction on discovering if there is a single best practice or if the answer primarily depends upon performance issues or some other issue I am not thinking about?
Hydrating Aggregates in an event sourced framework is a well-understood problem.
On the one hand, I have found information stating fairly clearly that
developers should create aggregates to hydrate themselves using
"apply" methods on events obtained directly from the event store.
This is the prescribed way of handling it. There are various ways of achieving this, but I would suggest keeping any persistence logic (reading or writing events) outside of your Aggregate. One simple way is to expose a constructor that accepts domain events and then applies those events.
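A minimal sketch of that constructor approach (the event types and the Product aggregate are made up for illustration):

using System;
using System.Collections.Generic;

// made-up event types
public record ProductCreated(Guid Id, int Quantity);
public record ItemsPurchased(int Count);

public class Product
{
    public Guid Id { get; private set; }
    public int Quantity { get; private set; }

    // rehydration: state is rebuilt purely by applying past events in order
    public Product(IEnumerable<object> history)
    {
        foreach (var e in history) Apply(e);
    }

    private void Apply(object e)
    {
        switch (e)
        {
            case ProductCreated c: Id = c.Id; Quantity = c.Quantity; break;
            case ItemsPurchased p: Quantity -= p.Count; break;
        }
    }
}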
On the other hand, I have also seen in several places where presenters
and bloggers have recommended rehydrating aggregate roots by
submitting a query to the read side of the application. Some have
suggested creating specific validation "buckets" / projections on the
read side to facilitate this.
You can use the concept of snapshots as a way of optimizing your reads. This will create a memoized version of your hydrated Aggregate. You can load this snapshot and then only apply events that were generated since the snapshot was created. In this case, your Aggregate can define a constructor that takes two parameters: an existing state (snapshot) and any remaining domain events that can then be applied to that snapshot.
Snapshots are just an optimization and should be considered as such. You can create a system that does not use snapshots and apply them once read performance becomes a bottleneck.
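Continuing the Product sketch above, the snapshot-aware constructor could look like this (ProductSnapshot is again a made-up type):

// made-up snapshot type; pairs with the Product sketch above
public record ProductSnapshot(Guid Id, int Quantity, long Version);

// start from a memoized state, then apply only the newer events
public Product(ProductSnapshot snapshot, IEnumerable<object> eventsSinceSnapshot)
{
    Id = snapshot.Id;
    Quantity = snapshot.Quantity;
    foreach (var e in eventsSinceSnapshot) Apply(e); // only the tail is replayed
}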
On the other hand, I have also seen in several places where presenters
and bloggers have recommended rehydrating aggregate roots by
submitting a query to the read side of the application
Snapshots are not really part of the read side of the application. Data on the read side exists to satisfy use cases within the application. Those can change based on requirements even if the underlying domain does not change. As such, you shouldn't use read side data in your domain at all.
Event sourcing has developed different styles over the years. I would divide all of those into two big categories:
an event stream represents one entity (an aggregate in case of DDD)
one (partitioned) event stream for a (sub)system
When you deal with one stream per (sub)system, you aren't able to rehydrate the write side on the fly; it is physically impossible due to the number of events in that stream. Therefore, you would rely on the projected read side to retrieve the current entity state. As a consequence, this read side must be fully consistent.
When going with the DDD-flavoured event sourcing, there's a strong consensus in the community about how it should be done. The state of the aggregate (not just the root, but the whole aggregate) is restored by the command side before calling the domain model. You always restore using events. When snapshotting is enabled, snapshots are also stored as events in the aggregate snapshot stream, so you read the last one plus all events since the snapshot version.
Concerning the Apply thing: you need to clearly separate the function that adds new events to the changes list (what you're going to save) from the functions that mutate the aggregate state when events are applied.
The first function is the one called Apply and the second one is often called When. So you call the Apply function in your aggregate code to build up the change list. The When function is called when restoring the aggregate state from events as you read the stream, and also from the Apply function.
You can find a simplistic example of an event-sourced aggregate in my book repo: https://github.com/alexeyzimarev/ddd-book/blob/master/chapter13/src/Marketplace.Ads.Domain/ClassifiedAds/ClassifiedAd.cs
For example:
public void Publish(UserId userId)
    => Apply(
        new V1.ClassifiedAdPublished
        {
            Id = Id,
            ApprovedBy = userId,
            OwnerId = OwnerId,
            PublishedAt = DateTimeOffset.Now
        }
    );
And for the When:
protected override void When(object @event)
{
    switch (@event)
    {
        // more code here
        case V1.ClassifiedAdPublished e:
            ApprovedBy = UserId.FromGuid(e.ApprovedBy);
            State = ClassifiedAdState.Active;
            break;
        // and more here
    }
}

How to process Read Model in CQRS

We want to implement CQRS in our new design. We have some doubts about processing the command handler and the read model. We understand that while processing commands we should take an optimistic lock on the aggregateId. But what approach should be taken while processing read models? Should we take a lock on the entire read model, or on the aggregateId, or never take a lock while processing the read model?
Case 1: take a lock on the entire read model. This is the safest, but not good in terms of speed.
Case 2: take a lock on the aggregateId. Here two issues may arise: if we lock per aggregateId, what happens if the read model server restarts? It does not know where to start from again.
Case 3: never take a lock. In this approach, I think data may end up in a corrupted state. For example, say an order-inserted event is generated and, through some workflow/saga, an order-updated event takes place as well. What if the order-updated event comes first and the order-inserted event is not yet processed?
I hope I have managed to describe my issue.
If you do not process events concurrently in the Readmodel then there is no need for a lock. This is the case when you have a single instance of the Readmodel, possibly in a microservice, that polls for events and processes them sequentially.
If you have a synchronous Readmodel (i.e. in the same process as the Writemodel/Aggregate) then most probably you will need locking.
An important thing to keep in mind is that a Readmodel most probably differs from the Writemodel. There could be a lot of Writemodel types whose events are projected into the same Readmodel. For example, in an ecommerce shop you could have a ListOfProducts that projects events from the Vendor and the Product Aggregates. This means that, when we speak about a Readmodel, we cannot simply refer to "the Aggregate", because there is no single Aggregate involved. In the case of ecommerce, when we say "the Aggregate" we might refer to the Product Aggregate or the Vendor Aggregate.
But what to lock? That depends on the database technology. You should lock the smallest affected read entity or collection that can be locked. In a Readmodel that consists of a list of products (read entities, not aggregates!), when an event affects only one product you should lock only that product (e.g. ProductTitleRenamed).
If an event affects more products then you should lock the entire collection. For example, VendorWasBlocked affects all the products (it should remove all the products from that vendor).
You need the locking for events that have non-idempotent side effects, for the case where the Readmodel's updater fails during the processing of an event and you want to retry/resume from where it left off. If the event has idempotent side effects then it can be retried safely.
In order to know from where to resume in case of a failed Readmodel, you could store inside the Readmodel the sequence of the last processed event. In this case, if the entity update succeeds then the last processed event's sequence is also saved. If it fails then you know that the event was not processed.
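One way to make that concrete with MongoDB is to update the read entity and its checkpoint in a single atomic write. A minimal sketch with the C# driver; ProductView, LastSequence and the event fields are all illustrative names:

using MongoDB.Driver;

// LastSequence is the per-entity checkpoint stored inside the Readmodel itself
static void ApplyTitleRenamed(IMongoCollection<ProductView> views, ProductTitleRenamed e)
{
    var result = views.UpdateOne(
        Builders<ProductView>.Filter.Eq(v => v.Id, e.ProductId)
          & Builders<ProductView>.Filter.Lt(v => v.LastSequence, e.Sequence),
        Builders<ProductView>.Update
            .Set(v => v.Title, e.NewTitle)
            .Set(v => v.LastSequence, e.Sequence));
    // ModifiedCount == 0 means the event was already applied, so a retry is a no-op
}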
For example, say an order-inserted event is generated and, through some workflow/saga, an order-updated event takes place as well. What if the order-updated event comes first and the order-inserted event is not yet processed?
Read models are usually easier to reason about if you think about them polling for ordered sequences of events, rather than reacting to unordered notifications.
A single read model might depend on events from more than one aggregate, so aggregate locking is unlikely to be your most general answer.
That also means, if we are polling, that we need to keep track of the position of multiple streams of data. In other words, our read model probably includes meta data that tells us what version of each source was used.
The locking is likely to depend on the nature of your backing store / cache. But an optimistic approach
read the current representation
compute the new representation
compare and swap
is, again, usually easy to reason about.

How are the missing events replayed?

I am trying to learn more about CQRS and Event Sourcing (Event Store).
My understanding is that a message queue/bus is not normally used in this scenario: a message bus can be used to facilitate communication between microservices, but it is not typically used specifically for CQRS. However, the way I see it at the moment, a message bus would be very useful for guaranteeing that the read model is eventually in sync, hence eventual consistency, e.g. when the server hosting the read model database is brought back online.
I understand that eventual consistency is often acceptable with CQRS. My question is: how does the read side know it is out of sync with the write side? For example, let's say there are 2,000,000 events created in Event Store on a typical day and 1,999,050 are also written to the read store. The remaining 950 events are not written because of a software bug somewhere, or because the server hosting the read model is offline for a few seconds, etc. How does eventual consistency work here? How does the application know to replay the 950 events that are missing at the end of the day, or the x events that were missed because of the downtime ten minutes ago?
I have read questions on here over the last week or so which talk about messages being replayed from the event store, e.g. this one: CQRS - Event replay for read side. However, none talk about how this is done. Do I need to set up a scheduled task that runs once per day and replays all events that were created since the date the scheduled task last succeeded? Is there a more elegant approach?
I've used two approaches in my projects, depending on the requirements:
Synchronous, in-process Readmodels. After the events are persisted, in the same request lifetime and in the same process, the Readmodels are fed with those events. In case of a Readmodel's failure (a bug or a catchable error/exception) the error is logged, that Readmodel is just skipped, the next Readmodel is fed with the events, and so on. Then follow the Sagas, which may generate commands that generate more events, and the cycle is repeated.
I use this approach when the impact of a Readmodel's failure is acceptable to the business, i.e. when the readiness of a Readmodel's data is more important than the risk of failure. For example, they wanted the data immediately available in the UI.
The error log should be easily accessible on some admin panel so someone can look at it in case a client reports an inconsistency between the write/commands and the read/queries.
This also works if you have your Readmodels coupled to each other, i.e. one Readmodel needs data from another, canonical Readmodel. Although this seems bad, it's not; it always depends. There are cases when you trade updater code/logic duplication for resilience.
Asynchronous, in-another-process Readmodel updater. This is used when I want total separation of a Readmodel from the other Readmodels, when a Readmodel's failure would not bring the whole read side down, or when a Readmodel needs another language, different from the monolith's. Basically this is a microservice. When something bad happens inside a Readmodel, it is necessary that some authoritative higher-level component is notified, e.g. an admin is notified by email or SMS or whatever.
The Readmodel should also have a status panel, with all kinds of metrics about the events that it has processed, whether there are gaps, whether there are errors or warnings; it should also have a command panel where an admin can rebuild it at any time, preferably without system downtime.
In any approach, the Readmodels should be easily rebuildable.
How would you choose between a pull approach and a push approach? Would you use a message queue with a push (events)?
I prefer the pull-based approach because:
it does not use another stateful component like a message queue, yet another thing that must be managed, that consumes resources and that can (and so will) fail
every Readmodel consumes the events at the rate it wants
every Readmodel can easily change at any moment what event types it consumes
every Readmodel can easily be rebuilt at any time by requesting all the events from the beginning
the order of events is exactly the same as in the source of truth, because you pull from the source of truth
There are cases when I would choose a message queue:
you need the events to be available even if the Event store is not
you need competing/parallel consumers
you don't want to track what messages you consume; as they are consumed they are removed automatically from the queue
This talk from Greg Young may help.
How does the application know to replay the 950 events that are missing at the end of the day or the x events that were missed because of the downtime ten minutes ago?
So there are two different approaches here.
One is perhaps simpler than you expect: each time you need to rebuild a read model, just start from event 0 in the stream.
Yeah, the scale on that will eventually suck, so you won't want that to be your first strategy. But notice that it does work.
For updates with not-so-embarrassing scaling properties, the usual idea is that the read model tracks metadata about the stream position used to construct the previous model. Thus, the query from the read model becomes "What has happened since event #1,999,050?"
In the case of Event Store, the call might look something like
EventStore.ReadStreamEventsForwardAsync(stream, 1999050, 100, false)
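The usual catch-up loop around that call looks roughly like this. Here connection is the Event Store connection the call above is made on, and checkpointStore is an assumed little helper that persists a number; both names are illustrative:

// read forward from the last checkpoint until we reach the end of the stream
var checkpoint = await checkpointStore.LoadAsync("product-view"); // e.g. 1999050
StreamEventsSlice slice;
do
{
    slice = await connection.ReadStreamEventsForwardAsync(stream, checkpoint, 100, false);
    foreach (var resolved in slice.Events)
    {
        Project(resolved); // update the read model
        checkpoint = resolved.OriginalEventNumber + 1;
    }
    await checkpointStore.SaveAsync("product-view", checkpoint); // remember progress
} while (!slice.IsEndOfStream);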
The application doesn't know it hasn't processed some events due to a bug.
First of all, I don't understand why you assume that the number of events written on the write side must equal the number of events processed by the read side. Some projections may subscribe to the same event, and some events may have no subscriptions on the read side.
In case of a bug in a projection or in the infrastructure that resulted in a certain projection being invalid, you might need to rebuild that projection. In most cases this would be a manual intervention that resets the checkpoint of the projection to 0 (the beginning of time), so the projection picks up all events from the event store from scratch and reprocesses all of them again.
The event store should have a global sequence number across all events starting, say, at 1.
Each projection has a position tracking where it is along the sequence number. The projections are like logical queues.
You can clear a projection's data and reset the position back to 0 and it should be rebuilt.
In your case the projection fails for some reason, like the server going offline, at position 1,999,050, but when the server starts up again it will continue from this point.

How to avoid concurrency on aggregates status using Rebus in a server cluster

I have a web service that uses Rebus as a service bus.
Rebus is configured as explained in this post.
The web service is load balanced across a two-server cluster.
These services are for a production environment and each production machine sends commands to save the produced quantities and/or to update its state.
In the BL I've modelled an Aggregate Root for each machine, and it executes the commands emitted by the real machine. To preserve the correct status, the Aggregate needs to receive the commands in the same sequence as they were emitted, and, since there is no concurrency for that machine, that is the same order in which they were saved on the bus.
E.g.: machine XX sends the command 'add new piece done' and then the command 'set stop for maintenance'. Executing these commands in sequence, you should end up with Aggregate XX in the state 'Stop'; but with multiple server/worker roles, both commands could be executed at the same time on the same version of the Aggregate. This means that, depending on who saves the Aggregate first, I can have Aggregate XX in the state 'Stop' or 'Producing pieces'... which is not the same thing.
I've introduced a service bus to add scale-out as the number of machines grows, and resilience (if a server fails I only get a slowdown in processing commands).
Actually I'm using the name of the aggregate as a "topic" or "destinationAddress" with the IAdvancedApi, so the name of the aggregate is saved in the recipient field of the transport. Then I've created a custom Transport class that:
1. does not remove the messages in progress but sets them to the state InProgress;
2. to retrieve the messages, selects only those in a recipient that has none InProgress.
I'm wondering: is this the best way to guarantee that the bus executes the commands for an aggregate in the same sequence as they arrived?
The solution would be to have some kind of locking of your aggregate root, which needs to happen at the data store level.
E.g. by using optimistic locking (probably implemented with some kind of revision number or something like that), you would be sure that you would never accidentally overwrite another node's edits.
This would allow for your aggregate to either
a) accept the changes in either order (which is generally preferable, as it makes your system more tolerant), or
b) reject an invalid change
If the aggregate rejects the change, this could be implemented by throwing an exception. And then, in the Rebus handler that catches this exception, you can e.g. await bus.Defer(TimeSpan.FromSeconds(5), theMessage) which will cause it to be delivered again in five seconds.
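A minimal sketch of such a handler. The repository and its ConcurrencyException are assumptions about your persistence layer, as are the message and machine types; bus.Defer is the Rebus call quoted above:

using System;
using System.Threading.Tasks;
using Rebus.Bus;
using Rebus.Handlers;

public class MachineCommandHandler : IHandleMessages<AddNewPieceDone>
{
    readonly IBus _bus;
    readonly IMachineRepository _repository; // assumed: loads/saves with a revision check

    public MachineCommandHandler(IBus bus, IMachineRepository repository)
    {
        _bus = bus;
        _repository = repository;
    }

    public async Task Handle(AddNewPieceDone message)
    {
        try
        {
            var machine = await _repository.Load(message.MachineId);
            machine.AddPiece(message.PieceId);
            await _repository.Save(machine); // throws if the revision moved underneath us
        }
        catch (ConcurrencyException)
        {
            // let Rebus redeliver the command in five seconds
            await _bus.Defer(TimeSpan.FromSeconds(5), message);
        }
    }
}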
You should never rely on message order in a service bus / queuing / messaging environment.
When you do find yourself in this position you may need to rethink your design. Firstly, a service bus is most certainly not an event store, and attempting to use it like one is going to lead to pain and suffering :) Not that you are attempting this, but I thought I'd throw it in there.
As for your design, in order to manage this kind of state you may want to look at a process manager. If you are not generating those commands then even this will not help.
However, given your scenario it seems as though the calls are sequential but perhaps it is just your example. In any event, as mookid8000 said, you either want to:
discard invalid changes (with the appropriate feedback),
allow any order of messages as long as they are valid,
ignore out-of-sequence messages till later.
Hope that helps...
"exactly the same sequence as they were saved on the bus"
Just... why?
Would you rely on your HTTP server logs to know which command actually reached an aggregate first? No, because that is totally unreliable, just as it is with at-least-once delivery guarantees, and it's also irrelevant.
It is your event store and/or normal persistence state that should be the source of truth when it comes to knowing the sequence of events. The order of commands shouldn't really matter.
Assuming optimistic concurrency, if the aggregate is not allowed to transition from A to C, then it should guard this invariant, and when a TransitionToStateC command hits it in the A state, the command will simply get rejected.
If on the other hand, A->C->B transitions are valid and that is the order received by your aggregate well that is what happened from the domain perspective. It really shouldn't matter which command was published first on the bus, just like it doesn't matter which user executed the command first from the UI.
"In my scenario the calls for a specific aggregate are absolutely
sequential and I must guarantee that are executed in the same order"
Why are you executing them asynchronously and potentially concurrently by publishing on a bus then? What you are basically saying is that calls are sequential and cannot be processed concurrently. That means everything should be synchronous because there is no potential benefit from parallelism.
Why:
executeAsync(command1)
executeAsync(command2)
executeAsync(command3)
When you want:
execute(command1)
execute(command2)
execute(command3)
You should have a single command message and the handler of this message executes multiple commands against the aggregate. Then again, in this case I'd just create a single operation on the aggregate that performs all the transitions.

node.js + mongo + atomic update of multiple entities = headache

My setup:
Node.js
Mongojs
A simple database containing two collections - inventory and invoices.
Users may concurrently create invoices.
An invoice may refer to several inventory items.
My problem:
Keeping the inventory integrity. Imagine a scenario where two users submit two invoices with overlapping item sets.
A naive (and wrong) implementation would do the following:
For each item in the invoice read the respective item from the inventory collection.
Adjust the quantity of the inventory items.
If any item quantity goes below zero - abandon the request with the relevant message to the user.
Save the inventory items.
Save the invoice.
Obviously, this implementation is bad, because the actions of the two users are going to interleave and affect each other. In a typical blocking server + relational database this is solved with complex locking/transaction schemes.
What is the nodish + mongoish way to solve this? Are there any tools that the node.js platform provides for this kind of thing?
You can look at a two phase commit approach with MongoDB, or you can forget about transactions entirely and decouple your processes via a service bus approach. Use Amazon as an example - they will allow you to submit your order, but they will not confirm it until they have been able to secure your inventory item, charged your card, etc. None of this occurs in a single transaction - it is a series of steps that can occur in isolation and can have compensating steps applied where necessary.
A naive bus implementation would do the following (keep in mind that this is just a generic suggestion for you to work from and the exact implementation would depend on your specific needs for concurrency, etc.):
place the order on the queue. At this point, you can continue to have your client wait, or you can thank them for their order and let them know they will receive an email when it has been processed.
an "inventory worker" will grab the order and lock the inventory items that it needs to reserve. This can be done in many different ways. With Mongo you could create a lock collection with one document per locked item: the document has the inventory item ID as its ID and a reasonable TTL (say 30 seconds). As long as the worker holds the lock, it can manage the inventory levels of the items it has locks for. Once it has made its changes, it deletes the "lock" document. (A sketch of this locking scheme follows the list.)
if another worker comes along that wants to manage the same item while it is locked, you could put the blocked worker into sleep mode for X seconds and then retry or, better yet, put the request back onto the message bus to be picked up later by another worker.
once the worker has resolved all the inventory items, it can place another message on the service bus indicating that a card should be charged, that processing should receive a notification to pull the inventory, that an email can be sent to the person who made the order, etc.
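If you go with the Mongo lock-document idea from the second bullet, a minimal sketch could look like this. It is written with the C# Mongo driver used elsewhere on this page (with mongojs the calls are the direct equivalents), and every name here is illustrative:

using System;
using MongoDB.Bson;
using MongoDB.Driver;

// one document per locked inventory item; a TTL index reaps abandoned locks
static bool TryLockItem(IMongoDatabase db, string inventoryItemId)
{
    var locks = db.GetCollection<BsonDocument>("inventory_locks");
    // idempotent; in practice create the index once at startup
    locks.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
        Builders<BsonDocument>.IndexKeys.Ascending("createdAt"),
        new CreateIndexOptions { ExpireAfter = TimeSpan.FromSeconds(30) }));

    try
    {
        locks.InsertOne(new BsonDocument
        {
            { "_id", inventoryItemId },
            { "createdAt", DateTime.UtcNow }
        });
        return true; // lock held until we delete the document (or the TTL fires)
    }
    catch (MongoWriteException e)
        when (e.WriteError.Category == ServerErrorCategory.DuplicateKey)
    {
        return false; // another worker holds it: sleep and retry, or requeue the order
    }
}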
Sounds complex, but once you have a message bus set up, it's actually relatively simple. A list of Node message bus implementations can be found here.
Some developers will even skip the formal message bus completely and use a database as their message-passing engine, which can work in simple implementations. Google "Mongo and queues".
If you don't expect more than one server and a message bus implementation is too bulky, Node could handle the locking and message passing for you. For example, if you really wanted to lock with Node, you could keep a map of the inventory item IDs currently being worked on. Although, to be frank, I think the message bus is the best way to go. Anyway, here's some code I have used in the past to handle simple external resource locking with Node.
var async = require('async');

// itemId -> array of callbacks waiting for the lock
var locks = {};

// attempt to take out a lock; if the lock is already held,
// queue the callback to be run when the lock is freed
this.getLock = function (id, cb) {
    if (locks[id]) {
        locks[id].push(cb);
        return false;
    }
    else {
        locks[id] = [];
        return true;
    }
};

// call freeLock when done: run every queued callback, then clear the lock
this.freeLock = function (that, id) {
    async.forEach(locks[id], function (item, callback) {
        item.apply(that, [id]);
        callback();
    }, function (err) {
        if (err) {
            // do something on error
        }
        locks[id] = null;
    });
};
