CQRS Aggregate and Projection consistency - domain-driven-design

An Aggregate can use a View; this is described in Vaughn Vernon's book:
Such Read Model Projections are frequently used to expose information to various clients (such as desktop and Web user interfaces), but they are also quite useful for sharing information between Bounded Contexts and their Aggregates. Consider the scenario where an Invoice Aggregate needs some Customer information (for example, name, billing address, and tax ID) in order to calculate and prepare a proper Invoice. We can capture this information in an easy-to-consume form via CustomerBillingProjection, which will create and maintain an exclusive instance of CustomerBillingView. This Read Model is available to the Invoice Aggregate through the Domain Service named IProvideCustomerBillingInformation. Under the covers this Domain Service just queries the document store for the appropriate instance of the CustomerBillingView.
Let's imagine our application should allow creating many users, each with a unique name. The command/event flow:
CreateUser{Alice} command sent
UserAggregate checks UsersListView, since there are no users with name Alice, aggregate decides to create user and publish event.
UserCreated{Alice} event published // By UserAggregate
UsersListProjection processed UserCreated{Alice} // for simplicity let's think UsersListProjection just accumulates users names if receives UserCreated event.
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UserCreated{Bob} event published // By UserAggregate
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UsersListProjection processed UserCreated{Bob}
UsersListProjection processed UserCreated{Bob}
The problem: UsersListProjection did not have time to process the event and contained stale data, and the aggregate used this stale data. As a result, two users with the same name were created.
How to avoid such situations?
How to make aggregates and projections consistent?

How to make aggregates and projections consistent?
In the common case, we don't. Projections are consistent with the aggregate at some time in the past, but do not necessarily have all of the latest updates. That's part of the point: we give up "immediate consistency" in exchange for other (higher leverage) benefits.
The duplication that you refer to is usually solved a different way: by using conditional writes to the book of record.
In your example, we would normally design the system so that the second attempt to write Bob to our data store would fail because of a conflict. Also, we prevent duplicates from propagating by ensuring that the write to the data store happens-before any events are made visible.
What this gives us, in effect, is a "first writer wins" write strategy. The writer that loses the data race has to retry/fail/etc.
(As a rule, this depends on the idea that both attempts to create Bob write that information to the same place, using the same locks.)
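A minimal sketch of such a conditional write, assuming the book of record is a SQL table. The table names, the `outbox` table and the `create_user` helper are hypothetical and chosen only for illustration; the uniqueness constraint is what makes the first writer win, and the event row is written in the same transaction so it only becomes visible if the write succeeded.

```python
import sqlite3

# Hypothetical book of record: an in-memory SQLite database whose PRIMARY KEY
# constraint plays the role of the conditional write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE outbox (event TEXT NOT NULL)")
conn.commit()


def create_user(name: str) -> bool:
    """Return True if this writer won the race, False on conflict."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
            # The event is recorded only if the insert above succeeded.
            conn.execute(
                "INSERT INTO outbox (event) VALUES (?)",
                (f"UserCreated{{{name}}}",),
            )
        return True
    except sqlite3.IntegrityError:
        # Someone already wrote this name: the losing writer retries or fails.
        return False


results = [create_user("Alice"), create_user("Bob"), create_user("Bob")]
events = [row[0] for row in conn.execute("SELECT event FROM outbox ORDER BY rowid")]
```

Here the second `CreateUser{Bob}` is rejected by the data store itself, regardless of how far behind any projection is.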
A common design to reduce the probability of conflict is to NOT use the "read model" of the aggregate itself, but to instead use its own data in the data store. That doesn't necessarily eliminate all data races, but you reduce the width of the window.
Finally, we fall back on Memories, Guesses and Apologies.

It's important to remember in CQRS that every write model is also a read model for the reads that are required to validate a command. Those reads are:
checking for the existence of an aggregate with a particular ID
loading the latest version of an entire aggregate
In general a CQRS/ES implementation will provide that read model for you. The particulars of how that's implemented will depend on the implementation.
Those are the only reads a command-handler ever needs to perform, and if a query can be answered with no more than those reads, the query can be expressed as a command (e.g. GetUserByName{Alice}) which when handled does not emit events. The benefit of such read-only commands is that they can be strongly consistent because they are limited to a single aggregate. Not all queries, of course, can be expressed this way, and if the query can tolerate eventual consistency, it may not be worth paying the coordination tax for strong consistency that you typically pay by making it a read-only command. (Command handling limited to a single aggregate is generally strongly consistent, but there are cases, e.g. when the events form a CRDT and an aggregate can live in multiple datacenters, where even that consistency is loosened.)
So with that in mind:
CreateUser{Alice} received
user Alice does not exist
persist UserCreated{Alice}
CreateUser{Alice} acknowledged (e.g. HTTP 200, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Alice}
CreateUser{Bob} received
user Bob does not exist
persist UserCreated{Bob}
CreateUser{Bob} acknowledged
CreateUser{Bob} received
user Bob already exists
command-handler for an existing user rejects the command and persists no events (it may log that an attempt to create a duplicate user was made)
CreateUser{Bob} ack'd with failure (e.g. HTTP 409, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Bob}
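The flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the API of any particular CQRS/ES framework: `InMemoryEventStore`, `ConcurrencyError` and `handle_create_user` are hypothetical names, and the user name is assumed to serve as the aggregate ID. The handler reads only the write side (the aggregate's own event stream) and never consults UserListProjection.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when a stream changed since it was read."""


@dataclass
class InMemoryEventStore:
    # Hypothetical store: one event list per aggregate ID.
    streams: dict = field(default_factory=dict)

    def load(self, stream_id):
        return list(self.streams.get(stream_id, []))

    def append(self, stream_id, expected_version, event):
        current = self.streams.setdefault(stream_id, [])
        if len(current) != expected_version:
            raise ConcurrencyError(stream_id)
        current.append(event)


def handle_create_user(store, name):
    """Validate against the write side only; return (accepted, reason)."""
    stream_id = f"user-{name}"  # assumption: the user name is the aggregate ID
    history = store.load(stream_id)
    if history:
        # The aggregate already exists: reject and persist no events.
        return False, "user already exists"
    try:
        store.append(stream_id, expected_version=0, event=("UserCreated", name))
    except ConcurrencyError:
        return False, "lost the race to a concurrent writer"
    return True, "created"


store = InMemoryEventStore()
outcomes = [handle_create_user(store, n) for n in ("Alice", "Bob", "Bob")]
```

The `expected_version` check covers the case where two handlers both see an empty stream at the same moment: only one append can succeed.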
Note that while the UserListProjection can answer the question "does this user exist?", the fact that the write-side can also (and more consistently) answer that question does not in and of itself make that projection superfluous. UserListProjection can also answer questions like "who are all of the users?" or "which users have two consecutive vowels in their name?" which the write-side cannot answer.

Related

DDD Relate Aggregates in a long process running

I am working on a project in which we define two aggregates: "Project" and "Task". The Project, in addition to other attributes, has the points attribute. These points are distributed to the tasks as they are defined by users. In a use case, the user assigns points for some task, but the project must have these points available.
We currently model this as follows:
1. "task.RequestPoints(points)": this method will create an aggregate PointsAssignment with attributes points and taskId, which in its constructor issues a PointsAssignmentRequested domain event.
2. The handler of that event will fetch the project related to the task and the aggregate PointsAssignment and call the method "project.assignPoints(pointsAssignment, service)"; that is, it will pass the PointsAssignment aggregate as a parameter, along with a service to calculate the difference between the current points of the task and the desired points.
3. If points are available, the project will modify its points attribute and issue a "ProjectPointsAssigned" domain event that will contain the pointsAssignmentId attribute (in addition to others).
4. The handler of this last event will fetch the PointsAssignment and confirm it with "pointsAssignment.Confirm()"; this aggregate will issue a PointsAssignmentConfirmed domain event.
5. The handler for this last event will load the associated task and call "task.AssignPoints(pointsAssignment.points)".
My question is: is it correct to pass in step 2 the aggregate PointsAssignment in the project method? That was the only way I found to be able to relate the aggregates.
Note: We have created the PointsAssignment aggregate so that in case of failure I could save the error “pointsAssignment.Reject(reasonText)” and display it to the user, since I am using eventual consistency (1 aggregate per transaction).
We thought about using a Process Manager (PointsAssignmentProcess), but we would still need the third aggregate PointsAssignment to correlate this process.
I would do it a little bit differently (which doesn't mean more correctly).
Your project doesn't need to know anything about the PointsAssignment.
If your project is the one that has the available points for use, it can have simple methods of removing or adding points.
RemovePointsCommand -> project->removePoints(points)
AddPointsCommand -> project->addPoints(points)
Then, you would have an eventHandler that would react to the PointsAssignmentRequested event (I imagine this event has the ID of the project, the number of points, and maybe a status field, from what you said).
This eventHandler would only do:
on(PointsAssignmentRequested) -> dispatch command (RemovePointsCommand)
// Note that here it would be wise for the client to send an ID for this operation, so it can do it asynchronously.
That command can either succeed or fail, and both outcomes can dispatch events:
RemovePointsSucceeded
RemovePointsFailed
// Remember that you have a correlation id from earlier persisted
Then, you would have a final eventHandler that would do:
on(RemovePointsSucceeded) -> PointsAssignment.succeed() // Dispatches PointsAssignmentSucceeded
on(PointsAssignmentSucceeded) -> task.AssignPoints(pointsAssignment.points)
On the fail side
on(RemovePointsFailed) -> PointsAssignment.fail() // Dispatches PointsAssignmentFailed
This way you don't have to mix aggregates together: all they know are each other's IDs, and they can work without knowing anything about the schema of other aggregates, avoiding undesired coupling.
I see the semantics of this problem as exactly those of a bank transfer.
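The handler chain described above could look roughly like this. It is a simplified, synchronous sketch: the dictionaries standing in for repositories, the function names and the sample IDs are all assumptions made for illustration, and a real system would dispatch these steps asynchronously through a command/event bus.

```python
from dataclasses import dataclass


@dataclass
class Project:
    points: int

    def remove_points(self, points: int) -> bool:
        # The project only sees a number; it knows nothing about assignments.
        if points > self.points:
            return False
        self.points -= points
        return True


@dataclass
class Task:
    points: int = 0


@dataclass
class PointsAssignment:
    project_id: str
    task_id: str
    points: int
    status: str = "issued"


# Hypothetical in-memory repositories keyed by ID.
projects = {"X": Project(points=15)}
tasks = {"Y": Task()}
assignments = {}


def request_points(assignment_id, project_id, task_id, points):
    assignments[assignment_id] = PointsAssignment(project_id, task_id, points)
    on_points_assignment_requested(assignment_id)


def on_points_assignment_requested(assignment_id):
    a = assignments[assignment_id]
    # Stands in for dispatching RemovePointsCommand to the project.
    if projects[a.project_id].remove_points(a.points):
        on_remove_points_succeeded(assignment_id)
    else:
        on_remove_points_failed(assignment_id)


def on_remove_points_succeeded(assignment_id):
    a = assignments[assignment_id]
    a.status = "succeeded"
    tasks[a.task_id].points += a.points  # on(PointsAssignmentSucceeded)


def on_remove_points_failed(assignment_id):
    assignments[assignment_id].status = "failed"


request_points("a1", "X", "Y", 10)
request_points("a2", "X", "Y", 10)  # only 5 points left, so this one fails
```

The assignment ID is the correlation ID: every handler finds the other aggregates through it, and no aggregate is passed into another.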
You have the bank account (project)
You have money in this bank account(points)
You are transferring money through a transfer process (pointsAssignment)
You are transferring money to an account (task)
The bank account should have only minimal operations, withdrawing and depositing; it does not need to know anything about the transfer process.
The transfer process needs to know which account it is withdrawing from and which account it is depositing to.
I imagine your PointsAssignment being like
{
"projectId":"X",
"taskId":"Y",
"points" : 10,
"status" : ["issued", "succeeded", "failed"]
}

How to process Read Model in CQRS

We want to implement CQRS in our new design. We have some doubts about processing in the command handler and the read model. We understand that while processing commands we should take an optimistic lock on the aggregateId. But what approach should be considered while processing read models? Should we take a lock on the entire read model, on the aggregateId, or never take a lock while processing the read model?
Case 1: take a lock on the entire read model. It is the safest, but it is not good in terms of speed.
Case 2: take a lock on the aggregateId. Here issues may arise: if we lock per aggregateId, what happens if the read model server restarts? It does not know where to start again.
Case 3: never take a lock. With this approach, I think the data may end up in a corrupted state. For example, say an order inserted event is generated and, through some workflow/saga, an order updated event takes place as well. What if the order updated event comes first and the order inserted event is not yet processed?
I hope I have explained my issue clearly.
If you do not process events concurrently in the Readmodel then there is no need for a lock. This is the case when you have a single instance of the Readmodel, possibly in a Microservice, that polls for events and processes them sequentially.
If you have a synchronous Readmodel (i.e. in the same process as the Writemodel/Aggregate) then most probably you will need locking.
An important thing to keep in mind is that a Readmodel most probably differs from the Writemodel. There could be many Writemodel types whose events are projected into the same Readmodel. For example, in an ecommerce shop you could have a ListOfProducts that projects events from Vendor and from Product Aggregates. This means that, when we speak about a Readmodel, we cannot simply refer to "the Aggregate" because there is no single Aggregate involved. In the case of ecommerce, when we say "the Aggregate" we might refer to the Product Aggregate or the Vendor Aggregate.
But what to lock? That depends on the database technology. You should lock the smallest affected read entity or collection that can be locked. In a Readmodel that consists of a list of products (read entities, not aggregates!), when an event affects only one product (e.g. ProductTitleRenamed) you should lock only that product.
If an event affects several products then you should lock the entire collection. For example, VendorWasBlocked affects all the products of that vendor (it should remove all of them).
You need locking for events that have non-idempotent side effects, to cover the case where the Readmodel's updater fails while processing an event and you want to retry or resume from where it left off. If the event has idempotent side effects then it can be retried safely.
In order to know from where to resume in case of a failed Readmodel, you could store inside the Readmodel the sequence of the last processed event. In this case, if the entity update succeeds then the last processed event's sequence is also saved. If it fails then you know that the event was not processed.
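A minimal sketch of that idea, assuming the Readmodel lives in a SQL database so that the entity update and the checkpoint can share one transaction. The table layout and the `project` and `last_processed` helpers are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE checkpoint (id INTEGER PRIMARY KEY CHECK (id = 1), last_seq INTEGER)"
)
conn.execute("INSERT INTO checkpoint VALUES (1, 0)")
conn.commit()


def last_processed():
    return conn.execute("SELECT last_seq FROM checkpoint WHERE id = 1").fetchone()[0]


def project(seq, product_id, title):
    """Apply one ProductTitleRenamed-style event; skip it if already applied."""
    if seq <= last_processed():
        return False
    with conn:  # entity update and checkpoint commit together or not at all
        conn.execute(
            "INSERT OR REPLACE INTO products (id, title) VALUES (?, ?)",
            (product_id, title),
        )
        conn.execute("UPDATE checkpoint SET last_seq = ? WHERE id = 1", (seq,))
    return True


log = [(1, "p1", "Lamp"), (2, "p1", "Desk lamp")]
applied = [project(*event) for event in log]
replayed = [project(*event) for event in log]  # e.g. after a restart
```

Because the sequence number is saved in the same transaction as the entity, a crash between the two cannot leave the checkpoint ahead of or behind the data, and replaying the log after a restart skips everything already applied.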
For example, say an order inserted event is generated and, through some workflow/saga, an order updated event takes place as well. What if the order updated event comes first and the order inserted event is not yet processed?
Read models are usually easier to reason about if you think about them polling for ordered sequences of events, rather than reacting to unordered notifications.
A single read model might depend on events from more than one aggregate, so aggregate locking is unlikely to be your most general answer.
That also means, if we are polling, that we need to keep track of the position of multiple streams of data. In other words, our read model probably includes metadata that tells us what version of each source was used.
The locking is likely to depend on the nature of your backing store / cache. But an optimistic approach
read the current representation
compute the new representation
compare and swap
is, again, usually easy to reason about.
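The three steps above can be sketched as a small retry loop. `VersionedStore` is a hypothetical stand-in for whatever backing store or cache is in use; the only assumption is that it offers some form of versioned compare-and-swap.

```python
import threading


class VersionedStore:
    """Hypothetical backing store offering compare-and-swap on a version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # someone else updated the entry first
            self._data[key] = (version + 1, new_value)
            return True


def update(store, key, compute, max_attempts=5):
    for _ in range(max_attempts):
        version, current = store.read(key)  # read the current representation
        new_value = compute(current)  # compute the new representation
        if store.compare_and_swap(key, version, new_value):  # compare and swap
            return new_value
    raise RuntimeError("too much contention")


store = VersionedStore()
update(store, "order-1", lambda _: {"status": "inserted"})
update(store, "order-1", lambda doc: {**doc, "status": "updated"})
```

No lock is held while the new representation is computed; a concurrent update simply makes the swap fail and the loop reads again.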

How are consistency violations handled in event sourcing?

First of all, let me state that I am new to Command Query Responsibility Segregation and Event Sourcing (Message-Driven Architecture), but I'm already seeing some significant design benefits. However, there are still a few issues on which I'm unclear.
Say I have a Customer class (an aggregate root) that contains a property called postalAddress (an instance of the Address class, which is a value object). I also have an Order class (another aggregate root) that contains (among OrderItem objects and other things) a property called deliveryAddress (also an instance of the Address class) and a string property called status.
The customer places an order by issuing a PlaceOrder command, which triggers the OrderReceived event. At this point in time, the status of the order is "RECEIVED". When the order is shipped, someone in the warehouse issues a ShipOrder command, which triggers the OrderShipped event. At this point in time, the status of the order is "SHIPPED".
One of the business rules is that if a Customer updates their postalAddress before an order is shipped (i.e., while the status is still "RECEIVED"), the deliveryAddress of the Order object should also be updated. If the status of the Order were already "SHIPPED", the deliveryAddress would not be updated.
Question 1. Is the best place to put this "conditionally cascading address update" in a Saga (a.k.a., Process Manager)? I assume so, given that it is translating an event ("The customer just updated their postal address...") to a command ("... so update the delivery address of order 123").
Question 2. If a Saga is the right tool for the job, how does it identify the orders that belong to the user, given that an aggregate can only be retrieved by its unique ID (in my case a UUID)?
Continuing on, given that each aggregate represents a transactional boundary, if the system were to crash after the Customer's postalAddress was updated (the CustomerAddressUpdated event being persisted to the event store) but before the OrderDeliveryAddressUpdated event could be persisted (i.e., between the two transactions), then the system is left in an inconsistent state.
Question 3. How are such "violations" of consistency rules detected and rectified?
In most instances the delivery address of an order should be independent of any other data change, as a customer may want the order sent to an arbitrary address. That being said, I'll give my 2c on how you could approach this:
Is the best place to handle this in a process manager?
Yes. You should have an OrderProcess.
How would one get hold of the correct OrderProcess instance given that it can only be retrieved by aggregate id?
There is nothing preventing one from adding any additional lookup mechanism that associates data with an aggregate id. In my experimental, going-live-soon mechanism called shuttle-recall I have an IKeyStore mechanism that associates any arbitrary key with an AR Id. So you would be able to associate something like [order-process]:customerId=CID-123; as a key to some aggregate.
How are such "violations" of consistency rules detected and rectified?
In most cases they could be handled out-of-band, if possible. Should I order something from Amazon and attempt to change my address after the order has shipped, the order is still going to the original address. In your case of linking the customer postal address to the active order address, you could notify the customer that n orders have had their addresses updated but that a recent order (within some tolerance) has not.
As for the system going down before processing, you should have some guaranteed delivery mechanism to handle this. I do not regard these domain events in the same way I regard system events in a messaging infrastructure such as a service bus.
Just some thoughts :)

Why limit commands and events to one aggregate? CQRS + ES + DDD

Please explain why modifying many aggregates at the same time is a bad idea when doing CQRS, ES and DDD. Are there any situations where it could still be OK?
Take for example a command such as PurgeAllCompletedTodos. I want this command to lead to one event that updates the state of each completed Todo aggregate by setting IsActive to false.
Why is this not good?
One reason I could think of:
When updating the domain state it's probably good to limit the transaction to a well-defined part of the entire state so that only this part needs to be write-locked during the update. Doing so would allow many writes on different aggregates in parallel, which could boost performance in some extremely heavy scenarios.
The answer to the question lies in the meaning of "aggregate".
First of all, I would say that you are not modifying 'n' aggregates; you are modifying 'n' entities.
An aggregate can contain more than one entity and is essentially a transactional concept: the aggregate (pattern) is used when you need to modify the state of more than one entity in your application transactionally (all are modified or none).
Now, why would you modify more than one aggregate with one command?
If you feel this need, before doing anything else check your aggregate boundaries to see whether you can change them to remove the need for 1 command -> 'n' aggregates.
An aggregate can contain many entities of the same type, so for your command PurgeAllCompletedTodos you could also think about expanding the transaction boundary from a single Todo to an aggregate UserTodosAggregate that contains all of a user's todos, and let it manage all the commands for the todos of a single user.
In this way you can modify all the todos of a user in a single transaction.
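As a rough illustration of that wider boundary, here is a sketch of an aggregate that owns every todo of one user. The class, its field names and the `CompletedTodosPurged` event name are assumptions made for this example, not taken from the question.

```python
from dataclasses import dataclass, field


@dataclass
class UserTodos:
    """Hypothetical aggregate holding every todo of one user."""

    user_id: str
    # todo_id -> {"completed": bool, "active": bool}
    todos: dict = field(default_factory=dict)

    def add(self, todo_id, completed=False):
        self.todos[todo_id] = {"completed": completed, "active": True}

    def purge_completed(self):
        """Handle PurgeAllCompletedTodos; return the single resulting event."""
        purged = [
            todo_id
            for todo_id, todo in self.todos.items()
            if todo["completed"] and todo["active"]
        ]
        event = ("CompletedTodosPurged", self.user_id, purged)
        self.apply(event)
        return event

    def apply(self, event):
        _, _, purged = event
        for todo_id in purged:
            self.todos[todo_id]["active"] = False


user_todos = UserTodos("user-1")
user_todos.add("t1", completed=True)
user_todos.add("t2", completed=False)
user_todos.add("t3", completed=True)
event = user_todos.purge_completed()
```

One command produces one event inside one aggregate, so the whole purge is a single transaction for that user.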
If this still doesn't solve your problem because, let's say, you need to purge all completed todos of every user in the application, you would still need to send a command to 'n' aggregates and the aggregate boundary doesn't help. We could then think of having an AllApplicationTodosAggregate that manages the command.
This probably isn't the best solution because, as you said, that command would block ALL the todos of the application, but always check whether it can be a good trade-off (this aspect of locking is explained very well in both the Blue Book and the Red Book of DDD).
What if I need to modify some entities and can't have them in a single aggregate?
As said above, a command that modifies more than one aggregate is bad because of transactions. What if you modify 3 aggregates, the first succeeds, and then the server is shut down?
In this case what you have is a lot of single modifications that need to be managed to prevent inconsistency in the system.
This can be done using a process manager, whose responsibilities are to modify all the aggregates by sending them the right commands and to manage failures if they happen.
An aggregate still receives its own command, but the process manager is in charge of sending them in whatever way it chooses (one at a time, all in parallel, 5 at a time, whatever you want).
So you can have a strategy to manage a failure between two transactions, and make decisions like: "if something fails, roll back all the modifications done until now" (sending a rollback command to each aggregate), or "if an operation fails, repeat it 3 times, once every 30 minutes, and if it still doesn't work then roll back", or "if something fails, create a notification for the system admin".
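One of those strategies (retry a few times, then roll back what was already done) could be sketched like this. `run_process`, `send` and `compensate` are hypothetical names; the sketch sends commands one at a time, omits the waiting between retries, and keeps its state in memory, whereas a real process manager would persist its progress so it can resume after a crash.

```python
def run_process(aggregate_ids, send, compensate, max_retries=3):
    """Send one command per aggregate; on a permanent failure undo the completed ones.

    `send` and `compensate` are hypothetical callables supplied by the caller:
    send(aggregate_id) returns True on success, compensate(aggregate_id) undoes it.
    """
    done = []
    for aggregate_id in aggregate_ids:
        for _ in range(max_retries):
            if send(aggregate_id):
                done.append(aggregate_id)
                break
        else:
            # Every retry failed: send a rollback command to each completed aggregate.
            undone = list(reversed(done))
            for completed in undone:
                compensate(completed)
            return {"status": "rolled_back", "failed": aggregate_id, "undone": undone}
    return {"status": "completed", "done": done}


state = {"todo-1": True, "todo-2": True, "todo-3": True}  # IsActive flags
attempts = []


def deactivate(todo_id):
    attempts.append(todo_id)
    if todo_id == "todo-3":  # simulate an aggregate that keeps failing
        return False
    state[todo_id] = False
    return True


def reactivate(todo_id):
    state[todo_id] = True


result = run_process(["todo-1", "todo-2", "todo-3"], deactivate, reactivate)
```

Each aggregate is still modified in its own transaction; the process manager only decides the order of commands and what to do when one of them fails.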
(sorry for the long post, at least hope it helps)

Domain Driven Design - Atomic transaction across multiple bounded context

In DDD, I understand that Events can decouple Bounded Contexts when they communicate with each other. Assume an atomic transaction contains two database operations on separate bounded contexts A and B. When the operation on A finishes, it sends an event which is handled by B, which finishes the second operation. However, how does the operation on A roll back if the operation on B fails?
For example, I am currently designing a system using Domain Driven Design. It contains Membership and Inventory bounded contexts. In order to decouple the contexts, I use Events: when an order is being paid, the Inventory context reduces the quantity of the sold product and sends a Product_Sold event. The event is then handled by the Membership context, which subtracts the price of the sold product from the user's balance.
However, if the user balance update fails due to a database failure, how does the Inventory context know about it so that it can roll back the previously reduced product quantity?
There's actually a pattern for this called Saga.
http://vasters.com/clemensv/2012/09/01/Sagas.aspx
http://nservicebus.com/Sagas.aspx
As you use events to communicate between contexts, simply publish a Product_NotSold event and roll back the transaction when you get this event.
However, you cannot provide an 'atomic' transaction in this way. It is more of a long-running process (a.k.a. saga). If you really want atomicity, you need to use two-phase commit and abandon events.
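A minimal in-process sketch of that compensation flow, under loud assumptions: the `publish` function, the handler registry and the sample data are hypothetical, events are delivered synchronously, and an insufficient balance stands in for any failure to update the balance. In a real system the two contexts would be separate services connected by a message broker.

```python
inventory = {"book": 5}   # Inventory context state
balances = {"alice": 3}   # Membership context state
handlers = {}
published = []


def publish(event, **data):
    published.append(event)
    for handler in handlers.get(event, []):
        handler(**data)


def pay_order(user, product, price):
    inventory[product] -= 1  # Inventory context commits its own transaction
    publish("Product_Sold", user=user, product=product, price=price)


def on_product_sold(user, product, price):  # Membership context
    if balances[user] < price:  # stands in for any failed balance update
        publish("Product_NotSold", user=user, product=product, price=price)
    else:
        balances[user] -= price


def on_product_not_sold(user, product, price):  # Inventory context
    inventory[product] += 1  # compensating action, not a database rollback


handlers["Product_Sold"] = [on_product_sold]
handlers["Product_NotSold"] = [on_product_not_sold]

pay_order("alice", "book", price=2)
pay_order("alice", "book", price=2)  # balance is now too low: compensated
```

Between the two events the reduced quantity is visible to other readers, which is why this is a long-running process and not an atomic transaction.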
