Keeping consistency between aggregates - domain-driven-design

I am wondering how to solve transactional consistency problems between aggregates. My first impression is that whenever you need transactional consistency between aggregates, you wrongly designed your aggregates. However, I would still like to ask this question to make sure I am not missing anything.
Imagine you sell cow milk. You have multiple cows, each producing a certain number of liters of milk a day. You need a service that can report the amount of milk in stock. Besides that, you should also be able to order milk. Based on this information you can create three aggregates, Cow, Stock and Order. Whenever a certain amount of milk is ordered, one of the business rules is to check whether that amount is in stock and, if not, let the user know right away. How can this be achieved when two users make concurrent requests and order a total of 150 liters of milk, while only 130 liters are available? My first idea is that you can achieve this with optimistic/pessimistic locking, but in that case one aggregate depends on the other. Is there a way to solve this particular issue, or is this just bad aggregate design?

My first impression is that whenever you need transactional consistency between aggregates, you wrongly designed your aggregates.
That's exactly right; the way that you discover aggregate boundaries is first by identifying values that need to be kept consistent, and then choosing boundaries such that any two values that need to be consistent with each other fall within the same boundary.
Note: we don't always get this right; perhaps the requirements were wrong, perhaps our model wasn't adequate, perhaps the business changed. Part of the challenge is to make sure that it is always easy to replace the current model with a better one.
Based on this information you can create three aggregates, Cow, Stock and Order.
Note: Cow is a lousy aggregate -- assuming we are talking about real, out-in-the-world, grass-eating cows. It's an entity, yes, but it is outside the influence of the model. If the model says that the cow is empty, and the cow says that it is full of milk, the cow is right.
Similarly, if the cows are real, then most of your stock is real as well. If the model says there are seven full canisters of milk, and the farmer counts six, then six is the right answer.
Aggregates are information resources.
Whenever a certain amount of milk is ordered, one of the business rules is to check whether that amount is in stock and, if not, let the user know right away. How can this be achieved when two users make concurrent requests and order a total of 150 liters of milk, while only 130 liters are available?
An important thing to realize about "right away": you are here, the user (buyer?) is there. There's a certain amount of latency in the communication, which means that the information you are sending to the buyer is already stale when it arrives. (Technically, it's already stale at the point when you send it.)
Second, fulfilling the orders requires understanding both the orders and the available stock. So you might model this as a single aggregate that does everything, or you might model it as order aggregates that communicate asynchronously with some fulfillment aggregate; for instance, it might be that what Stock really does is match up notifications of available milk with pending orders.
In your concurrent processing example, this would look like two commands, together trying to reserve 150 liters of milk, running concurrently against a stock aggregate with 130 liters of milk available. Using optimistic concurrency, both commands would discover that there is enough milk to satisfy their own order, and would try to update the book of record. That update, however, is serialized -- think transaction, or compare and swap -- so one of the commands succeeds, and the other gets a concurrent modification exception instead. The second command therefore tries again, reloading the stock in its new state. This time through, it discovers that the available stock is insufficient, and acts accordingly (moving the order onto a waiting list, advising the buyer of the expected fulfillment date, etc.).
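As a sketch of that flow, assuming a hypothetical StockRepository whose save performs a compare-and-swap on the stored version (none of these names are a prescribed API):

```python
# Sketch only: StockRepository, Stock and the exception names are hypothetical.
class ConcurrentModificationError(Exception):
    """The stored version no longer matches the version we loaded."""


class InsufficientStock(Exception):
    """The requested amount cannot be satisfied from the available stock."""


class Stock:
    def __init__(self, stock_id, available_liters, version=0):
        self.stock_id = stock_id
        self.available_liters = available_liters
        self.version = version  # version of the record we loaded

    def reserve(self, liters):
        if liters > self.available_liters:
            raise InsufficientStock(f"only {self.available_liters} liters left")
        self.available_liters -= liters


def reserve_milk(repository, stock_id, liters, max_attempts=3):
    """Load, check the rule, save; retry when another command won the race."""
    for _ in range(max_attempts):
        stock = repository.load(stock_id)      # carries the current version
        stock.reserve(liters)                  # may raise InsufficientStock
        try:
            # Compare-and-swap style save: fails if the version has moved on.
            repository.save(stock, expected_version=stock.version)
            return
        except ConcurrentModificationError:
            continue                           # reload and re-check the rule
    raise ConcurrentModificationError("gave up after repeated conflicts")
```

The losing command re-reads the stock and re-applies the business rule, which is exactly where it discovers the shortage and can move the order onto a waiting list instead.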
Note that you would also get concurrent modification exceptions when a command to reserve milk runs in parallel with an announcement that more milk is available.

My first impression is that whenever you need transactional consistency between aggregates, you wrongly designed your aggregates.
I'll go the other direction here and say: not necessarily. You are always going to require consistency between aggregates at some point in time. The only question is when. You may find that you run into some politics in some situations. I have had to implement 100%, immediate consistency, between a whole bunch of aggregates simply because I was instructed to do so by those holding the purse strings. In the ideal world we'd be free to choose. But in my real world you can bet we're going to lose. Anyway, lyrics aside, it certainly is possible to get away from immediate consistency.
Altering more than one aggregate in a transaction isn't the end of the world, but if you can avoid it you certainly should. It does come with some slight overhead, as can be expected.
In your example you can check the stock levels but, as noted, concurrent reads lead to misleading information. However, you could place an order and, before you even kick off your process, "reserve" the quantity using what will become your process manager's Id and your overall "correlation id". If you cannot reserve the stock you could abandon the order. Note that for orders with more than one item you'd have to reserve multiple items.
There are quite a number of approaches to this (airline reservations can also be used as an example). You may want to take the order anyway and see what you can do. Orders may have different turnaround times, and more milk, or other items, may become available in the meantime. You could notify the client that their order could not be filled and has been cancelled, or invite them to click a link to adjust the order or perhaps accept an alternative.
Just some thoughts :)

Related

How to ensure data consistency between two different aggregates in an event-driven architecture?

I will try to keep this as generic as possible using the “order” and “product” example, to try and help others that come across this question.
The Structure:
In the application we have 3 different services: 2 services that follow the event sourcing pattern and one that is read-only, giving us the separation between our read and write views:
- Order service (write)
- Product service (write)
- Order details service (Read)
The Background:
We are currently storing the relationship between the order and product in only one of the write services, for example within order we have a property called ‘productItems’ which contains a list of the aggregate Ids from Product for the products that have been added to the order. Each product added to an order is emitted onto Kafka where the read service will update the view and form the relationships between the data.
 
The Problem:
Since we pull the order and the product back by aggregate Id in order to update them, if a product is deleted there is no way to disassociate the product from the order on the write side.
 
This in turn means we have an inconsistency: the order holds a reference to a product that no longer exists within the product service.
The Ideas:
Master the relationship on both sides, which means that when the product is deleted we can look at the associated orders and trigger an update to remove it from each order (this would duplicate the reference).
Create another view of the data that shows the relationships and use a saga to do a clean-up. When a delete is triggered, it will look up the view database, see the relationships within the data and then trigger an update for each of the orders that have the product associated.
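To make that second idea concrete, here is a rough Python sketch; the relationship view, the command bus and the RemoveProductFromOrder command are placeholders for whatever read model and messaging are actually in place:

```python
# Sketch of a clean-up process manager reacting to a ProductDeleted event.
# The relationship view and command bus are hypothetical stand-ins.

from dataclasses import dataclass


@dataclass(frozen=True)
class ProductDeleted:
    product_id: str


@dataclass(frozen=True)
class RemoveProductFromOrder:
    order_id: str
    product_id: str


class ProductCleanupSaga:
    def __init__(self, relationship_view, command_bus):
        self._view = relationship_view    # read model: product -> orders
        self._bus = command_bus           # sends commands to the Order service

    def handle_product_deleted(self, event: ProductDeleted):
        # Look up every order that still references the deleted product
        # and ask the Order aggregate to drop the reference.
        for order_id in self._view.order_ids_containing(event.product_id):
            self._bus.send(RemoveProductFromOrder(order_id, event.product_id))
```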
Does it really matter having the inconsistencies if the Order details service shows the correct information? Because the view database will consume the product deleted event, it will be able to safely remove the relationship, which means clients will be able to get the correct view of the data even if the write models appear inconsistent. Based on the order of the events, the state will always appear correct in the read view.
Another thought: as the aggregate Id is deleted, it should never be reused, which means checks on the aggregate such as "is this product already in the order?" will never trigger incorrectly, as the aggregate Id will never be repurposed, meaning the inconsistency should not cause an issue when running commands in the future.
Sorry for the long read, but these are all the ideas we have thought of so far, and I am keen to gain some insight from the community, to make sure we are on the right track or if there is another approach to consider.
 
Thank you in advance for your help.
Event sourcing suits human, and specifically human-paced, processes very well. It helps a lot to imagine that every event in an event-sourced system is delivered by some clerk, printed on a sheet of paper. Then it will be much easier to figure out a suitable solution.
What's the purpose of an order? It is so that your back-office personnel can secure the necessary units at a warehouse, then the customer makes a payment and you start the shipping process.
So, I guess, after an order is placed, some back-office system can process it and confirm that it can be taken into work and invoiced. Or it can return the order with remarks that this and that line are no longer available, so that the customer can agree to the reduced order or pick other options.
Another option is, since the probability of a customer ordering a discontinued item is low, to simply not do this check. But if it still occurs at shipping time, then issue a refund and some coupon for the inconvenience. Why is the probability low? Because the goods are added from an online catalogue, which reflects the current state. The availability check can be done on the 'Submit' button click. So, an inconsistency may occur if an item is discontinued in the same minute (or second) the order is submitted. And usually the actual decision to discontinue is made well before the information is updated in the Product service, due to some external reasons.
Hence, I suggest using eventual consistency, since an event-sourced entity should only be responsible for its own consistency and not try to fulfil someone else's responsibility.

Modeling one-to-many relations using Domain Driven Design

This is more of a general question about how to model simple one-to-many relations using collections: should a change in a list item be reflected in the version of the aggregate containing it?
The domain is about meeting scheduling (like in Outlook).
I have a Meeting entity, which can have multiple Participants.
A participant can accept/decline meeting requests.
Rescheduling a meeting nullifies all of the participants confirmations.
I thought of two ways to model this.
Option 1
The Meeting aggregate will contain a list of Participants where each Participant has a ParticipantId and a Status (accepted/denied).
The problem here is that every Accept or Deny command, for a specific participant, increments the Meeting's version, which means two participants will run into a race condition if they try to Accept the meeting request based on the same original version.
Although this could be solved by re-reading the Meeting's document and retrying the Accept command, it's quite annoying considering how often this could happen.
Another approach is to ignore the meeting's version when executing the Accept command, but this introduces a new problem: what happens if, after sending the meeting requests, the meeting has been rescheduled? In this case we can't afford to ignore the Meeting's version, because this time the version DOES represent a real version that should be considered.
BTW, is it at all a good practice to ignore the version in some of the commands and not in others?
Option 2
Extract a Participation aggregate out of Meeting.
Participation will have MeetingId, ParticipantId, and Status.
It will also have its own version.
This way, when participant X Accepts the meeting request, only the relevant Participation will be modified, and the rest will be left intact.
And, when rescheduling the meeting, a "Meeting Rescheduled" event will be published and an event handler will respond to it by resetting all of the Participations' statuses to "NotAccepted" regardless of their current version.
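To make Option 2 concrete, a rough Python sketch; the repository, the status values and the handler wiring are all illustrative, not a prescribed design:

```python
# Sketch of Option 2: Participation as its own small aggregate, reset by a
# policy/handler whenever the meeting is rescheduled.

from dataclasses import dataclass


@dataclass
class Participation:
    meeting_id: str
    participant_id: str
    status: str = "NotAccepted"
    version: int = 0

    def accept(self):
        self.status = "Accepted"
        self.version += 1

    def decline(self):
        self.status = "Declined"
        self.version += 1

    def reset(self):
        # Rescheduling invalidates earlier answers, regardless of version.
        self.status = "NotAccepted"
        self.version += 1


def on_meeting_rescheduled(meeting_id, participation_repository):
    """Policy: when a meeting is rescheduled, reset every participation."""
    for participation in participation_repository.for_meeting(meeting_id):
        participation.reset()
        participation_repository.save(participation)
```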
On the one hand this sounds logical in the sense that a meeting's version shouldn't be incremented just because someone accepted/denied its request.
On the other hand, modeling Participation as a standalone aggregate doesn't sound quite right to me, because it has no meaning outside of the context of the meeting.
Anyway, would love to get feedback on this and see the various approaches to this problem.
Although this could be solved by re-reading the Meeting's document and retrying the Accept command, it's quite annoying considering how often this could happen.
This looks like a modeling error. You should keep in mind that the meeting aggregate is not the book of record for the participants' availability - the real world is. So the message shouldn't be AcceptInvitation, but instead InvitationAccepted. There shouldn't be a conflict about this, because the domain model doesn't get to veto events outside of its authority boundary.
You might, depending on your implementation, end up with a concurrent modification exception in your plumbing, but that's something that you should be handling automatically (i.e. expected version any, or a retry).
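A sketch of the "expected version any" variant, against a hypothetical event store API; the point is only that the append skips the optimistic-concurrency check, because the aggregate has no authority to reject the fact:

```python
# Sketch: recording InvitationAccepted as a fact, without an expected-version
# check, because the model cannot veto what already happened in the world.
# The event_store API shown here is hypothetical.

ANY_VERSION = object()   # sentinel meaning "do not check the stream version"


def record_invitation_accepted(event_store, meeting_id, participant_id):
    event = {
        "type": "InvitationAccepted",
        "meeting_id": meeting_id,
        "participant_id": participant_id,
    }
    # Appending with expected_version=ANY_VERSION never raises a concurrency
    # error, so two participants replying at the same time both succeed.
    event_store.append(
        stream=f"meeting-{meeting_id}",
        events=[event],
        expected_version=ANY_VERSION,
    )
```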
Another approach is to ignore the meeting's version when executing the Accept command, but this introduces a new problem: what happens if, after sending the meeting requests, the meeting has been rescheduled?
The solution here is to model more carefully. Yes, sometimes you will get a message that accepts or declines an invitation that has expired.
Put another way: race conditions don't exist.
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
What happens to Alice, who replied instantly to the invitation, when the meeting is rescheduled? Why wouldn't the same thing happen to Bob, when his reply arrives just after the meeting is rescheduled?
Participation as a standalone aggregate doesn't sound quite right to me, because it has no meaning outside of the context of the meeting.
I find that heuristic isn't particularly effective. It's much more important to understand whether entities can change state independently, or if their changes need to be coordinated.
Actually, the Meeting aggregate is used to track the participants' availability. That's what its purpose is. Unless I didn't fully understand you...
It's a bit subtle, and I didn't spell it out very well.
Suppose the model says that I'm available, but an emergency in the real world calls me away. What happens? Am I blocked from going to the hospital because the model says I have to go to a meeting? Can somebody cancel my emergency by changing the invitation I've submitted?
Furthermore, if I'm away on an emergency, are you available for a meeting that is scheduled for the same time as the meeting you and I were going to have?
In this space, the real world is the authority for whether or not somebody is available. The model is just looking at a cached copy of a message describing whether or not somebody was available in the past.
The cached information being used by the model is not guaranteed to be complete. See Greg Young on warehouse systems and exception reports.
which makes me think that perhaps the Meeting aggregate should have two version fields: one will be a strong version which, when incremented, represents a breaking change, and another soft version for non-breaking changes. Does this make any sense?
Not really. Version is not, as far as I know, a term taken from the ubiquitous language of scheduling meetings. It's meta data, if it exists at all, and the business rules in your model should not depend upon meta data.
I agree, but a Meeting ID (or any ID for that matter) is also not part of the ubiquitous language, yet I might pass it back and forth between my domain world and external worlds.

How to resolve Order and Warehouse bounded contexts dependency?

I am working on a DDD project and I am currently focused on two bounded contexts, Orders and Warehouse.
What confuses me is the following situation:
Orders keeps track of all the placed orders, and Warehouse keeps track of all the available inventory. If a user places an order for a certain product item, that means one less item of that product in the warehouse. I am oversimplifying this process, so please bear with me.
Since the two domain models are placed inside different BCs, I am currently relying on eventual consistency, i.e. after one item has been sold, it will eventually be removed from the warehouse.
That situation unfortunately leads to the problem scenario where another user could simultaneously place another order for the same item, and it would appear as available until eventual consistency kicks in. That is something the domain expert finds unacceptable.
So IMO I am stuck with two options:
- merge Order and Warehouse (at least the part regarding product inventory, i.e. units available in the warehouse) into one BC
- have the Order BC (or microservice if you wish) depend on the Warehouse BC (microservice) in order to pull live product units available
Which option actually follows the DDD pattern? Is there another way out?
You could use a reservation system with a timeout.
Using a messaging analogy: With a broker-style queuing mechanism (such as RabbitMQ) you get a message from the queue and you have control over it until you either acknowledge that it can be removed from the queue or you release it back to the queue.
You could do the same thing in your ordering process: you reserve any items on your order. So when you add them they have a status of, say, Reserving, and you send off some message to reserve the items. When the response comes back you can decide how to proceed. Perhaps you could add any items that cannot be reserved onto a back order, or try again later.
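A small sketch of such a reservation with a timeout, using an in-memory structure as a stand-in for real persistence; the names and the 15-minute TTL are arbitrary choices:

```python
# Sketch of a reservation with a timeout: reserved quantity is held for a
# while and released automatically if the order never completes.

import time


class StockItem:
    def __init__(self, sku, on_hand):
        self.sku = sku
        self.on_hand = on_hand
        self._reservations = {}   # reservation_id -> (quantity, expires_at)

    def _release_expired(self, now):
        expired = [rid for rid, (_, expires) in self._reservations.items()
                   if expires <= now]
        for rid in expired:
            del self._reservations[rid]

    def available(self, now=None):
        now = now if now is not None else time.time()
        self._release_expired(now)
        reserved = sum(qty for qty, _ in self._reservations.values())
        return self.on_hand - reserved

    def reserve(self, reservation_id, quantity, ttl_seconds=900, now=None):
        """Hold stock for an order; returns False if not enough is available."""
        now = now if now is not None else time.time()
        if quantity > self.available(now):
            return False
        self._reservations[reservation_id] = (quantity, now + ttl_seconds)
        return True

    def confirm(self, reservation_id):
        """The order completed: turn the reservation into a real decrement."""
        quantity, _ = self._reservations.pop(reservation_id)
        self.on_hand -= quantity
```

Whether expiry is driven by a scheduled job, the process manager, or lazily on read (as above) is a design choice; the business-relevant part is that an abandoned reservation eventually frees the stock.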
There are going to be different ways to approach this. Depending on your business case it may be acceptable to only check availability when someone really accepts the order.
If your domain expert reckons it is totally unacceptable to have this resolved at the end of the process, then you could move it to the start. The issue, of course, is that user A could reserve and never buy, thereby losing user B as a customer; whereas only applying the real "taking" of the item at the end of the process is closer to ensuring a purchase. But that is a business decision.
This issue is a really great example of where reality actually is eventually consistent. Is it really the best thing to decline an order if there is no inventory currently in the warehouse - even if there was a replenishment due in the next 20 minutes?
What if the item was actually on the shelf, but the operator hadn't yet keyed it into the system?
Sometimes designers and domain experts assume that people want 100% consistency, when really, users might be willing to accept a delay in confirmation of their order, if it increased the chance that their order would be accepted rather than rejected.
In the case above, why make it the user's job to retry their order N minutes later? In an eventually consistent system, you can accommodate such timing flexibility by including a timeout to retry the attempt to fulfill the order for a period of time before confirming to the client that it really wasn't possible.
There are technical solutions that will give you 100% consistency, but I think this is really not a technical challenge but a cultural/mindset one: changing people's minds about what is possible and acceptable to achieve what is actually a better outcome.
IMO you can build a PlaceOrderSaga which will ask for inventory availability before placing the order.

Aggregate Root including a tremendous number of children

I wonder how to model a Calendar using DDD and CQRS. My problem consists in an increasing number of events. I consider Calendar an Aggregate Root which contains Events (Calendar Events). I don't want to use the ReadSide in my Commands, but I need a way to check event collisions at the domain level.
I wonder how to model a Calendar using DDD and CQRS. My problem consists in an increasing number of events.
The most common answer to "long lived" aggregates is to break that lifetime into episodes. An example of this would be the temporary accounts that an accountant will close at the end of the fiscal year.
In your specific case, probably not "the Calendar" so much as "the February calendar", then "the March calendar", and so on, at whatever grain is appropriate in your domain.
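One way to express that episode idea is to make the period part of the aggregate's identity, so each month gets its own small event stream. A sketch, with the naming purely illustrative:

```python
# Sketch: the calendar "episode" (here, a month) is part of the aggregate id,
# so each aggregate stays small and each event stream has a natural end.

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalendarId:
    owner: str
    year: int
    month: int

    @classmethod
    def for_day(cls, owner: str, day: date) -> "CalendarId":
        return cls(owner, day.year, day.month)


# Scheduling an event on 2024-02-14 routes to the February aggregate:
calendar_id = CalendarId.for_day("alice", date(2024, 2, 14))
# CalendarId(owner='alice', year=2024, month=2)
```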
I'm not sure if I'm right about the DDD approach in terms of validation. I believe the point is not to allow the model to enter an invalid state.
Yes, but invalid state is a tricky thing to define. Udi Dahan offered this observation:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
More succinctly: if processing command A followed by command B produces a valid state, then it should also be true that you end up in a valid state if you process command B first, and then A.
Let's choose your "event collisions" example. Suppose we handle two commands scheduleMeeting(A) and scheduleMeeting(B), and the domain model understands that A and B collide. Riddle: how do we make sure the calendar stays in a valid state?
Without loss of generality, we can flip a coin to decide which command arrives first. My coin came up tails, so command B arrives first.
on scheduleMeeting(B):
publish MeetingScheduled(B)
Now the command for meeting A arrives. If your valid calendars do not permit conflicts, then your implementation needs to look something like
on scheduleMeeting(A):
throw DomainException(A conflicts with B)
On the other hand, if you embrace the idea that the order in which the commands arrive shouldn't influence the outcome, then you need to consider another approach. Perhaps:
on scheduleMeeting(A):
publish MeetingScheduled(A)
publish ConflictDetected(A,B)
That is, the Calendar aggregate is modeled to track not only the scheduled events, but also the conflicts that have arisen.
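A sketch of such a Calendar, where the outcome does not depend on arrival order and conflicts are recorded rather than rejected; the event shapes and names are illustrative only:

```python
# Sketch of a Calendar aggregate that records conflicts instead of rejecting
# the later command. Events are plain dicts to keep the example small.

class Calendar:
    def __init__(self):
        self.meetings = {}       # meeting_id -> (start, end)
        self.conflicts = []      # pairs of conflicting meeting ids
        self.pending_events = []

    def schedule_meeting(self, meeting_id, start, end):
        # Always accept the meeting; arrival order must not change the outcome.
        self.meetings[meeting_id] = (start, end)
        self.pending_events.append({"type": "MeetingScheduled", "id": meeting_id})

        for other_id, (other_start, other_end) in self.meetings.items():
            if other_id == meeting_id:
                continue
            overlaps = start < other_end and other_start < end
            if overlaps:
                self.conflicts.append((meeting_id, other_id))
                self.pending_events.append(
                    {"type": "ConflictDetected", "ids": (meeting_id, other_id)}
                )
```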
See also: aggregates and RFC 2119
An Event could also be an Aggregate Root. I don't know your business constraints, but I think that if two Events collide you could notify the user somehow to take manual action. Otherwise, if you really, really need them not to collide, you could use snapshots to speed up the enormous Calendar AR.
I don't want to use the ReadSide in my Commands, but I need a way to check event collisions at the domain level.
You cannot query the read model inside the aggregate command handler. For the collision detection I would create a special DetectCollisionSaga that subscribes to the EventScheduled event and that would check (possibly asynchronously, if there are many Events) whether a collision has occurred and notify the user somehow.

What would the actor be in a use case where stock is reordered automatically when it hits a certain point

Say there are 5 bags of potatoes; when the stock hits two bags, an order is sent to the supplier automatically. Who/what is the actor?
Clearly you have an actor to watch the queue, a Potato Watcher.
Even if the Potato Watcher is something implemented inside the SUC, it would be an actor on its own. You may drag it inside the SUC boundary. In a final implementation it might be a system task to poll a queue or a subscriber to the queue. But from the added value viewpoint it's just a simple actor to watch a queue and do something with it.
Since "an order is sent... automatically" the system is the actor. Assuming you are automating using software system.
However, in real life it all goes wrong and businesses are far from sending orders automatically.
Badgerbadger, you need to be careful about the statement "an order is sent".
It is easy to overlook the actual business process, and the previous subfunction-level answers are giving a nudge in that direction.
A more realistic scenario is that the system is the actor that just initiates the check of stocks and maybe orchestrates the process. And then there is a complicated process of getting approval, finding budget, checking whether the goods really need to be ordered, and so on. In theory all of this can be automated, but usually in practice there is work for a human actor as well. Of course you may be lucky and avoid all of this.
Example of a pessimistic scenario to consider:
It was eight bags, but now it hits three. John just bought five bags from us, hooray!
System automatically places an order and pays for 10 more bags.
In ten minutes someone ordered one more.
System automatically places an order again and pays for 10 more bags.
We don't have enough money on account so we are facing overdraft.
At the same time regular inspection found that one bag is rotten.
System: 10 more bags please!
Our colleague just told us that we've missed a 20% discount from the supplier on simultaneous orders of 25+ bags, which can be obtained only by calling a company manager.
John cancels his order an hour after placing it.
Now we have 36 bags and a hole in our budget.
And our marketing manager just told us that the green bags the system ordered are actually now selling at a discount; we are getting rid of them in favour of a new line of blue bags.
You have two possibilities here:
- Each time the stock is used, the system checks whether the condition to order is met and, if so, places the order. As ordering will probably involve another actor (e.g. the supplier's system), you might need to model it as a separate UC and use an extend relationship. In this case you will not have any additional actor initiating the order, only the actor initiating the stock usage.
- The system checks on a regular basis which stocks have reached the automated order level and makes a single mass order. In this case you'll have to model a Scheduler as an actor (in reality it's another system running on the server, so it's fine to call it an actor). Some people prefer modelling such an actor as "time", but that is discouraged.
