DDD Orchestration-based saga race condition - domain-driven-design

I have three aggregates: Product, Seller, and Buyer. The buyer offers to buy a product from the seller, and the seller can accept or reject the offer. The process for making the offer is this: in the buyer aggregate I check whether the buyer has already made an offer on the product, then I check in the product aggregate whether its status is "in sale", and then I check in the seller aggregate whether the buyer is banned (the seller aggregate has a list of banned users). If all checks pass, the saga creates a new offer. But what if the buyer gets banned just after I have checked whether the buyer is banned? The seller bans a user and can still receive an offer from that user afterwards?

Races are an inevitable consequence of designing a system with distributed decision making authority.
In other words, it takes time for the information that a particular shopper is banned to travel from the shopkeeper who made the decision to the centralized model. So in just the same way that we have to handle the case where we send an offer to the shop nanoseconds before the shop bans the shopper, so too do we need to handle the case where we send an offer in the nanoseconds between when the shop bans the shopper and when that information gets integrated into the domain model.
This is part of the tradeoff we accepted when we chose a distributed system in the first place.
As far as I can tell, we manage this mainly by setting expectations. "Bans shall be announced five minutes before they take effect", or whatever, to give the information time to move around your system.
Expectation setting might use the language of service levels (99% of all bans are effective within five minutes).
Mostly, you're going to be managing tradeoffs - how important is respecting a recent ban compared to expediting the delivery of offers? If you don't need low latency delivery, then you can afford to wait a little while to see if a ban shows up.
(If you look carefully, you'll see that there is still a race condition present - what we're really manipulating is the business significance of the race. See Udi Dahan's Race Conditions Don't Exist.)
In a local setting, if you really need tight control of the sequence of bans and offers, then you need to have a single lock in the system shared by the code that processes bans for a particular shop and the code that processes offers for a particular shop.
That doesn't prevent the race, of course (you get different behaviors depending on whether ban acquires the lock nanoseconds before the offer or nanoseconds after), but does at least give you a clear "happens-before" relationship in your audit log.
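As a rough illustration of the single-lock idea in a local setting, here is a minimal Python sketch. The Shop class, its method names, and the audit log format are assumptions made for this example; they do not come from the question.

```python
import threading


class Shop:
    """Hypothetical per-shop state; bans and offers share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.banned = set()
        self.offers = []
        self.audit_log = []  # records the order in which the lock was acquired

    def ban(self, buyer_id):
        with self._lock:
            self.banned.add(buyer_id)
            self.audit_log.append(("ban", buyer_id))

    def offer(self, buyer_id, product_id):
        with self._lock:
            # The ban check and the write happen under the same lock,
            # so no ban can slip in between them.
            if buyer_id in self.banned:
                self.audit_log.append(("offer_rejected", buyer_id))
                return False
            self.offers.append((buyer_id, product_id))
            self.audit_log.append(("offer_accepted", buyer_id))
            return True


shop = Shop()
shop.offer("buyer-1", "product-9")   # arrives before the ban: accepted
shop.ban("buyer-1")
shop.offer("buyer-1", "product-9")   # arrives after the ban: rejected
```

Whichever call gets the lock first wins, so the outcome still depends on timing, but the audit log shows an unambiguous order.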

Related

How to model account balance check? (Is it possible with eventual consistency / event driven architecture?)

If there are 2 bounded contexts, Spending and Account Management, how do I prevent the user from spending more than he has in his balance?
I understand the case when we have an Order > Payment > Shipment sequence for a single product, but here it's more like Order > Balance Check > Shipment, and I don't understand how to check the balance in another bounded context and be sure the user never spends more than he has.
Is this situation even possible to solve with eventual consistency? I would prefer it with eventual consistency. If not, is the safe option to keep balance within the Spending bounded context?
[UPDATE]
I haven't mentioned, I'm thinking about the solution in event driven architecture where bounded contexts exchange information through events only.
In your case, eventual consistency for a balance check is not suitable.
Requirement: Do not ship goods until we know customer can pay.
Therefore you have a firm business requirement that the shipment process must wait for a response from the account balance check process. If the balance check service is down, then shipment cannot proceed.
In other business scenarios you may be able to continue with one part of the process and let the other part work on eventual consistency with an out-of-process message delivery in between the services. In your case, you cannot do this for the check balance part of the process.
To further complicate, your process would be:
Order > Check Balance > Ship > Deduct Funds.
You don't want to deduct the funds until shipment has occurred in case there is failure in shipment for some reason, but you definitely don't want to ship before the balance has been checked.
For this, I'd introduce the concept of an 'earmark' or 'reserved funds'.
So, your "Spending" context will send a 'reserve funds' request to the "Account Management" context and wait for a response. That response can include a correlation id for the 'reservation of funds'. Your Account Management service would need to understand the concepts of "actual balance" and "reserved funds" in order to calculate an "available balance".
Once your shipment has completed you can then send a 'confirmation' to account management quoting the correlation id, so that account management can adjust "actual balance" and delete the "reserved funds". That step, in my opinion, could work with eventual consistency.
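To make the earmark idea concrete, here is a minimal Python sketch of the bookkeeping on the Account Management side. The class and method names are hypothetical, and the release step for a failed shipment is an assumption added for completeness; persistence and messaging are left out.

```python
import uuid
from decimal import Decimal


class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical account that tracks actual balance and reserved funds."""

    def __init__(self, actual_balance):
        self.actual_balance = Decimal(actual_balance)
        self.reserved = {}  # correlation id -> reserved amount

    @property
    def available_balance(self):
        # available balance = actual balance - reserved funds
        return self.actual_balance - sum(self.reserved.values(), Decimal("0"))

    def reserve_funds(self, amount):
        amount = Decimal(amount)
        if amount > self.available_balance:
            raise InsufficientFunds(f"cannot reserve {amount}")
        correlation_id = str(uuid.uuid4())
        self.reserved[correlation_id] = amount
        return correlation_id

    def confirm(self, correlation_id):
        # Shipment completed: deduct the funds and drop the reservation.
        amount = self.reserved.pop(correlation_id)
        self.actual_balance -= amount

    def release(self, correlation_id):
        # Assumed step: shipment failed, so the earmark is simply removed.
        self.reserved.pop(correlation_id, None)
```

Only reserve_funds has to be answered synchronously; confirm and release could arrive later as messages quoting the correlation id.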

Handling Race condition in CQRS/ES with read-side

I am building an app for managing a health clinic.
We found a race condition when an appointment is scheduled, and so far none of the team members has come up with a solution.
When an appointment is scheduled, some business rules need to be verified:
cannot be scheduled to the same time as another with the same doctor or same patient
doctors can only attend N appointments in the month
in a week, doctors can only attend N appointments
So, the first approach we thought of was to create an aggregate that holds all appointments and is responsible for scheduling them, but this aggregate would be huge, which is technically not acceptable.
The second approach, and the current one, is to create Appointment as an Aggregate Root, and then validate it using a domain service (interface in the domain layer and implementation in the infra layer), which queries the read side.
Today it looks like this:
Inside the command handler, instantiate a new Appointment, passing a domain service in its constructor
Appointment calls the domain service, which queries the read side and validates the rules. However, race conditions can occur here (two appointments being scheduled at the same time; as the two do not see each other, both will be created).
If the domain service validates the rules, then the Appointment is created, but with status PENDING, and a domain event AppointmentRequested is fired.
On the read side, this event is subscribed to and a projection is inserted in the read db (status = PENDING). In the same transaction, a command CompleteAppointmentSchedule is inserted in my outbox and is soon sent and received asynchronously by the write side.
The write side handles the command by calling appointment.CompleteSchedule(domainService). The same domain service that was passed when instantiating the new appointment is passed to the appointment again. But now the appointment will already be in the read db, so it will be possible to check the business rules.
Is it correct to use the read side this way? We cannot think of another way to check these rules without using the read side. A team member suggested that we could create a private read-side for our write-side, and use it instead of the read-side in these cases, but, as we use EventStore DB, we would have to create another database like the one we use on the read-side (pgsql) to be able to do it that way.
I am building an app for managing a health clinic.
Reserve an office, get the entire team together, and watch Trench Talk: Evolving a Model. Yves Reynhout has been doing (and talking about) domain driven design, and his domain is appointment scheduling for healthcare.
When an appointment is scheduled, some business rules need to be verified:
cannot be scheduled to the same time as another with the same doctor or same patient
doctors can only attend N appointments in the month
in a week, doctors can only attend N appointments
One of the things you are going to need to discuss with your domain experts: do you need to prevent scheduling conflicts, or do you need to identify scheduling conflicts and resolve them?
Recommended reading:
Race Conditions Don't Exist - Udi Dahan, 2010
Memories, Guesses, and Apologies - Pat Helland, 2007
That said, you are really close to a common answer.
You make your checks against a cached copy of the calendar, to avoid the most common collisions (note that there is still a race condition, when you are checking the schedule at the same time somebody else is trying to cancel the conflicting appointment). You then put an appointment request message into a queue.
Subscribing to the queue is a Service-as-in-SOA, which is the technical authority for all information related to scheduling. That service has its own database, and checks its own authoritative copy of everything before committing a change.
The critical difference here is that the service is working directly with locked instances of the data. That might be because the event handler in the service is the only process that has write permissions on the authoritative data (and is itself handling only one message at a time), or it might be because the event handler locks all of the data necessary to ensure that the result of the write is still consistent with the business rules (conflicting writes competing for the same lock, thus ensuring that data changes are controlled).
In effect, all attempts to change the authoritative calendar data are (logically) serialized, to ensure that the writes cannot conflict with each other.
In the language of CQRS, all of this locking is happening in the write model of the calendar service. Everybody else works from unlocked copies of the data, which are provided by the read model (with some modest plumbing involved in copying data change from the write model to the read model).
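A minimal Python sketch of the "one message at a time" variant described above: a single consumer drains a queue and checks its own authoritative calendar before committing. The names SchedulingService and AppointmentRequest are assumptions for this example, and only the same-time conflict rule is shown.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class AppointmentRequest:
    doctor_id: str
    patient_id: str
    slot: str  # e.g. "2024-05-06T10:00"


class SchedulingService:
    """Hypothetical single-writer authority for the calendar."""

    def __init__(self):
        self.inbox = queue.Queue()
        self._booked = []  # authoritative calendar, written only here

    def submit(self, request):
        self.inbox.put(request)

    def _conflicts(self, request):
        # Same slot with the same doctor or the same patient is a conflict.
        return any(
            booked.slot == request.slot
            and (booked.doctor_id == request.doctor_id
                 or booked.patient_id == request.patient_id)
            for booked in self._booked
        )

    def process_all(self):
        # Requests are handled strictly one at a time, so each check sees
        # every previously committed appointment.
        results = []
        while not self.inbox.empty():
            request = self.inbox.get()
            if self._conflicts(request):
                results.append((request, "rejected"))
            else:
                self._booked.append(request)
                results.append((request, "accepted"))
        return results
```

Two requests for the same doctor and slot can both pass the cached-copy check, but only the first one the service dequeues is committed.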

How to model Betting/Accounting BoundedContexts when betting relies heavily on account balance?

Let's say you have an application where you can create a bet on a coin toss. Your account has a balance that was funded with your credit card.
The sequence of events is the following:
POST /coin_toss_bets { amount: 5 USD }
Start transaction/acquire locks inside the Bet subdomain useCase
Does the user have enough balance? (check the accounting aggregate's balance projection of the user's deposits)
Debit the user's account for 5 USD
Create bet/flip the coin to get a result
Payout the user if they bet on the correct side
Commit transaction
UI layer is given the bet and displays an animation
My question is how this can be modeled with 2 separate BoundedContexts (betting/accounting). It's said that database transactions should not cross a BoundedContext since they can be located on different machines/microservices, but in this scenario, the use case of creating a bet relies heavily on a non-dirty read of the user's projected account balance (strong consistency).
There is also no way to perform a compensating action if the account is overdebited, since the UI layer requires that the bet is created atomically.
Is there any way to do this with CQRS/Event Sourcing that doesn't require asking for the user's account balance inside the betting subdomain? Or would you always have to ensure that the balance projection is correct inside this transaction (they must be deployed together)?
Ensuring that the account has sufficient balance for a transaction seems to be an invariant business rule in your case. So let us assume that it cannot be violated.
Then the question is simply about how to handle "transactions" that span bounded contexts.
DDD does say that transactions (invariant boundaries) should not cross a Bounded Context (BC). The rule applies even at the level of aggregates. But the correct way to read it is as a transaction within a "Single Request."
The best way to deal with this scenario is to simply accept the request from UI to place a bet and return a "202 Accepted" status message, along with a unique job tracker ID. The only database interaction during request processing should be to persist the data into a "Jobs" table and probably trigger a "BET_PLACED" domain event.
You would then process the Bet asynchronously. Yes, the processing would still involve calling the Accounts bounded context, but through its published API. Since you are not in the context of a request anymore, the processing time need not fit into usual constraints.
Once the processing is completed, either the UI would refresh the page at regular intervals and update the user, or you can send a Push Notification to the browser.
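The accept-then-process flow could be sketched roughly like this in Python. Everything here is an assumption for illustration: the Accounts class stands in for the published API of the Accounts bounded context, the dict stands in for the "Jobs" table, and amounts are integer cents.

```python
import uuid


class InsufficientBalance(Exception):
    pass


class Accounts:
    """Stand-in for the Accounts bounded context's published API."""

    def __init__(self, balances):
        self._balances = dict(balances)

    def debit(self, user_id, amount):
        if self._balances.get(user_id, 0) < amount:
            raise InsufficientBalance(user_id)
        self._balances[user_id] -= amount


class BetService:
    def __init__(self, accounts):
        self.accounts = accounts
        self.jobs = {}  # stand-in for the "Jobs" table

    def place_bet(self, user_id, amount):
        # Request handling: persist the job and return 202 with a tracker id.
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"user_id": user_id, "amount": amount,
                             "status": "PENDING"}
        return 202, job_id

    def process(self, job_id):
        # Runs later, outside the request, for example from a worker.
        job = self.jobs[job_id]
        try:
            self.accounts.debit(job["user_id"], job["amount"])
        except InsufficientBalance:
            job["status"] = "REJECTED"
        else:
            job["status"] = "COMPLETED"

    def status(self, job_id):
        # What the UI would poll with the tracker id.
        return self.jobs[job_id]["status"]
```

The balance check still happens inside the Accounts context; the betting side only learns whether the debit succeeded.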

Data duplication between two aggregates

Two bounded contexts, implemented as microservices:
User Management
Accounting
The User Management hosts the aggregate User with its Name, Email, etc.
Some Users, on the other hand, become Customers within the Accounting bounded context. The Customer has its own workflows, thereby it is an aggregate on its own. Its creation is triggered by the UserRegistered event (publish/subscribe mechanism).
In order to send an invoice, the Accounting needs the email address of the Customer. I'm wondering if the email address (whose data master is the User) should become part of the aggregate Customer, which would entail synchronizing each email address change of the User.
The other solution, which I'm inclined to consider cleaner, is to project the email address (and its changes) to a readmodel within the Accounting. Thus, the aggregate Customer is data master of its own state (e.g. payment workflow), but not the data already given by the User.
What do you think? Is data duplication between two aggregates, generally speaking, a bad thing to do?
No. There is nothing wrong with having one "master" copy, owned by the authority of the data, and multiple subordinate copies. When the authority is outside of your model altogether, then all of your copies might be subordinates to the real authority.
The duplicate copies of data support autonomy -- even though the master isn't currently available, other components in the system can continue to make progress using their local copies of the data.
You do want to have some care in your design -- the closer your capability is to the authority of the data it needs, the fewer problems you are likely to have.
(Remember, cache invalidation is one of the two hard problems).
A simplified example of this might be the paid status of an invoice. Your fulfillment system may need to know if an invoice has been paid before it can release a shipment. Your billing system owns the decision that an invoice has been paid. There's a small sliver of information shared between the two.
But the fulfillment system's copy of that data is subordinate -- the fulfillment system doesn't have the authority to reject a paid invoice. (It may, of course, have the authority to raise exception reports "we can't meet the requirements of the purchase contract", or whatever).
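Applied to the email address from the question, a subordinate copy might look like the following Python sketch. UserRegistered is the event named in the question; the UserEmailChanged event, the event payload shape, and the class name are assumptions for illustration.

```python
class CustomerEmailReadModel:
    """Hypothetical subordinate copy of emails, kept inside Accounting.

    User Management stays the authority; this copy only follows its events.
    """

    def __init__(self):
        self._emails = {}  # user id -> last known email address

    def on_user_registered(self, event):
        self._emails[event["user_id"]] = event["email"]

    def on_user_email_changed(self, event):
        # Assumed event published by User Management on every email change.
        self._emails[event["user_id"]] = event["email"]

    def email_for(self, user_id):
        # May lag behind the authority; returns None if nothing is known yet.
        return self._emails.get(user_id)


emails = CustomerEmailReadModel()
emails.on_user_registered({"user_id": "u1", "email": "old@example.com"})
emails.on_user_email_changed({"user_id": "u1", "email": "new@example.com"})
```

Accounting can keep sending invoices from this copy even while User Management is unavailable, at the cost of occasionally using a stale address.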

Guidelines to decide when a domain role needs to be explicitly modelled

I am looking for some guidelines as to when one must explicitly model a role in the domain model.
I will explain my current stance with the help of an example here.
Say we are building a health care system, and the business requirement states
"That only doctors with 3+ years of experience and certain qualifications can perform surgeries"
In this case it is evident that the act of doing a surgery can only be performed by a person playing the role of a doctor and the doctor needs to meet certain prerequisites to perform the action
doctor.performSurgery()
So basically all doctors are not the same
This method will probably check if the preconditions are met
So in the above cases, I will model the role explicitly.
Now lets consider the alternate scenario.
Only an admin can approve a funds transfer
In the above case I do not find any need to model this role in the domain, as there are no rules distinguishing one admin from another in my domain.
Any person/userlogin with the admin permission can perform this action. I would rather design this into my security infrastructure and ensure that the approveTransfer() method on the application layer is invoked only if the currently logged-in user has the ADMIN permission.
So the "domain model", by which I mean classes like the Account class, is unaware of this rule; it is codified in the application layer, either via AOP or in something like the AccountService class.
What do the wise men have to say about this ? :)
When designing aggregates I always ask myself an important question.
What is the consistency boundary for the process I'm attempting to model?
I ask what rules must be applied during any one atomic operation. This is referred to as a transactional boundary, and it's your bread and butter when defining your invariants (rules that must always be true during the lifetime of an atomic operation - start to end).
As I see it, the rule that a doctor/surgeon must have n years of experience - for a particular operation - is an invariant that must always be transactionally consistent (such as when performing a surgery). Therefore it should be modeled as a transactional boundary within a single aggregate.
Because aggregates can only guarantee the consistency within themselves, the invariants it is responsible for should not be leaked outside of it. In my opinion, and assuming that doctor is an aggregate, a separate roles model should not be responsible for an invariant that the doctor's model itself should be responsible for.
Aggregate to aggregate relationships should really only be established to provide 'help' in giving some missing piece of information. But the rules and how that information is interpreted should be isolated within its respective aggregate.
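A minimal Python sketch of keeping the surgery invariant inside the doctor aggregate. The required qualification name and the attribute names are assumptions; the question only says "3+ years of experience and certain qualifications".

```python
class NotQualified(Exception):
    pass


class Doctor:
    """Hypothetical aggregate that owns the surgery invariant itself."""

    MIN_YEARS_EXPERIENCE = 3
    # Assumed placeholder for the "certain qualifications" in the requirement.
    REQUIRED_QUALIFICATIONS = frozenset({"surgery-board"})

    def __init__(self, name, years_experience, qualifications):
        self.name = name
        self.years_experience = years_experience
        self.qualifications = frozenset(qualifications)

    def perform_surgery(self):
        # The rule is checked where the state lives, not in an outside role model.
        if (self.years_experience < self.MIN_YEARS_EXPERIENCE
                or not self.REQUIRED_QUALIFICATIONS <= self.qualifications):
            raise NotQualified(self.name)
        return f"{self.name} performs surgery"
```

Callers never inspect the doctor's experience themselves; they ask the aggregate to act and it refuses when the precondition fails.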
A separate situation can arise for user authentication. You may have a bounded context with a Customer model, but the details of permissions, authentication, and roles are so vast as to require an entirely separate system to deal with them. In this case you may end up creating a separate bounded context for User Roles and Permissions and linking the two bounded contexts. In this scenario you could have a domain service that deals with the communication between the two. Call the Customer root with an operation and pass in the domain service for some intention-revealing double dispatch, and let the domain service resolve whether that operation goes through or not. In this scenario, though, the responsibility for user auth is not the Customer's at all. The Customer simply doesn't care (because it cannot itself guarantee the transaction) and it's up to User Auth and Roles to decide what to do.
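The double dispatch mentioned above might be sketched like this in Python. The operation name close_account, the RoleBasedAuthorization class, and the grants structure are hypothetical; in practice the service would call out to the separate User Roles and Permissions context.

```python
class RoleBasedAuthorization:
    """Hypothetical domain service fronting the roles/permissions context."""

    def __init__(self, grants):
        self._grants = grants  # actor id -> set of allowed operation names

    def is_allowed(self, actor_id, operation):
        return operation in self._grants.get(actor_id, set())


class Customer:
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.closed = False

    def close_account(self, actor_id, authorization):
        # Double dispatch: the Customer hands the decision to the service
        # and does not know how permissions are resolved.
        if not authorization.is_allowed(actor_id, "close_account"):
            raise PermissionError(f"{actor_id} may not close accounts")
        self.closed = True


auth = RoleBasedAuthorization({"admin-1": {"close_account"}})
customer = Customer("c-42")
customer.close_account("admin-1", auth)
```

The Customer's own state change stays trivial; the rule about who may trigger it lives entirely in the other context.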
From: Implementing Domain Driven Design - Vaughn Vernon
A properly designed Aggregate is one that can be modified in any way required by the business with its invariants completely consistent within a single transaction. And a properly designed Bounded Context modifies only one Aggregate instance per transaction in all cases. What is more, we cannot correctly reason on Aggregate design without applying transactional analysis.
