Handling Race condition in CQRS/ES with read-side - domain-driven-design

I am building an app for managing a health clinic.
We found a race condition when an appointment is scheduled, and so far none of the team members has reached a solution.
When an appointment is scheduled, some business rules need to be verified:
cannot be scheduled at the same time as another appointment with the same doctor or the same patient
doctors can only attend N appointments in a month
doctors can only attend N appointments in a week
So, the first approach we thought of was to create an aggregate that holds all appointments and is responsible for scheduling them, but this aggregate would be huge, which is technically not acceptable.
The second approach, and the current one, is to make Appointment an Aggregate Root and validate it using a domain service (interface in the domain layer, implementation in the infrastructure layer) that queries the read side.
Today it looks like this:
Inside the command handler, a new Appointment is instantiated, with a domain service passed into its constructor.
The Appointment calls the domain service, which queries the read side and validates the rules. However, race conditions can occur here: if two appointments are being scheduled at the same time, neither sees the other, so both will be created.
If the domain service validates the rules, the Appointment is created, but with status PENDING, and a domain event AppointmentRequested is fired.
On the read side, this event is handled and a projection is inserted into the read database (status = PENDING). In the same transaction, a CompleteAppointmentSchedule command is inserted into my outbox and soon after is sent and received asynchronously by the write side.
The write side handles the command by calling appointment.CompleteSchedule(domainService). The same domain service passed when instantiating the new Appointment is passed in again. But now the appointment is already in the read database, so it is possible to check the business rules.
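In code, the current flow looks roughly like the sketch below (all type and method names are simplified and illustrative, not our real code):

```typescript
// Illustrative sketch only: names are simplified, not our real types.

interface ScheduleRuleChecker {
  // Interface lives in the domain layer; the implementation in the
  // infrastructure layer queries the read side, ignoring the appointment
  // with the given id (so a PENDING appointment does not conflict with itself).
  canSchedule(appointmentId: string, doctorId: string, patientId: string,
              start: Date, end: Date): Promise<boolean>;
}

type DomainEvent = { type: string; appointmentId: string };

class Appointment {
  status: 'PENDING' | 'SCHEDULED' = 'PENDING';
  readonly events: DomainEvent[] = [];

  private constructor(readonly id: string, readonly doctorId: string,
                      readonly patientId: string, readonly start: Date,
                      readonly end: Date) {}

  // Called from the command handler, with the domain service passed in.
  static async request(id: string, doctorId: string, patientId: string,
                       start: Date, end: Date,
                       checker: ScheduleRuleChecker): Promise<Appointment> {
    if (!(await checker.canSchedule(id, doctorId, patientId, start, end))) {
      throw new Error('SchedulingRuleViolated');
    }
    const appointment = new Appointment(id, doctorId, patientId, start, end);
    appointment.events.push({ type: 'AppointmentRequested', appointmentId: id });
    return appointment; // status = PENDING, projected to the read side
  }

  // Handled later via the CompleteAppointmentSchedule command from the outbox.
  async completeSchedule(checker: ScheduleRuleChecker): Promise<void> {
    // By now our own PENDING projection is visible on the read side,
    // so a concurrent duplicate should be detected here.
    if (!(await checker.canSchedule(this.id, this.doctorId, this.patientId,
                                    this.start, this.end))) {
      throw new Error('SchedulingRuleViolated');
    }
    this.status = 'SCHEDULED';
    this.events.push({ type: 'AppointmentScheduled', appointmentId: this.id });
  }
}
```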
Is it correct to use the read side this way? We cannot think of another way to check these rules without using the read side. A team member suggested that we could create a private read side for our write side and use it instead of the read side in these cases, but, as we use EventStoreDB, we would have to create another database like the one we use on the read side (PostgreSQL) to be able to do it that way with a private read side.

I am building an app for managing a health clinic.
Reserve an office, get the entire team together, and watch Trench Talk: Evolving a Model. Yves Reynhout has been doing (and talking about) domain driven design, and his domain is appointment scheduling for healthcare.
When an appointment is scheduled, some business rules need to be verified:
cannot be scheduled to the same time as another with the same doctor or same patient
doctors can only attend N appointments in a month
doctors can only attend N appointments in a week
One of the things you are going to need to discuss with your domain experts: do you need to prevent scheduling conflicts, or do you need to identify scheduling conflicts and resolve them?
Recommended reading:
Race Conditions Don't Exist - Udi Dahan, 2010
Memories, Guesses, and Apologies - Pat Helland, 2007
That said, you are really close to a common answer.
You make your checks against a cached copy of the calendar, to avoid the most common collisions (note that there is still a race condition, when you are checking the schedule at the same time somebody else is trying to cancel the conflicting appointment). You then put an appointment request message onto a queue.
Subscribing to the queue is a Service-as-in-SOA, which is the technical authority for all information related to scheduling. That service has its own database, and checks its own authoritative copy of everything before committing a change.
The critical difference here is that the service is working directly with locked instances of the data. That might be because the event handler in the service is the only process that has write permission on the authoritative data (and is itself handling only one message at a time), or it might be because the event handler locks all of the data necessary to ensure that the result of the write is still consistent with the business rules (conflicting writes compete for the same lock, ensuring that data changes are controlled).
In effect, all attempts to change the authoritative calendar data are (logically) serialized, to ensure that the writes cannot conflict with each other.
In the language of CQRS, all of this locking is happening in the write model of the calendar service. Everybody else works from unlocked copies of the data, which are provided by the read model (with some modest plumbing involved in copying data changes from the write model to the read model).
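A minimal sketch of what that logically serialized write model could look like, assuming the calendar service owns its own authoritative store and can lock a doctor's calendar for the duration of a check-and-write (all names here are illustrative, not a prescribed API):

```typescript
// Sketch only: CalendarStore and the limits are assumptions, not a real API.

interface AppointmentRequest {
  appointmentId: string;
  doctorId: string;
  patientId: string;
  start: Date;
  end: Date;
}

interface CalendarStore {
  // Runs the callback while holding a lock on the doctor's calendar
  // (e.g. a row-level lock, or a single-writer partition per doctor).
  withDoctorLock<T>(doctorId: string, work: () => Promise<T>): Promise<T>;
  // Another appointment for the same doctor in the same slot; the same-patient
  // rule would be serialized the same way, e.g. by also locking the patient's calendar.
  hasOverlap(req: AppointmentRequest): Promise<boolean>;
  appointmentsInMonth(doctorId: string, start: Date): Promise<number>;
  appointmentsInWeek(doctorId: string, start: Date): Promise<number>;
  insert(req: AppointmentRequest): Promise<void>;
}

const MAX_PER_MONTH = 80; // illustrative limits
const MAX_PER_WEEK = 20;

// Consumes one AppointmentRequested message at a time. Because the rule check
// and the write happen under the same lock, two conflicting requests cannot
// both be committed.
async function onAppointmentRequested(store: CalendarStore, req: AppointmentRequest): Promise<void> {
  await store.withDoctorLock(req.doctorId, async () => {
    if (await store.hasOverlap(req)) throw new Error('SchedulingConflict');
    if ((await store.appointmentsInMonth(req.doctorId, req.start)) >= MAX_PER_MONTH)
      throw new Error('MonthlyLimitReached');
    if ((await store.appointmentsInWeek(req.doctorId, req.start)) >= MAX_PER_WEEK)
      throw new Error('WeeklyLimitReached');
    await store.insert(req); // committed while still holding the lock
  });
}
```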

Related

How to model Betting/Accounting BoundedContexts when betting relies heavily on account balance?

Let's say you have an application where you can create a bet on a coin toss. Your account has a balance that was funded with your credit card.
The sequence of events is the following:
POST /coin_toss_bets { amount: 5 USD }
Start transaction/acquire locks inside the Bet subdomain useCase
Does the user have enough balance? (check the accounting aggregate's balance projection of the user's deposits)
Debit the user's account for the 5 USD amount
Create bet/flip the coin to get a result
Payout the user if they bet on the correct side
Commit transaction
UI layer is given the bet and displays an animation
My question is how this can be modeled with two separate BoundedContexts (betting/accounting). It's said that database transactions should not cross a BoundedContext since they can be located on different machines/microservices, but in this scenario the use case of creating a bet relies heavily on a non-dirty read of the user's projected account balance (strong consistency).
There is also no way to perform a compensating action if the account is over-debited, since the UI layer requires that the bet be created atomically.
Is there any way to do this with CQRS/Event Sourcing that doesn't require asking for the user's account balance inside the betting subdomain? Or would you always have to ensure that the balance projection is correct inside this transaction (i.e. they must be deployed together)?
Ensuring that the account has sufficient balance for a transaction seems to be an invariant business rule in your case. So let us assume that it cannot be violated.
Then the question is simply about how to handle "transactions" that span across boundary contexts.
DDD does say that transactions (invariant boundaries) should not cross a Bounded Context (BC). The rule applies even at the level of aggregates. But the correct way to read it is: a transaction as part of a "Single Request."
The best way to deal with this scenario is to simply accept the request from UI to place a bet and return a "202 Accepted" status message, along with a unique job tracker ID. The only database interaction during request processing should be to persist the data into a "Jobs" table and probably trigger a "BET_PLACED" domain event.
You would then process the Bet asynchronously. Yes, the processing would still involve calling the Accounts bounded context, but through its published API. Since you are not in the context of a request anymore, the processing time need not fit into usual constraints.
Once the processing is completed, either the UI refreshes the page at regular intervals and updates the user, or you send a push notification to the browser.
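Sketched out, the request handling might look something like this (the JobStore, EventBus and names are assumptions made for the example, not a prescribed API):

```typescript
// Sketch only: storage and messaging interfaces are illustrative.
import { randomUUID } from 'node:crypto';

interface JobStore {
  insert(job: { jobId: string; type: string; payload: unknown; status: string }): Promise<void>;
}

interface EventBus {
  publish(event: { type: string; payload: unknown }): Promise<void>;
}

// POST /coin_toss_bets -> persist the job, raise BET_PLACED, answer 202.
async function placeBet(jobs: JobStore, bus: EventBus, userId: string,
                        amountUsd: number): Promise<{ status: 202; jobId: string }> {
  const jobId = randomUUID();
  await jobs.insert({ jobId, type: 'PLACE_BET', payload: { userId, amountUsd }, status: 'ACCEPTED' });
  await bus.publish({ type: 'BET_PLACED', payload: { jobId, userId, amountUsd } });
  // No balance check here; the bet is settled asynchronously by a handler that
  // calls the Accounting bounded context through its published API.
  return { status: 202, jobId }; // UI polls the job id or gets a push notification
}
```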

DDD/Event sourcing, getting data from another microservice?

I wonder if you can help. I am writing an order system and currently have implemented an order microservice which takes care of placing an order. I am using DDD with event sourcing and CQRS.
The order service itself takes in commands that produce events; it also listens to its own events to create a read model (the idea here is to use CQRS: commands for writes, queries for reads).
After implementing the above, I ran into a problem, and it's probably just that I am not fully understanding the correct way of doing this.
An order actually has dependencies, meaning an order needs a customer and products. So I will have two additional microservices, for customers and products.
To keep things simple, I would like to concentrate on the customer (although I have exactly the same issue with products, my thinking is that if I fix the customer issue then the other one is automatically fixed as well).
So back to the problem at hand. To create an order, the order needs a customer (and products). I currently have the customerId on the client, so when sending a command down to the order service I can pass in the customerId.
I would like to save the name and address of the customer with the order. How do I get the name and address for that customerId from the Customer service into the Order service?
I suppose, to summarize: when one service needs data from another service, how am I able to get this data?
Would it be a case of the order service creating an event for receiving a customer record? This is going to introduce a lot of complexity (more events) into the system.
The microservices are NOT coupled so the order service can't just call into the read model of the customer.
Anybody able to help me on this?
If you are using DDD, first of all, please read about bounded contexts. Forget microservices; they are just an implementation strategy.
Now back to your problem. Publish these events from the Customer aggregate (in your case the Customer microservice): CustomerRegistered, CustomerInfoUpdated, CustomerAccountRemoved, CustomerAddressChanged, etc. Then subscribe your Order service (again, in your case the application service inside the Order microservice) to listen to all the above events. Okay, not all of them, just the ones the order needs.
Now you may have a question: what if some, or even most, of my customers never place orders? My order service will be full of unnecessary data. Is this a good approach?
Well, the answer might vary. I would say disk space is cheaper than memory, and from a performance perspective a database query is faster than a network call. If your database host (or your server) is limited, then you should not go with microservices. Moreover, you could build some business ideas on this unused customer data, e.g. list all customers who never ordered anything and send them some offers to grow the business. Just kidding. Don't feel bothered by unused data in microservices.
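A bare-bones sketch of that subscription, with event shapes I'm making up for illustration:

```typescript
// Sketch only: event shapes and the in-memory store are illustrative.

interface CustomerRegistered { type: 'CustomerRegistered'; customerId: string; name: string; address: string; }
interface CustomerAddressChanged { type: 'CustomerAddressChanged'; customerId: string; address: string; }
type CustomerEvent = CustomerRegistered | CustomerAddressChanged;

// Local, denormalized copy owned by the Order service (a real system would
// persist this in the Order service's own database).
const customersKnownToOrders = new Map<string, { name: string; address: string }>();

// Subscribed to the Customer service's published events.
function onCustomerEvent(event: CustomerEvent): void {
  switch (event.type) {
    case 'CustomerRegistered':
      customersKnownToOrders.set(event.customerId, { name: event.name, address: event.address });
      break;
    case 'CustomerAddressChanged': {
      const copy = customersKnownToOrders.get(event.customerId);
      if (copy) copy.address = event.address;
      break;
    }
  }
}

// At order time the Order service reads its own copy; no synchronous call to
// the Customer service is needed.
function customerDetailsFor(customerId: string) {
  return customersKnownToOrders.get(customerId);
}
```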
My suggestion would be to gather the required data on the front-end and pass it along. The relevant customer details that you want to denormalize into the order would be a value object. The same goes for the product data (e.g. id, description) related to the order line.
It isn't impossible to have the systems interact to retrieve data, but that does couple them at a lower level than seems necessary.
When one service needs data from another service, how am I able to get this data?
You copy it.
So somewhere in your design there needs to be a message that carries the data from where it is to where it needs to be.
That could mean that the order service is subscribing to events that are published by the customer service, and storing a copy of the information that it needs. Or it could be that the order service queries some API that has direct access to the data stored by the customer service.
Queries for the additional data that you need could be synchronous or asynchronous - maybe the work can be deferred until you have all of the data you need.
Another possibility is that you redesign your system so that the business capability you need is with the data, either moving the capability or moving the data. Why does ordering need customer data? Can the customer service do the work instead? Should ordering own the data?
There's a certain amount of complexity that is inherent in your decision to distribute the work across multiple services. The decision to distribute your system involves weighing various trade offs.

Data duplication between two aggregates

Two bounded contexts, implemented as microservices:
User Management
Accounting
The User Management hosts the aggregate User with its Name, Email, etc.
Some Users, on the other hand, become Customers within the Accounting bounded context. The Customer has its own workflows, thereby it is an aggregate on its own. Its creation is triggered by the UserRegistered event (publish/subscribe mechanism).
In order to send an invoice, the Accounting needs the email address of the Customer. I'm wondering if the email address (whose data master is the User) should become part of the aggregate Customer, which would entail synchronizing each email address change of the User.
The other solution, which I'm inclined to consider cleaner, is to project the email address (and its changes) to a read model within Accounting. Thus, the aggregate Customer is data master of its own state (e.g. payment workflow), but not of the data already given by the User.
What do you think? Is data duplication between two aggregates, generally speaking, a bad thing to do?
What do you think? Is data duplication between two aggregates, generally speaking, a bad thing to do?
No. There is nothing wrong with having one "master" copy, owned by the authority of the data, and multiple subordinate copies. When the authority is outside of your model altogether, then all of your copies might be subordinates to the real authority.
The duplicate copies of data support autonomy -- even though the master isn't currently available, other components in the system can continue to make progress using their local copies of the data.
You do want to have some care in your design -- the closer your capability is to the authority of the data it needs, the fewer problems you are likely to have.
(Remember, cache invalidation is one of the two hard problems).
A simplified example of this might be the paid status of an invoice. Your fulfillment system may need to know if an invoice has been paid before it can release a shipment. Your billing system owns the decision that an invoice has been paid. There's a small sliver of information shared between the two.
But the fulfillment system's copy of that data is subordinate -- the fulfillment system doesn't have the authority to reject a paid invoice. (It may, of course, have the authority to raise exception reports: "we can't meet the requirements of the purchase contract", or whatever.)
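As a small sketch of that invoice example (the event name and stores are assumptions on my part):

```typescript
// Sketch only: the InvoicePaid event shape is assumed for illustration.

interface InvoicePaid { type: 'InvoicePaid'; invoiceId: string; paidAt: string; }

// Fulfillment's subordinate copy of billing's decision.
const paidInvoices = new Set<string>();

// Billing is the authority; fulfillment only records what it is told.
function onInvoicePaid(event: InvoicePaid): void {
  paidInvoices.add(event.invoiceId);
}

// Fulfillment consults its local copy before releasing a shipment; it never
// decides payment status itself.
function canReleaseShipment(invoiceId: string): boolean {
  return paidInvoices.has(invoiceId);
}
```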

Design pattern to address invitation race condition

Before I begin describing the problem I'm facing, let me assure you that I've checked if there already is another thread where this has been talked about. After about 5-6 tries clicking on suggestions, I gave up, since it's hard to get an idea from threads with generic names like "What design pattern can I use?"
So I've given this question as descriptive a title as I could come up with. The reason for my concern about this being asked already is that it feels like it should be a fairly common problem (surely others would've encountered this in their client-server program).
=====
So here's my problem...
I've got a single server S, and several clients C1, C2, ..., Cn.
A client can do 1 of three things at any given time:
Create an event.
Invite other clients to created events.
Accept or reject invitations to events created by other clients.
A client sees names for events they've created (and possibly invited other clients to) as well as names for events they've accepted invitations to. The server processes all invitations; when a client invites another client to an event, the invitation goes through S but S knows nothing about an event E other than the name associated with it, the inviting client, and the invited clients. Let's symbolise the name of an event E as |E|.
Now for two events Ea and Eb, |Ea| != |Eb| does not imply Ea != Eb. That is, just because two events have different names does not mean they are different. I won't formally define what makes two events the same here, but as a use-case, let's say two events are the same if they have the same location/time. However the server never knows this info remember, only the clients do, but the clients may not communicate well enough beforehand with each other and so may choose different names to represent the same (intended) event.
My problem: I want to avoid a situation where a client Ca accepts an invitation from a client Cb to an event Eb, and Cb accepts an invitation from Ca to Ea, where Ea = Eb. This would lead to each client seeing both |Ea| and |Eb|, which actually represent the same event.
Question: How do I avoid the above? Is there a design pattern that can work on the server alone, client alone, or both server and client together? The solution can include dialogs/prompts for clients.
=====
A practical implementation for such a client-server setup could be discussion topics as events and employees as clients. Imagine a situation where Craig and Matt are colleagues who rarely see each other. They suddenly realise that their boss had asked them to look into why a recent software upgrade wasn't working for some of their customers. But neither knows the other person has been asked to look into the issue as well. So Craig creates the event 'Discuss recent upgrade', and Matt (who's done bit more research than Craig) suspecting it to be an (ahem) Adobe issue, creates 'Investigate new Adobe add-on'. They both invite each other to these topics, and both being very polite, readily accept. Confusion ensues.
How about running the core component of the solution at the server?
I am trying to map this problem onto the standard scheduling problem, with a variation in the logic for finding conflicts and for informing clients about conflicting appointments.
Here the server keeps the list of events (or appointments). When a request for a new event comes in, a conflict check is made; what counts as a conflict (maybe location and time, or even something else) can be abstracted with a strategy pattern. The server completely conceals this complexity behind a simple facade.
On the client side, a conflict can be handled with a dialog asking the user whether to create a new event or use the existing one.
The next key thing is what data an event holds and how it is stored. Again, this largely depends on the use case. If we are designing for an event with the following data:
Name, Date, Start time, Location, Organizer, Description
then the storage at the server can be organized the typical 'Outlook-style calendar' way, or as a 'list of events in buckets of time'.
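A rough sketch of the server-side check, with the conflict rule pulled out as a strategy and a thin facade in front of it (the types are illustrative, not a prescribed design):

```typescript
// Sketch only: EventRecord and the strategies are illustrative.

interface EventRecord {
  id: string;
  name: string;
  organizer: string;
  location?: string;
  startTime?: string; // ISO date-time
}

// Strategy: what makes two events "the same" can be swapped per deployment.
interface ConflictStrategy {
  conflictsWith(candidate: EventRecord, existing: EventRecord): boolean;
}

const sameSlotAndPlace: ConflictStrategy = {
  conflictsWith: (a, b) => a.location === b.location && a.startTime === b.startTime,
};

// Facade: clients only see createEvent; the conflict logic stays hidden behind it.
class EventScheduler {
  private readonly events: EventRecord[] = [];
  constructor(private readonly strategy: ConflictStrategy) {}

  // If a conflict is found, the existing event is returned so the client can
  // show a dialog offering to join it instead of creating a duplicate.
  createEvent(candidate: EventRecord): { created: boolean; conflict?: EventRecord } {
    const conflict = this.events.find(e => this.strategy.conflictsWith(candidate, e));
    if (conflict) return { created: false, conflict };
    this.events.push(candidate);
    return { created: true };
  }
}
```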

External id as domain identity

Our application sends/receives a lot of data to/from a third party we work with.
Our domain model is mainly populated with that data.
The 'problem' we're having is identifying a 'good' candidate as domain identity for the aggregate.
It seems like we have 3 options:
Generate a domain identity (UUID or DB-sequence...);
Use the External-ID as domain identity that comes along with all data from the external source.
Use an internal domain identity AND External-ID as a separate id that 'might' be used for retrieval operations; the internal id is always leading
About the External-ID:
It is 100% guaranteed the ID will never change
The ID is always managed by the external source
Other domains in our system might use the external-id for retrieval operations
Especially the last point above convinced us that the external-id is not an infrastructural concern but really belongs to the domain.
Which option should we choose?
** UPDATE **
Maybe I was not clear about the term '3rd party'.
Actually, the external source is our client, who is active in the car industry. Our application uses the client's master data to complete several 'things'. We have several Bounded Contexts (BC) like 'Client management', 'Survey', 'Appointment', 'Maintenance', etc.
Our client sends us 'Tasks' that describe something that needs to be done.
That 'something' might be:
'let client X complete survey Y'
'schedule/cancel appointment for client X'
'car X for client Y is scheduled for maintenance at position XYZ'
Those 'Tasks' always have a 'task-id' that is guaranteed to be unique.
We store all incoming 'Tasks' in our database (active record style). Every possible action on a task maps to a domain event. (Multiple BCs might be interested in the same task.)
Every BC contains one or more aggregates that distribute some domain events to other BCs. For instance, when an appointment is canceled a domain event is triggered, and Maintenance listens to that event to get some things done.
However, our client expects some message after every action that is related to a Task. Therefore we always need to use the 'task-id'.
To summarize things:
Tasks have a task-id
Tasks might be related to multiple BCs
Every BC sends some 'result message' to the client with the related task-id
Task-ids are distributed by domain events
We keep every (internally) persisted task up-to-date
Hopefully, I was clear enough about the use of the external-id (= task-id) and our different BCs.
My gut feeling would be to manage your own identity and not rely on a third party service for this, so option 3 above. Difficult to say without context though. What is the 3rd party system? What is your domain?
Would you ever switch the 3rd party service?
You say other parts of your domain might use the external id for querying - what are they querying? Your internal systems or the 3rd party service?
[Update]
Based on the new information it sounds like a correlationId. I'd store it alongside the other information relevant to the aggregates.
As a general rule, I would veto using a DB-sequence number as an identifier; the domain model should be independent of the choice of persistence; the domain model writes the identifier to the database, rather than the other way around (if the DB wants to track a sequence number for its own purposes, that's fine).
I'm reluctant to use the external identifier, although it can make sense in some circumstances. A given entity, like "Customer" might have representations in a number of different bounded contexts - it might make sense to use the same identifier for all of them.
My default: I would reach for a name based uuid, using the external ID as part of the seed, which gives a simple mapping from external to internal.
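For example, a name-based (version 5) UUID gives a deterministic mapping from the external id to an internal id; this sketch assumes the `uuid` npm package and an arbitrary example namespace value:

```typescript
// Sketch only: the namespace UUID below is an arbitrary example value.
import { v5 as uuidv5 } from 'uuid';

// A fixed namespace owned by our system; any stable UUID will do.
const TASK_NAMESPACE = '1b671a64-40d5-491e-99b0-da01ff1f3341';

// The same external task-id always yields the same internal id, so no lookup
// table is needed to map external ids to internal ones.
function internalIdFor(externalTaskId: string): string {
  return uuidv5(externalTaskId, TASK_NAMESPACE);
}
```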

Resources