Data duplication between two aggregates - domain-driven-design

Two bounded contexts, implemented as microservices:
User Management
Accounting
The User Management hosts the aggregate User with its Name, Email, etc.
Some Users, on the other hand, become Customers within the Accounting bounded context. The Customer has its own workflows, so it is an aggregate in its own right. Its creation is triggered by the UserRegistered event (publish/subscribe mechanism).
In order to send an invoice, the Accounting needs the email address of the Customer. I'm wondering if the email address (whose data master is the User) should become part of the aggregate Customer, which would entail synchronizing each email address change of the User.
The other solution, which I'm inclined to consider cleaner, is to project the email address (and its changes) to a readmodel within the Accounting. Thus, the aggregate Customer is data master of its own state (e.g. payment workflow), but not the data already given by the User.
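To make the second option concrete, here is a minimal sketch of what such a read model could look like. It assumes a hypothetical UserEmailChanged event alongside the UserRegistered event mentioned above; the class and method names are illustrative only and not taken from any framework.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Customer:
    """Aggregate: owns only Accounting state (fields are hypothetical)."""
    customer_id: str
    payment_status: str = "NONE"


@dataclass
class CustomerContactReadModel:
    """Subordinate copy of User data, kept current by event handlers."""
    emails: Dict[str, str] = field(default_factory=dict)

    def on_user_registered(self, user_id: str, email: str) -> None:
        self.emails[user_id] = email

    def on_user_email_changed(self, user_id: str, new_email: str) -> None:
        # Hypothetical event; User Management remains the data master.
        self.emails[user_id] = new_email

    def email_for(self, user_id: str) -> str:
        return self.emails[user_id]


contacts = CustomerContactReadModel()
contacts.on_user_registered("u-1", "old@example.com")
customer = Customer(customer_id="u-1")
contacts.on_user_email_changed("u-1", "new@example.com")

# When sending an invoice, Accounting looks the address up in its read model.
invoice_recipient = contacts.email_for(customer.customer_id)
```

The Customer aggregate never stores the email, so an email change never has to be applied to the aggregate itself.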
What do you think? Is data duplication between two aggregates, generally speaking, a bad thing to do?

What do you think? Is data duplication between two aggregates, generally speaking, a bad thing to do?
No. There is nothing wrong with having one "master" copy, owned by the authority of the data, and multiple subordinate copies. When the authority is outside of your model altogether, then all of your copies might be subordinates to the real authority.
The duplicate copies of data support autonomy -- even though the master isn't currently available, other components in the system can continue to make progress using their local copies of the data.
You do want to have some care in your design -- the closer your capability is to the authority of the data it needs, the fewer problems you are likely to have.
(Remember, cache invalidation is one of the two hard problems).
A simplified example of this might be the paid status of an invoice. Your fulfillment system may need to know if an invoice has been paid before it can release a shipment. Your billing system owns the decision that an invoice has been paid. There's a small sliver of information shared between the two.
But the fulfillment system's copy of that data is subordinate -- the fulfillment system doesn't have the authority to reject a paid invoice. (It may, of course, have the authority to raise exception reports "we can't meet the requirements of the purchase contract", or whatever).

Related

DDD Orchestration-based saga race condition

I have three aggregates: Product, Seller, and Buyer. The buyer offers to buy a product from the seller, and the seller can accept or reject the offer. The process for making an offer is this: in the buyer aggregate I check whether the buyer has already made an offer on the product, then I check in the product aggregate whether its status is in sale, and check in the seller aggregate whether the buyer is banned (the seller aggregate has a list of banned users). If all checks pass, the saga creates a new offer. But what if the buyer gets banned after I have checked whether the buyer is banned? The seller bans a user and can still receive an offer from that user afterwards?
Races are an inevitable consequence of designing a system with distributed decision making authority.
In other words, it takes time for the information that a particular shopper is banned to travel from the shopkeeper who made the decision to the centralized model. So in just the same way that we have to handle the case where we send an offer to the shop nanoseconds before the shop bans the shopper, so too do we need to handle the case where we send an offer in the nanoseconds between when the shop bans the shopper and when that information gets integrated into the domain model.
This is part of the tradeoff we accepted when we chose a distributed system in the first place.
As far as I can tell, we manage this mainly by setting expectations. "Bans shall be announced five minutes before they take effect", or whatever, to give the information time to move around your system.
Expectation setting might use the language of service levels (99% of all bans are effective within five minutes).
Mostly, you're going to be managing tradeoffs - how important is respecting a recent ban compared to expediting the delivery of offers? If you don't need low latency delivery, then you can afford to wait a little while to see if a ban shows up.
(If you look carefully, you'll see that there is still a race condition present - what we're really manipulating is the business significance of the race. See Udi Dahan's Race Conditions Don't Exist)
In a local setting, if you really need tight control of the sequence of bans and offers, then you need to have a single lock in the system shared by the code that processes bans for a particular shop and the code that processes offers for a particular shop.
That doesn't prevent the race, of course (you get different behaviors depending on whether ban acquires the lock nanoseconds before the offer or nanoseconds after), but does at least give you a clear "happens-before" relationship in your audit log.
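As a rough illustration of the single-lock idea in a local setting, the sketch below puts bans and offers for one shop behind the same lock and records the order in which they were processed. ShopGate and its methods are hypothetical names, and a real system would hold one such gate per shop.

```python
import threading
from typing import List, Set, Tuple


class ShopGate:
    """Hypothetical per-shop gate: bans and offers share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._banned: Set[str] = set()
        self.audit_log: List[Tuple[str, str]] = []

    def ban(self, buyer_id: str) -> None:
        with self._lock:
            self._banned.add(buyer_id)
            self.audit_log.append(("ban", buyer_id))

    def offer(self, buyer_id: str) -> bool:
        with self._lock:
            accepted = buyer_id not in self._banned
            outcome = "offer_accepted" if accepted else "offer_rejected"
            self.audit_log.append((outcome, buyer_id))
            return accepted


gate = ShopGate()
first = gate.offer("buyer-1")   # processed before the ban
gate.ban("buyer-1")
second = gate.offer("buyer-1")  # processed after the ban
```

Whichever call acquires the lock first wins; the audit log only makes the resulting order unambiguous.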

Handling Race condition in CQRS/ES with read-side

I am building an app for managing a health clinic.
We found a race condition when an appointment is scheduled, and so far none of the team members has reached a solution.
When an appointment is scheduled, some business rules need to be verified:
cannot be scheduled to the same time as another with the same doctor or same patient
doctors can only attend N appointments in the month
in a week, doctors can only attend N appointments
So, the first approach we thought of was to create an aggregate that holds all appointments and is responsible for scheduling them, but this aggregate would be huge, which is technically not acceptable.
The second approach, and the current one, is to create Appointment as an Aggregate Root, and then validate it using a domain service (interface in domain layer and implementation in infra layer), which queries the read side.
Today it looks like this:
Inside the command handler, instantiate a new Appointment, passing a domain service to its constructor.
The Appointment calls the domain service, which queries the read side and validates the rules. However, race conditions can occur here (when two appointments are scheduled at the same time, neither sees the other, so both will be created).
If the domain service validates the rules, the Appointment is created, but with status PENDING, and a domain event AppointmentRequested is fired.
The read side subscribes to this event and inserts a projection into the read db (status = PENDING). In the same transaction, a CompleteAppointmentSchedule command is inserted into my outbox; it is soon sent and received asynchronously by the write side.
The write side handles the command by calling appointment.CompleteSchedule(domainService). The same domain service that was passed when instantiating the new appointment is passed to the appointment again. But now the appointment is already in the read db, so it is possible to check the business rules.
Is it correct to use the read side this way? We cannot think of another way to check these rules without using the read side. A team member suggested that we could create a private read side for our write side and use it instead of the read side in these cases, but, as we use EventStore DB, we would have to create another database like the one we use on the read side (pgsql) to do it that way.
I am building an app for managing a health clinic.
Reserve an office, get the entire team together, and watch Trench Talk: Evolving a Model. Yves Reynhout has been doing (and talking about) domain driven design, and his domain is appointment scheduling for healthcare.
When an appointment is scheduled, some business rules need to be verified:
cannot be scheduled to the same time as another with the same doctor or same patient
doctors can only attend N appointments in the month
in a week, doctors can only attend N appointments
One of the things you are going to need to discuss with your domain experts: do you need to prevent scheduling conflicts, or do you need to identify scheduling conflicts and resolve them?
Recommended reading:
Race Conditions Don't Exist - Udi Dahan, 2010
Memories, Guesses, and Apologies - Pat Helland, 2007
That said, you are really close to a common answer.
You make your checks against a cached copy of the calendar, to avoid the most common collisions (note that there is still a race condition, when you are checking the schedule at the same time somebody else is trying to cancel the conflicting appointment). You then put an appointment request message into a queue.
Subscribing to the queue is a Service-as-in-SOA, which is the technical authority for all information related to scheduling. That service has its own database, and checks its own authoritative copy of everything before committing a change.
The critical difference here is that the service is working directly with locked instances of the data. That might be because the event handler in the service is the only process that has write permissions on the authoritative data (and is itself handling only one message at a time), or it might be because the event handler locks all of the data necessary to ensure that the result of the write is still consistent with the business rules (conflicting writes competing for the same lock, thus ensuring that data changes are controlled).
In effect, all attempts to change the authoritative calendar data are (logically) serialized, to ensure that the writes cannot conflict with each other.
In the language of CQRS, all of this locking is happening in the write model of the calendar service. Everybody else works from unlocked copies of the data, which are provided by the read model (with some modest plumbing involved in copying data change from the write model to the read model).
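A toy version of that arrangement might look like the sketch below: requests go into a queue, and a single handler checks each one against the authoritative calendar before committing. The names (SchedulingService, AppointmentRequest, drain) are assumptions made for the example, and only the "same doctor or same patient at the same time" rule is checked; the weekly and monthly limits are left out.

```python
import queue
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AppointmentRequest:
    doctor: str
    patient: str
    slot: str  # e.g. "2024-05-01T09:00"; the format is an assumption


class SchedulingService:
    """Hypothetical technical authority for the calendar."""

    def __init__(self) -> None:
        self.inbox: "queue.Queue[AppointmentRequest]" = queue.Queue()
        self._booked: List[AppointmentRequest] = []  # authoritative copy
        self.rejected: List[AppointmentRequest] = []

    def _conflicts(self, req: AppointmentRequest) -> bool:
        return any(
            a.slot == req.slot
            and (a.doctor == req.doctor or a.patient == req.patient)
            for a in self._booked
        )

    def drain(self) -> None:
        # One message at a time, so writes are logically serialized.
        while not self.inbox.empty():
            req = self.inbox.get()
            if self._conflicts(req):
                self.rejected.append(req)
            else:
                self._booked.append(req)

    def booked(self) -> List[AppointmentRequest]:
        return list(self._booked)


service = SchedulingService()
# Both requests passed the check against a stale cached calendar.
service.inbox.put(AppointmentRequest("dr-a", "pat-1", "2024-05-01T09:00"))
service.inbox.put(AppointmentRequest("dr-a", "pat-2", "2024-05-01T09:00"))
service.drain()
```

The second request is rejected by the authority even though the cached copy let it through, which is the point where you would identify the conflict and resolve it.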

How to model Betting/Accounting BoundedContexts when betting relies heavily on account balance?

Let's say you have an application where you can create a bet on a coin toss. Your account has a balance that was funded with your credit card.
The sequence of events is the following:
POST /coin_toss_bets { amount: 5 USD }
Start transaction/acquire locks inside the Bet subdomain useCase
Does the user have enough balance? (check accounting aggregate balance projection of the user's deposits)
Debit the user's account for the amount of 5 USD
Create bet/flip the coin to get a result
Pay out the user if they bet on the correct side
Commit transaction
UI layer is given the bet and displays an animation
My question is how this can be modeled with 2 separate BoundedContexts (betting/accounting). It's said that database transactions should not cross a BoundedContext since they can be located on different machines/microservices, but in this scenario, the use case of creating a bet heavily relies on a non-dirty read of the user's projected account balance (strong consistency).
There is also no way to perform a compensating action if the account is overdebited, since the UI layer is requiring that the bet is created atomically.
Is there any way to do this with CQRS/Event Sourcing that doesn't require asking for the user's account balance inside the betting subdomain? Or would you always have to ensure that the balance projection is correct inside this transaction (they must be deployed together)?
Ensuring that the account has sufficient balance for a transaction seems to be an invariant business rule in your case. So let us assume that it cannot be violated.
Then the question is simply about how to handle "transactions" that span bounded contexts.
DDD does say that transactions (invariant boundaries) should not cross a Bounded Context (BC). The rule is applicable even at the level of aggregates. But the correct way to read "transaction" here would be as part of a "Single Request."
The best way to deal with this scenario is to simply accept the request from UI to place a bet and return a "202 Accepted" status message, along with a unique job tracker ID. The only database interaction during request processing should be to persist the data into a "Jobs" table and probably trigger a "BET_PLACED" domain event.
You would then process the Bet asynchronously. Yes, the processing would still involve calling the Accounts bounded context, but through its published API. Since you are not in the context of a request anymore, the processing time need not fit into usual constraints.
Once the processing is completed, either the UI would refresh the page at regular intervals and update the user, or you can send a Push Notification to the browser.
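The flow could be sketched roughly as follows, without any web framework. The in-memory `jobs` dict stands in for the "Jobs" table, `debit` stands in for the Accounting context's published API, and all function names and the PENDING/PLACED/REJECTED statuses are assumptions for illustration.

```python
import uuid
from collections import deque
from typing import Deque, Dict, Tuple

jobs: Dict[str, dict] = {}        # stands in for the "Jobs" table
pending: Deque[str] = deque()     # stands in for the BET_PLACED event stream
balances = {"user-1": 7}          # state owned by the Accounting context


def accept_bet(user_id: str, amount: int) -> Tuple[int, str]:
    """Request handler: persist the job and return 202 plus a tracker ID."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"user_id": user_id, "amount": amount, "status": "PENDING"}
    pending.append(job_id)
    return 202, job_id


def debit(user_id: str, amount: int) -> bool:
    """Stand-in for Accounting's API; it alone enforces the balance rule."""
    if balances.get(user_id, 0) < amount:
        return False
    balances[user_id] -= amount
    return True


def process_next() -> None:
    """Asynchronous worker step, outside the original request."""
    job_id = pending.popleft()
    job = jobs[job_id]
    ok = debit(job["user_id"], job["amount"])
    job["status"] = "PLACED" if ok else "REJECTED"


status1, job1 = accept_bet("user-1", 5)
status2, job2 = accept_bet("user-1", 5)
process_next()
process_next()
```

The UI would poll (or be notified) using the job ID; here the second bet ends up rejected because the first one already consumed most of the balance.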

DDD/Event sourcing, getting data from another microservice?

I wonder if you can help. I am writing an order system and currently have implemented an order microservice which takes care of placing an order. I am using DDD with event sourcing and CQRS.
The order service itself takes in commands that produce events, and the order service listens to its own events to create a read model. (The idea here is to use CQRS, so commands for writes and queries for reads.)
After implementing the above, I ran into a problem, and it's probably just that I am not fully understanding the correct way of doing this.
An order actually has dependencies, meaning an order needs a customer and one or more products. So I will have 2 additional microservices for customers and products.
To keep things simple, I would like to concentrate on the customer (although I have exactly the same issue with products, my thinking is that if I fix the customer issue then the other one is automatically fixed as well).
So back to the problem at hand. To create an order the order needs a customer (and products), I currently have the customerId on the client, so sending down a command to the order service, I can pass in the customerId.
I would like to save the name and address of the customer with the order. How do I get the name and address of the customerId from the Customer Service in the Order Service ?
I suppose to summarize, when data from one service needs data from another service, how am I able to get this data.
Would it be the case of the order service creating an event for receiving a customer record ? This is going to introduce a lot of complexity (more events) in the system
The microservices are NOT coupled so the order service can't just call into the read model of the customer.
Anybody able to help me on this ?
If you are using DDD, first of all, please read about bounded contexts. Forget microservices; they are just an implementation strategy.
Now back to your problem. Publish these events from the Customer aggregate (in your case, the Customer microservice): CustomerRegistered, CustomerInfoUpdated, CustomerAccountRemoved, CustomerAddressChanged, etc. Then subscribe your Order service (again, in your case, an application service inside the Order microservice) to listen to all of the above events. Okay, not all, just what the order needs.
Now, you may have a question: what if the majority, or some, of my customers don't make orders? My order service will be full of unnecessary data. Is this a good approach?
Well, the answer might vary. I would say that hard disk space is cheaper than memory, and that a database query is faster than a network call from a performance perspective. If your database host (or your server) is limited, then you should not go with microservices. Moreover, I would come up with some business ideas for this unused customer data, e.g. list all customers who never ordered anything and send them some offers to grow my business. Just kidding. Don't feel bothered by unused data in microservices.
My suggestion would be to gather the required data on the front-end and pass it along. The relevant customer details that you want to denormalize into the order would be a value object. The same goes for the product data (e.g. id, description) related to the order line.
It isn't impossible to have the systems interact to retrieve data, but that does couple them on a lower level than seems necessary.
When data from one service needs data from another service, how am I able to get this data?
You copy it.
So somewhere in your design there needs to be a message that carries the data from where it is to where it needs to be.
That could mean that the order service is subscribing to events that are published by the customer service, and storing a copy of the information that it needs. Or it could be that the order service queries some API that has direct access to the data stored by the customer service.
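The subscription variant could be sketched like this. It assumes the customer service publishes CustomerRegistered and CustomerAddressChanged events carrying the fields shown; the handler names and the CustomerSnapshot value object are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CustomerSnapshot:
    """Value object: customer details as the order service last saw them."""
    name: str
    address: str


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    customer: CustomerSnapshot  # copied at the time the order is placed


class OrderService:
    def __init__(self) -> None:
        self._customers: Dict[str, CustomerSnapshot] = {}  # local copy

    def on_customer_registered(self, customer_id: str, name: str, address: str) -> None:
        self._customers[customer_id] = CustomerSnapshot(name, address)

    def on_customer_address_changed(self, customer_id: str, address: str) -> None:
        current = self._customers[customer_id]
        self._customers[customer_id] = CustomerSnapshot(current.name, address)

    def place_order(self, order_id: str, customer_id: str) -> Order:
        # No call to the customer service; the local copy is used.
        return Order(order_id, customer_id, self._customers[customer_id])


svc = OrderService()
svc.on_customer_registered("c-1", "Ada", "1 Old Road")
first = svc.place_order("o-1", "c-1")
svc.on_customer_address_changed("c-1", "2 New Street")
second = svc.place_order("o-2", "c-1")
```

Each order keeps the name and address that were current when it was placed, so later address changes do not rewrite earlier orders.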
Queries for the additional data that you need could be synchronous or asynchronous - maybe the work can be deferred until you have all of the data you need.
Another possibility is that you redesign your system so that the business capability you need is with the data, either moving the capability or moving the data. Why does ordering need customer data? Can the customer service do the work instead? Should ordering own the data?
There's a certain amount of complexity that is inherent in your decision to distribute the work across multiple services. The decision to distribute your system involves weighing various trade offs.

External id as domain identity

Our application sends/receives a lot of data to/from a third party we work with.
Our domain model is mainly populated with that data.
The 'problem' we're having is identifying a 'good' candidate as domain identity for the aggregate.
It seems like we have 3 options:
Generate a domain identity (UUID or DB-sequence...);
Use the External-ID as domain identity that comes along with all data from the external source.
Use an internal domain identity AND External-ID as a separate id that 'might' be used for retrieval operations; the internal id is always leading
About the External-ID:
It is 100% guaranteed the ID will never change
The ID is always managed by the external source
Other domains in our system might use the external-id for retrieval operations
Especially the last point above convinced us that the external-id is not an infrastructural concern but really belongs to the domain.
Which option should we choose?
** UPDATE **
Maybe I was not clear about the term '3rd party'.
Actually, the external source is our client, who is active in the car industry. Our application uses the client's master data to complete several 'things'. We have several Bounded Contexts (BC) like 'Client management', 'Survey', 'Appointment', 'Maintenance', etc.
Our client sends us 'Tasks' that describe something that needs to be done.
That 'something' might be:
'let client X complete survey Y'
'schedule/cancel appointment for client X'
'car X for client Y is scheduled for maintenance at position XYZ'
Those 'Tasks' always have a 'task-id' that is guaranteed to be unique.
We store all incoming 'Tasks' in our database (active record style). Every possible action on a task maps to a domain event. (Multiple BCs might be interested in the same task.)
Every BC contains one or more aggregates which distribute some domain events to other BCs. For instance, when an appointment is canceled a domain event is triggered, maintenance listens to that event to get some things done.
However, our client expects some message after every action that is related to a Task. Therefore we always need to use the 'task-id'.
To summarize things:
Tasks have a task-id
Tasks might be related to multiple BCs
Every BC sends some 'result message' to the client with the related task-id
Task-ids are distributed by domain events
We keep every (internally) persisted task up-to-date
Hopefully, I was clear enough about the use of the external-id (= task-id) and our different BCs.
My gut feeling would be to manage your own identity and not rely on a third party service for this, so option 3 above. Difficult to say without context though. What is the 3rd party system? What is your domain?
Would you ever switch the 3rd party service?
You say other parts of your domain might use the external id for querying - what are they querying? Your internal systems or the 3rd party service?
[Update]
Based on the new information it sounds like a correlationId. I'd store it alongside the other information relevant to the aggregates.
As a general rule, I would veto using a DB-sequence number as an identifier; the domain model should be independent of the choice of persistence; the domain model writes the identifier to the database, rather than the other way around (if the DB wants to be tracking a sequence number for its own purposes, that's fine).
I'm reluctant to use the external identifier, although it can make sense in some circumstances. A given entity, like "Customer" might have representations in a number of different bounded contexts - it might make sense to use the same identifier for all of them.
My default: I would reach for a name based uuid, using the external ID as part of the seed, which gives a simple mapping from external to internal.
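For instance, with Python's standard uuid module a name-based (version 5) UUID can be derived from the external task-id. The namespace below is a made-up value for the example; any fixed namespace UUID of your own would do.

```python
import uuid

# Hypothetical namespace for this application; it must stay fixed forever,
# otherwise the same external ID would map to a different internal ID.
TASK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "tasks.example.invalid")


def internal_id(external_task_id: str) -> uuid.UUID:
    """Derive the internal identity deterministically from the external ID."""
    return uuid.uuid5(TASK_NAMESPACE, external_task_id)


a = internal_id("task-42")
b = internal_id("task-42")
c = internal_id("task-43")
```

The same external ID always yields the same internal ID, so no lookup table is needed to go from external to internal; going the other way still requires storing the external ID alongside the aggregate.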
