How to choose the proper bounds for an Aggregate in DDD? - domain-driven-design

I'm trying to follow the DDD (Domain-Driven Design) approach in my project. My domain is a barbershop, and the use case that I want to implement is booking an appointment.
The problem I have is how to correctly draw the boundaries of the aggregate that should handle this use case (booking an appointment).
Let's say I have a barber. Barber has working days. Working day has working hours (e.g. 09:00 - 20:00), breaks, and other booked appointments.
Schematically it would look like this:
Barber
- WorkingDay
  - 09:00-20:00 <- working hours
  - Breaks
    - 13:00-14:00
    - 18:00-19:00
  - Appointments
    - 09:00-10:00
    - 12:00-13:00
- WorkingDay
  ...
- WorkingDay
  ...
Rules to be considered:
New appointment must not overlap existing breaks
New appointment must not overlap existing appointments
New appointment must be within working hours
Working day must exist
Working day must not be in the past
I have two ideas of how to implement this:
Create WorkingDay aggregate which will contain all related breaks and appointments.
Pros:
All rules can be satisfied within WorkingDay aggregate
Single WorkingDayRepository repository
Cons:
Possibly large aggregate
Create WorkingDay, Break, Appointment aggregates and verify rules in domain services
Pros:
Small aggregates
Cons:
Multiple repositories (e.g. WorkingDayRepository, BreakRepository, AppointmentRepository)
Business logic is split between aggregates/domain-services
What other option can be used? Or what approach to follow in my case?

You could model this by using the concept of a Schedule. A Schedule is your worker's workdays (or calendar) with appointments. It would make a decent aggregate - a schedule always has to be valid.
Then you could have an AddAppointment method on the Schedule which checks the invariants.
If you need to track changes, I'd suggest using Domain Events that could be fired and then logged.
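A minimal TypeScript sketch of that idea, with invented names (TimeRange, WorkingDay, addAppointment); it illustrates the shape of such an aggregate under the rules from the question, not a prescribed implementation:

```typescript
// Illustrative only: a Schedule aggregate guarding the booking invariants listed in the question.
type TimeRange = { start: Date; end: Date };

const overlaps = (a: TimeRange, b: TimeRange): boolean =>
  a.start.getTime() < b.end.getTime() && b.start.getTime() < a.end.getTime();

class WorkingDay {
  constructor(
    public readonly date: string,              // e.g. "2024-05-01"
    public readonly workingHours: TimeRange,
    public readonly breaks: TimeRange[],
    public readonly appointments: TimeRange[],
  ) {}
}

class Schedule {
  constructor(
    public readonly barberId: string,
    private readonly workingDays: Map<string, WorkingDay>,
  ) {}

  addAppointment(date: string, slot: TimeRange): void {
    const day = this.workingDays.get(date);
    if (!day) throw new Error("Working day does not exist");
    if (slot.start.getTime() < Date.now()) throw new Error("Working day must not be in the past");
    if (slot.start.getTime() < day.workingHours.start.getTime() ||
        slot.end.getTime() > day.workingHours.end.getTime())
      throw new Error("Appointment must be within working hours");
    if (day.breaks.some(b => overlaps(b, slot)))
      throw new Error("Appointment overlaps a break");
    if (day.appointments.some(a => overlaps(a, slot)))
      throw new Error("Appointment overlaps an existing appointment");

    day.appointments.push(slot);
    // A domain event (e.g. AppointmentBooked) could be raised here for logging/auditing.
  }
}
```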
To avoid concurrency problems with bookings, i.e. when two users are trying to book the same slot, you can:
Organise the DB so that double-booking is impossible, i.e. put a unique constraint on <time slot id>, <barber id> in your bookings table; when the second user tries to book (insert a booking into the table) they will fail with a unique constraint violation (see the sketch after this list). I'm not sure how your DB is structured, but hopefully you get the idea.
Use DB transactions. Both will try to update the row with the time slot; only one should succeed.
Put booking requests for the same barber into a queue.
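A rough sketch of the first option, assuming PostgreSQL (whose unique-violation error code is 23505), a generic query client, and made-up table and column names:

```typescript
// Illustrative sketch: let the unique constraint on (barber_id, slot_start) decide who wins.
interface Db {
  query(sql: string, params: unknown[]): Promise<unknown>;
}

async function bookSlot(db: Db, barberId: string, slotStart: Date): Promise<boolean> {
  try {
    await db.query(
      "INSERT INTO bookings (barber_id, slot_start) VALUES ($1, $2)",
      [barberId, slotStart],
    );
    return true;                              // first writer wins
  } catch (e: any) {
    if (e?.code === "23505") return false;    // unique violation: the slot was already taken
    throw e;
  }
}
```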

Related

How to ensure data consistency between two different aggregates in an event-driven architecture?

I will try to keep this as generic as possible using the “order” and “product” example, to try and help others that come across this question.
The Structure:
In the application we have three different services: two that follow the event sourcing pattern and one that is read-only, giving us the separation between our read and write views:
- Order service (write)
- Product service (write)
- Order details service (Read)
The Background:
We are currently storing the relationship between the order and product in only one of the write services: for example, within Order we have a property called ‘productItems’ which contains a list of the aggregate Ids from Product for the products that have been added to the order. Each product added to an order is emitted onto Kafka, where the read service will update the view and form the relationships between the data.
 
The Problem:
Because we pull the order and the product back by aggregate Id in order to update them, if a product is deleted there is no way to disassociate the product from the order on the write side.
 
This in turn means we have an inconsistency: the order holds a reference to a product that no longer exists within the product service.
The Ideas:
Master the relationship on both sides, which means when the product is deleted we can look at the associated orders and trigger an update to remove it from each order (this would duplicate the reference).
Create another view of the data that shows the relationships and use a saga to do a clean-up. When a delete is triggered, it will look up the view database, see the relationships within the data and then trigger an update for each of the orders that have the product associated.
Does it really matter having the inconsistencies if the Product details service shows the correct information? Because the view database will consume the product deleted event, it will be able to safely remove the relationship that means clients will be able to get the correct view of the data even if the write models appear inconsistent. Based on the order of the events, the state will always appear correct in the read view.
Another thought: once an aggregate Id is deleted it should never be reused, which means checks on the aggregate such as “is this product in the order already?” will never trigger for it. Since the aggregate Id is never repurposed, the inconsistency should not cause an issue when running commands in the future.
Sorry for the long read, but these are all the ideas we have thought of so far, and I am keen to gain some insight from the community, to make sure we are on the right track or if there is another approach to consider.
 
Thank you in advance for your help.
Event sourcing suits human, and specifically human-paced, processes very well. It helps a lot to imagine that every event in an event-sourced system is delivered by some clerk, printed on a sheet of paper. Then it will be much easier to figure out a suitable solution.
What's the purpose of an order? So that your back-office personnel can secure the necessary units at a warehouse, the customer can make a payment, and you can start the shipping process.
So, I guess, after an order is placed, some back-office system can process it and confirm that it can be taken into work and invoicing. Or it can return the order with remarks that this and that line are no longer available, so that the customer can agree to a reduced order or pick other options.
Another option is, since the probability of a customer ordering a discontinued item is low, to simply skip this check. If it still occurs at shipping time, issue a refund and a coupon for the inconvenience. Why is the probability low? Because the goods are added from an online catalogue, which reflects the current state, and the availability check can be done on the 'Submit' button click. So an inconsistency can only occur if an item is discontinued in the same minute (or second) the order is submitted. And usually the actual decision to discontinue is made well before the information is updated in the Product service, for external reasons.
Hence, I suggest using eventual consistency: an event-sourced entity should only be responsible for its own consistency and not try to fulfil someone else's responsibility.
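To make the back-office confirmation step a bit more concrete, here is a hedged sketch; ProductCatalog, reviewOrder and the other names are invented for illustration:

```typescript
// Illustrative only: the order context checks an (eventually consistent) product view
// at processing time, instead of trying to stay synchronously consistent with Product.
type OrderLine = { productId: string; quantity: number };

interface ProductCatalog {
  isAvailable(productId: string): Promise<boolean>;  // fed by Product events, e.g. via Kafka
}

async function reviewOrder(lines: OrderLine[], catalog: ProductCatalog): Promise<string[]> {
  const remarks: string[] = [];
  for (const line of lines) {
    if (!(await catalog.isAvailable(line.productId))) {
      remarks.push(`Product ${line.productId} is no longer available`);
    }
  }
  // No remarks: confirm the order for work and invoicing.
  // Otherwise: return the order to the customer to accept a reduced order or pick alternatives.
  return remarks;
}
```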

Timesheet class diagram

I want to build a timesheet application where employees fill in their daily tasks.
I have built this diagram and I need some help checking whether it is correct.
The role is either Scrum master, developer, Product owner, ...
The task_type are like: production, meeting, tests, ...
An agent can have a different role depending on the project he is assigned to
A daily timesheet can include different projects; for example, an agent can work 1h on project A and 2h on project B
Does the diagram meet your requirements?
Let's verify:
There's a Role with a name that can be "Scrum master", "Developer", "Product owner" or whatever role you invent in the future.
There's a Task_type with a name that can be "production", "meeting", "tests" or whatever type of activity you may be interested in in the future.
An Agent is associated with an Affectation which is associated at the same time to a Project and a Role. An agent can therefore have different roles based on the project, if the multiplicity of the association-end on the side of the assignment is 0..*
An Agent can be associated with a daily Timesheet, the date of which is specified by date. The association-end on the Timesheet side would be 0..*, to allow several time sheets for the same agent.
A Timesheet is associated with a Timesheet_line which is associated with a Project. The duration is indicated by time at the level of Timesheet_line. You may therefore "include" different timings on different projects for the same date if the multiplicity is 1..* between Timesheet and Timesheet_line at the latter end.
So yes, your diagram seems to fulfil the requirements.
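To make the walkthrough above concrete, here is a rough TypeScript transcription of the classes as described (names adapted to TypeScript conventions, multiplicities expressed as arrays; this is one reading of the diagram, not a replacement for it):

```typescript
// Structure only, no behavior; arrays stand for the 0..* / 1..* association ends.
class Role { constructor(public name: string) {} }      // "Scrum master", "Developer", ...
class TaskType { constructor(public name: string) {} }  // "production", "meeting", ...
class Project { constructor(public name: string) {} }

class Affectation {                                      // links an Agent to a Project in a Role
  constructor(public project: Project, public role: Role) {}
}

class TimesheetLine {                                    // time spent on one project / task type
  constructor(public project: Project, public taskType: TaskType, public time: number) {}
}

class Timesheet {                                        // one daily timesheet with 1..* lines
  constructor(public date: string, public lines: TimesheetLine[]) {}
}

class Agent {
  affectations: Affectation[] = [];                      // 0..* assignments
  timesheets: Timesheet[] = [];                          // 0..* daily timesheets
}
```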
There are however some ambiguities in your requirements:
Can Agent have several Role for the same Project, or shall the agent have a single role on a project? In the latter case, more work is needed (e.g. a constraint on the affectation).
Can Agent have several Timesheet for the same date or not? If not, more work is needed (e.g. a constraint)
Can the same Timesheet have several times the same Project (e.g. 1 hour meeting and 7 hours development) or even several time the same project for the same task type? If not, more work is needed.
In the absence of a property type, is time the start time, a time interval, or a duration? I assume the latter and would then recommend renaming this property to duration.
Some more thoughts
Interestingly, your diagram allows an Agent to book time on the Timesheet for activities (Task_type) that do not belong to the assigned role, or even for projects to which the agent is not assigned. This may seem weird at first, but it is in fact a valid way to cope with historical data in case of a change of assignments (the Affectation does not have a validity interval, so if someone changes role, the old role will be lost), which is not incompatible with timesheet records entered before this change.
Agent and Admin seem to be both specializations of a more generic User. The question is then to know whether an Admin can also take the role of an Agent and enter time sheets, or if the role of Admin is incompatible with that of Agent.
Passwords should not be known to the system and no object should store them. The accepted practice is to keep a private saltedPasswordHash property and have a checkPassword() method which compares the hash of a user's input to the hash kept with the object.
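For example, with Node's built-in crypto module (a sketch of the practice just described, not something taken from the diagram):

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "crypto";

// The object keeps only a salt and a salted hash; the password itself is never stored.
class User {
  private readonly salt: Buffer;
  private readonly saltedPasswordHash: Buffer;

  constructor(password: string) {
    this.salt = randomBytes(16);
    this.saltedPasswordHash = scryptSync(password, this.salt, 64);
  }

  checkPassword(candidate: string): boolean {
    const candidateHash = scryptSync(candidate, this.salt, 64);
    return timingSafeEqual(candidateHash, this.saltedPasswordHash);
  }
}
```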
The next step in your rather data-driven approach would be to think carefully about every multiplicity at every end of every association, to have an unambiguous understanding of the relationships, and then complete the picture with behaviors.

CQRS Read Model Projections: How complex is too complex a data transformation

I want to sanity-check myself on a view projection, regarding whether an intermediary concept can purely exist in the read model while providing a bridge between commands.
Let me use a contrived example to explain.
We place an order which raises an OrderPlaced event. The workflow then involves generating a picking slip, which is used to prepare a shipment.
A picking slip can be generated from an order (or group of orders) without any additional information being supplied from any external source or user. Is it acceptable then that the picking slip can be represented purely as a read model?
So:
PlaceOrderCommand -> OrderPlacedEvent
OrderPlacedEvent -> PickingSlipView
The warehouse manager can then view a picking slip, select the lines they would like to ship, and then perform a PrepareShipment command. A ShipmentPrepared event will then update the original order, and remove the relevant lines from the PickingSlipView.
I know it's a toy example, but I have a conceptually similar use case where a colleague believes the PickingSlip should be a domain entity/aggregate in its own right, as it's conceptually different to order. So you have PlaceOrder, GeneratePickingSlip, and PrepareShipment commands.
The GeneratePickingSlip command however simply takes an order number (identifier), transforms the order data into a picking slip entity, and persists the entity. You can't modify or remove a picking slip or perform any action on it, apart from using it to prepare a shipment.
This feels like introducing unnecessary overhead on the write model, for what is ultimately just a transformation of existing information to enable another command.
So (and without delving deeply into the problem space of warehouses and shipping)...
Is what I'm proposing a legitimate use case for a read model?
Acting as an intermediary between two commands, via transformation of some data into a different view. Or, as my colleague proposes, should every concept be represented in the write model in all cases?
I feel my approach is simpler and avoids unneeded complexity, but I'm new to CQRS and so perhaps missing something.
Edit - Alternative Example
Providing another example to explore:
We have a book of record for categories, where each record is information about products and their location. The book of record is populated by an external system, and contains SKU numbers, mapped to available locations:
Book of Record (Electronics)
SKU# Location1 Location2 Location3 ... Location 10
XXXX Introduce Remove Introduce ... N/A
YYYY N/A Introduce Introduce ... Remove
Each book of record is an entity, and each line is a value object.
The book of record is used to generate different Tasks (which are grouped in a TaskPlan to be assigned to a person). The plan may only cover a subset of locations.
There are different types of Tasks: One TaskPlan is for the individual who is on a location to add or remove stock from shelves. Call this an AllocateStock task. Another type of Task exists for a regional supervisor managing multiple locations, to check that shelving is properly following store guidelines, say CheckDisplay task. For allocating stock, we are interested in both introduced and removed SKUs. For checking the displays, we're only interested in newly Introduced SKUs, etc.
We are exploring two options:
Option 1
The person creating the tasks has a View (read model) that allows them to select Book of Records. Say they select Electronics and Fashion. They then select one or more locations. They could then submit a command like:
GenerateCheckDisplayTasks(TaskPlanId, List<BookOfRecordId>, List<Locations>)
The commands would then orchestrate going through the records, filtering out locations we don't need, processing only the 'Introduced' items, and creating the corresponding CheckDisplayTasks for each SKU in the TaskPlan.
Option 2
The other option is to shift the filtering to the read model before generating the tasks.
When a book of record is added, a view model for each type of task is maintained. The data might be transposed and would only include relevant info, i.e. the CheckDisplayScopeView might project the book of record to:
Category SKU Location
Electronics (BookOfRecordId) XXXX Location1
Electronics (BookOfRecordId) XXXX Location3
Electronics (BookOfRecordId) YYYY Location2
Electronics (BookOfRecordId) YYYY Location3
Fashion (BookOfRecordId) ... ... etc
When generating tasks, the view enables the user to select the category and locations they want to generate the tasks for. Perhaps they select the Electronics category and Location 1 and 3.
The command is now:
GenerateCheckDisplayTasks(TaskPlanId, List<BookOfRecordId, SKU, Location>)
Where the command is no longer responsible for the logic needed to filter out the locations, the Removed and N/A items, etc.
So the command for the first option just submits the ID of the entity that is being converted to tasks, along with the filter options, and does all the work internally, likely utilizing domain services.
The second option offloads the filtering aspect to the view model, and now the command submits values that will generate the tasks.
Note: In terms of the guidance that Aggregates shouldn't appear out of thin air, the Task Plan aggregate will create the Tasks.
I'm trying to determine if option 2 is pushing too much responsibility onto the read model, or whether this filtering behavior is more applicable there.
Sorry, I attempted to use the PickingSlip example as I thought it would be a more recognizable problem space, but realize now that there are connotations that go along with the concept that may have muddied the waters.
The answer to your question, in my opinion, very much depends on how you design your domain, not how you implement CQRS. The way you present it, it seems that all these operations and aggregates are in the same Bounded Context but at first glance, I would think that there are 3 (naming is difficult!):
Order Management or Sales, where orders are placed
Warehouse Operations, where goods are packaged to be shipped
Shipments, where packages are put in trucks and leave
When an Order is Placed in Order Management, Warehouse reacts and starts the Packaging workflow. At this point, Warehouse should have all the data required to perform its logic, without needing the Order anymore.
The warehouse manager can then view a picking slip, select the lines they would like to ship, and then perform a PrepareShipment command.
To me, this clearly indicates the need for an aggregate that will ensure the invariants are respected. You cannot select items not present in the picking slip, you cannot select more items than the quantities specified, you cannot select items that have already been packaged in a previous package and so on.
A ShipmentPrepared event will then update the original order, and remove the relevant lines from the PickingSlipView.
I don't understand why you would modify the original order. Also, removing lines from a view is not a safe operation per se. You want to guarantee that concurrency doesn't cause a single item to be placed in multiple packages, for example. You guarantee that using an aggregate that contains all the items, generates the packaging instructions, and marks the items of each package safely and transactionally.
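A minimal sketch of the kind of aggregate I mean; PickingSlip, Line and prepareShipment are invented names and the shape is only illustrative:

```typescript
// Illustrative only: the aggregate protects the "what can be packaged" invariants transactionally.
type Line = { sku: string; ordered: number; packaged: number };

class PickingSlip {
  constructor(public readonly id: string, private readonly lines: Map<string, Line>) {}

  prepareShipment(selection: { sku: string; quantity: number }[]): void {
    // Validate the whole selection against the invariants before mutating anything.
    for (const { sku, quantity } of selection) {
      const line = this.lines.get(sku);
      if (!line) throw new Error(`Item ${sku} is not on this picking slip`);
      if (line.packaged + quantity > line.ordered)
        throw new Error(`Cannot package more of ${sku} than was ordered`);
    }
    for (const { sku, quantity } of selection) {
      this.lines.get(sku)!.packaged += quantity;
    }
    // ...raise a ShipmentPrepared event here
  }
}
```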
Acting as an intermediary between two commands
Aggregates execute the commands; they are not in between.
Viewing it from another angle, an indication that you need that aggregate is that the PrepareShippingCommand needs to create an aggregate (Shipping), and according to Udi Dahan, you should not create aggregate roots out of thin air. Instead, other aggregate roots create them. So, it seems fair to say that there needs to be some aggregate which ensures that the policies to create shipments are applied.
As a final note, domain design is difficult and you need to know the domain very well, so it is very likely that my proposed solution is not correct, but I hope the considerations I made on each step are helpful to you to come up with the right solution.
UPDATE after question update
I read the updated question a couple of times and updated my answer several times, but I ended up every time with answers very specific to your example again, and I'm most likely missing a lot of details needed to actually be helpful (I'd be happy to discuss it on another channel though). Therefore, I want to go back to the first sentence of your question to add an important comment that I missed:
an intermediary concept can purely exist in the read model, while providing a bridge between commands.
In my opinion, read models are disposable. They are not a single source of truth. They are a representation of the data to easily fulfil the current query needs. When these query needs change, old read models are deleted and new ones are created based on the data from the write models.
So, based only on this, I would recommend not preparing a read model to facilitate your command operations.
I think that your solution is here:
When a book of record is added a view model for each type of task is maintained. The data might be transposed, and would only include relevant info.
If I understand it correctly, what you should do here is not create a view model, but create an Aggregate (or multiple). This aggregate can then receive the commands, apply the business rules and mutate the state. So, instead of having a domain service read data from "clever" read models and put it all together, you have an aggregate which encapsulates the data it needs and the business logic.
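As a hedged sketch of what that could look like for your second example (BookOfRecord, TaskPlan and the method names are placeholders, not anything from your system):

```typescript
// Illustrative only: the aggregates own the filtering rules instead of a "clever" read model.
type Movement = "Introduce" | "Remove" | "N/A";
type RecordLine = { sku: string; movements: Map<string, Movement> };   // location -> movement

class BookOfRecord {
  constructor(public readonly id: string, private readonly lines: RecordLine[]) {}

  // Business rule lives here: CheckDisplay tasks only care about newly introduced SKUs.
  introducedSkus(locations: string[]): { sku: string; location: string }[] {
    return this.lines.flatMap(line =>
      locations
        .filter(location => line.movements.get(location) === "Introduce")
        .map(location => ({ sku: line.sku, location })),
    );
  }
}

class TaskPlan {
  private readonly tasks: { sku: string; location: string }[] = [];

  generateCheckDisplayTasks(records: BookOfRecord[], locations: string[]): void {
    for (const record of records) {
      this.tasks.push(...record.introducedSkus(locations));
    }
  }
}
```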
I hope it makes sense. It's a broad topic and we could talk about it for hours probably.

Modeling one-to-many relations using Domain Driven Design

This question is more of a general question about how to model simple one-to-many relations using collections: should a change in a list item be reflected in the version of the aggregate containing it?
The domain is about meeting scheduling (like in Outlook).
I have a Meeting entity, which can have multiple Participants.
A participant can accept/decline meeting requests.
Rescheduling a meeting nullifies all of the participants confirmations.
I thought of two ways to model this.
Option 1
The Meeting aggregate will contain a list of Participants where each Participant has a ParticipantId and a Status (accepted/denied).
The problem here is that every Accept or Deny command, for a specific participant, increments the Meeting's version, which means two participants will enter a race condition if trying to Accept the meeting request based on the same original version.
Although this could be solved by re-reading the Meeting's document and retrying the Accept command, it's quite annoying considering how often this could happen.
Another approach is to ignore the meeting's version when executing the Accept command, but this introduces a new problem: what happens if, after sending the meeting requests, the meeting has been rescheduled? In this case we can't afford to ignore the Meeting's version, because this time the version DOES represent a real version that should be considered.
BTW, is it at all a good practice to ignore the version in some of the commands and not in others?
Option 2
Extract a Participation aggregate out of Meeting.
Participation will have MeetingId, ParticipantId, and Status.
It will also have its own version.
This way, when participant X Accepts the meeting request, only the relevant Participation will be modified, and the rest will be left intact.
And, when rescheduling the meeting, a "Meeting Rescheduled" event will be published and an event handler will respond to it by resetting all of the Participations' statuses to "NotAccepted" regardless of their current version.
On the one hand this sounds logical in the sense that a meeting's version shouldn't be incremented just because someone accepted/denied its request.
On the other hand, modeling Participation as a standalone aggregate doesn't sound quite right to me, because it has no meaning outside of the context of the meeting.
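For what it's worth, a rough sketch of Option 2 as I imagine it (the repository shape and names are invented):

```typescript
// Illustrative only: Participation as its own aggregate, reset by a MeetingRescheduled handler.
class Participation {
  constructor(
    public readonly meetingId: string,
    public readonly participantId: string,
    public status: "NotAccepted" | "Accepted" | "Denied" = "NotAccepted",
    public version = 0,
  ) {}

  accept(): void { this.status = "Accepted"; this.version++; }
  reset(): void  { this.status = "NotAccepted"; this.version++; }
}

interface ParticipationRepository {
  byMeeting(meetingId: string): Promise<Participation[]>;
  save(participation: Participation): Promise<void>;
}

// Event handler for "Meeting Rescheduled": reset all participations regardless of their version.
async function onMeetingRescheduled(meetingId: string, repo: ParticipationRepository) {
  for (const participation of await repo.byMeeting(meetingId)) {
    participation.reset();
    await repo.save(participation);
  }
}
```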
Anyway, would love to get feedback on this and see the various approaches to this problem.
Although this could be solved by re-reading the Meeting's document and retrying the Accept command, it's quite annoying considering how often this could happen.
This looks like a modeling error. You should keep in mind that the meeting aggregate is not the book of record for the participants' availability - the real world is. So the message shouldn't be AcceptInvitation, but instead InvitationAccepted. There shouldn't be a conflict about this, because the domain model doesn't get to veto events outside of its authority boundary.
You might, depending on your implementation, end up with a concurrent modification exception in your plumbing, but that's something that you should be handling automatically (i.e. expected version any, or a retry).
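A small sketch of both points, with invented names: the aggregate records the InvitationAccepted fact rather than vetoing it, and the plumbing retries on a concurrency conflict:

```typescript
// Illustrative only: record what happened in the real world; retry plumbing-level conflicts.
type InvitationAccepted = { type: "InvitationAccepted"; meetingId: string; participantId: string };

class Meeting {
  private readonly responses = new Map<string, "Accepted" | "Declined">();

  // Not an acceptInvitation() command the model can reject on business grounds:
  // the acceptance already happened outside the model's authority boundary.
  record(event: InvitationAccepted): void {
    this.responses.set(event.participantId, "Accepted");
  }
}

class ConcurrencyError extends Error {}

// Generic retry for concurrent-modification conflicts in the persistence plumbing.
async function withRetry<T>(attempt: () => Promise<T>, retries = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();       // load aggregate, record event, save with expected version
    } catch (e) {
      if (!(e instanceof ConcurrencyError) || i >= retries) throw e;
      // someone else saved first: reload and try again
    }
  }
}
```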
Another approach is to ignore the meeting's version when executing the Accept command, but this introduces a new problem: what happens if, after sending the meeting requests, the meeting has been rescheduled?
The solution here is to model more carefully. Yes, sometimes you will get a message that accepts or declines an invitation that has expired.
Put another way: race conditions don't exist.
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
What happens to Alice, who replied instantly to the invitation, when the meeting is rescheduled? Why wouldn't the same thing happen to Bob, when his reply arrives just after the meeting is rescheduled?
Participation as a standalone aggregate doesn't sound quite right to me, because it has no meaning outside of the context of the meeting.
I find that heuristic isn't particularly effective. It's much more important to understand whether entities can change state independently, or if their changes need to be coordinated.
Actually, the Meeting aggregate is used to track the participants' availability. That's what its purpose is. Unless I didn't fully understand you...
It's a bit subtle, and I didn't spell it out very well.
Suppose the model says that I'm available, but an emergency in the real world calls me away. What happens? Am I blocked from going to the hospital because the model says I have to go to a meeting? Can somebody cancel my emergency by changing the invitation I've submitted?
Furthermore, if I'm away on an emergency, are you available for a meeting that is scheduled for the same time as the meeting you and I were going to have?
In this space, the real world is the authority for whether or not somebody is available. The model is just looking at a cached copy of a message describing whether or not somebody was available in the past.
The cached information being used by the model is not guaranteed to be complete. See Greg Young on warehouse systems and exception reports.
which makes me think that perhaps the Meeting aggregate should have two version fields: one will be a strong version which, when incremented, represents a breaking change, and another soft version for non-breaking changes. Does this make any sense?
Not really. Version is not, as far as I know, a term taken from the ubiquitous language of scheduling meetings. It's meta data, if it exists at all, and the business rules in your model should not depend upon meta data.
I agree, but a Meeting ID (or any ID for that matter) is also not part of the ubiquitous language, yet I might pass it back and forth between my domain world and external worlds.

Context level DFD

So, I'm not really sure if this is the right place for this, but I have a Context-level data flow diagram for the specification extract below. I have never done one before, so I was wondering whether it is correct or needs fixing. Any help is appreciated.
This is a link to a screen of my current one http://i.imgur.com/S4xvutc.png
SPECIFICATION
Currently the office staff operate the following processes:
Add/Amend/Delete Membership
This is run on-demand when a new membership application is received or when a member indicates that he/she wishes to make amendments to their details. It is also run in those rare instances when a membership is terminated at the discretion of the manager. A new member has an ID number allocated (simply incremented from the previous membership accepted). A membership balance is also maintained for accounting purposes.
Another process operates in a similar fashion on data associated with transfer partners.
Monthly Maintenance
This is run on the last day of each month to issue requests and reminders for subscriptions due, and to remove memberships where fees remain outstanding. Standard letters are also generated. Membership balances are updated as appropriate.
Payment Updates
This is run prior to the Monthly Maintenance, with membership balances being updated accordingly.
Payments to partners are also disbursed at this time.
New Member Search
This is run whenever a new member has been added to the database. The partners are partitioned in terms of vehicle category and location. Normally, there is a limited choice of partner in a particular location (if, indeed, there is any choice) but for some popular destinations, several partners are involved in providing the airport transfer. Thus, a search is then made through the appropriate section for potential matches in the following manner:
A search is then made on the grounds of sex (many female passengers in particular prefer a driver of their own sex, especially if travelling alone or in couples).
Matches are then selected according to factors such as cost (if available), availability of extra requested facilities (such as child seats, air-conditioning etc.)
Existing Member - Additional Searches
These are run on-demand in the same fashion as for a new member's search. Members may of course request any number of such searches, but a separate payment is due for each.
All financial transactions (payments) are also posted to the separate Accounts file, which also stores other financial details relating to running costs for the consideration of the firm's accountants at the end of the financial year.
Thanks for any help regarding this level 0, Context-only DFD.
It needs some fixing.
The most obvious flaw is that you use verbs in your dataflows. In some cases this can be fixed easily by just discarding the verb. Return balance and status is not a dataflow, but balance and status is.
In other cases it is not so easy. Check Balance, is it outstanding? sounds more like a Process than a dataflow. It looks like Accounting is responsible for doing that job. So will Accounting produce a list of outstanding balances? Or will it return a single balance and status, and if so, based on what input? Will your Airport Transport System send a list of balances to check to Accounting?
Take for example Monthly Maintenance. What matters is that you want
requests and reminders for subscriptions due
Standard letters
These need to be visible in your DFD
The fact that you want to remove memberships where fees remain outstanding probably has no place in the top-level diagram, because that looks like an internal affair.
In general, focus on what the System produces. Maintaining internal state is secondary; it is a necessity to produce the desired output.
