DDD: Aggregate boundary vs compositional convenience. - domain-driven-design

I'm starting a new DDD project and I feel like I haven't really grasped the concepts yet. I have a two aggregate roots in my domain so far, Recipe and Food. Their relationship is like this :
`Recipe`->*`Ingredient`->`Food`
An Ingredient is a quantity + a Food
I think that maybe having the aggregate (Food) in the Recipe aggregate is bad, but it's really convenient because it allows to calculate the nutritional values of a recipe by navigating the relation.
I also do not store the result of the calculation and I recalculate every time so if a food is changed the recipe is updated automatically and I don't know if it's a good idea.
Also, is it Ok to have a single repository for Recipe+Ingredient since they are in the same aggregate and I have to load all the ingredients so I can pass them in the constructor of the recipe anyway?

is it Ok to have a single repository for Recipe+Ingredient since they are in the same aggregate and I have to load all the ingredients so I can pass them in the constructor of the recipe anyway?
That is what most people would expect - a single repository that loads all of the state contained within an aggregate boundary.
I think that maybe having the aggregate (Food) in the Recipe aggregate is bad, but it's really convenient because it allows to calculate the nutritional values of a recipe by navigating the relation.
The bad outweighs the good.
This is not obvious when using a simplified domain to discover how to DDD.
The motivation for aggregates is to constrain the way the data in your model changes, so that every change leaves the model in an internally consistent state.
If your domain doesn't have any consistency constraints to enforce (for instance, when your data model is really just a database, and the data model has no veto power over proposed changes), then the problem space doesn't offer a lot of guidance in choosing good aggregate boundaries.
The basic principle behind aggregates is that you are creating a sort of data firewall, you never need to worry that changes to this aggregate are going to violate the rules of that aggregate. This implies, among other things, that no two aggregates should ever share mutable data.
Sharing values is fine -- each aggregate gets its own immutable copy of a value, and changing the value in one aggregate doesn't affect the other at all. Sharing entities is bad, because entities are mutable.
Having a model that makes it really convenient to produce a query with the wrong answers isn't as valuable as a model that ensures that the answers are correct.
What will normally happen is that one aggregate will reference another by ID. So your recipe would look like
`Recipe`->*`Ingredient`->`Id<Food>`
Id<Food> is just another immutable value type; the recipe can easily change which food it uses, but it can't change the food in any way. Which in turn means that you never need to worry that changing one Recipe will break another.

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Course are associated with a department etc.. etc..
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email etc..).
But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry to much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Atendees (that's just an example, note that modeling many to many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and respositories will only be used for update operations and for query operations implement Queries, which won't use repositories, but access the DB directly, therefore you can join tables from multiple aggregates (Student->Enrollments->Atendees->Classes)
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.

What is an Aggregate Root?

No, it is not a duplication question.
I have red many sources on the subject, but still I feel like I don't fully understand it.
This is the information I have so far (from multiple sources, be it articles, videos, etc...) about what is an Aggregate and Aggregate Root:
Aggregate is a collection of multiple Value Objects\Entity references and rules.
An Aggregate is always a command model (meant to change business state).
An Aggregate represents a single unit of (database - because essentialy the changes will be persisted) work, meaning it has to be consistent.
The Aggregate Root is the interface to the external world.
An Aggregate Root must have a globally unique identifier within the system
DDD suggests to have a Repository per Aggregate Root
A simple object from an aggregate can't be changed without its AR(Aggregate Root) knowing it
So with all that in mind, lets get to the part where I get confused:
in this site it says
The Aggregate Root is the interface to the external world. All interaction with an Aggregate is via the Aggregate Root. As such, an Aggregate Root MUST have a globally unique identifier within the system. Other Entites that are present in the Aggregate but are not Aggregate Roots require only a locally unique identifier, that is, an Id that is unique within the Aggregate.
But then, in this example I can see that an Aggregate Root is implemented by a static class called Transfer that acts as an Aggregate and a static function inside called TransferedRegistered that acts as an AR.
So the questions are:
How can it be that the function is an AR, if there must be a globaly unique identifier to it, and there isn't, reason being that its a function. what does have a globaly unique identifier is the Domain Event that this function produces.
Following question - How does an Aggregate Root looks like in code? is it the event? is it the entity that is returned? is it the function of the Aggregate class itself?
In the case that the Domain Event that the function returns is the AR (As stated that it has to have that globaly unique identifier), then how can we interact with this Aggregate? the first article clearly stated that all interaction with an Aggregate is by the AR, if the AR is an event, then we can do nothing but react on it.
Is it right to say that the aggregate has two main jobs:
Apply the needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be raised in a Domain Event from the AR
Please correct me on any of the bullet points in the beginning if some/all of them are wrong is some way or another and feel free to add more of them if I have missed any!
Thanks for clarifying things out!
I feel like I don't fully understand it.
That's not your fault. The literature sucks.
As best I can tell, the core ideas of implementing solutions using domain driven design came out of the world of Java circa 2003. So the patterns described by Evans in chapters 5 and six of the blue book were understood to be object oriented (in the Java sense) domain modeling done right.
Chapter 6, which discusses the aggregate pattern, is specifically about life cycle management; how do you create new entities in the domain model, how does the application find the right entity to interact with, and so on.
And so we have Factories, that allow you to create instances of domain entities, and Repositories, that provide an abstraction for retrieving a reference to a domain entity.
But there's a third riddle, which is this: what happens when you have some rule in your domain that requires synchronization between two entities in the domain? If you allow applications to talk to the entities in an uncoordinated fashion, then you may end up with inconsistencies in the data.
So the aggregate pattern is an answer to that; we organize the coordinated entities into graphs. With respect to change (and storage), the graph of entities becomes a single unit that the application is allowed to interact with.
The notion of the aggregate root is that the interface between the application and the graph should be one of the members of the graph. So the application shares information with the root entity, and then the root entity shares that information with the other members of the aggregate.
The aggregate root, being the entry point into the aggregate, plays the role of a coarse grained lock, ensuring that all of the changes to the aggregate members happen together.
It's not entirely wrong to think of this as a form of encapsulation -- to the application, the aggregate looks like a single entity (the root), with the rest of the complexity of the aggregate being hidden from view.
Now, over the past 15 years, there's been some semantic drift; people trying to adapt the pattern in ways that it better fits their problems, or better fits their preferred designs. So you have to exercise some care in designing how to translate the labels that they are using.
In simple terms an aggregate root (AR) is an entity that has a life-cycle of its own. To me this is the most important point. One AR cannot contain another AR but can reference it by Id or some value object (VO) containing at least the Id of the referenced AR. I tend to prefer to have an AR contain only other VOs instead of entities (YMMV). To this end the AR is responsible for consistency and variants w.r.t. the AR. Each VO can have its own invariants such as an EMailAddress requiring a valid e-mail format. Even if one were to call contained classes entities I will call that semantics since one could get the same thing done with a VO. A repository is responsible for AR persistence.
The example implementation you linked to is not something I would do or recommend. I followed some of the comments and I too, as one commenter alluded to, would rather use a domain service to perform something like a Transfer between two accounts. The registration of the transfer is not something that may necessarily be permitted and, as such, the domain service would be required to ensure the validity of the transfer. In fact, the registration of a transfer request would probably be a Journal in an accounting sense as that is my experience. Once the journal is approved it may attempt the actual transfer.
At some point in my DDD journey I thought that there has to be something wrong since it shouldn't be so difficult to understand aggregates. There are many opinions and interpretations w.r.t. to DDD and aggregates which is why it can get confusing. The other aspect is, in IMHO, that there is a fair amount of design involved that requires some creativity and which is based on an understanding of the domain itself. Creativity cannot be taught and design falls into the realm of tacit knowledge. The popular example of tacit knowledge is learning to ride a bike. Now, we can read all we want about how to ride a bike and it may or may not help much. Once we are on the bike and we teach ourselves to balance then we can make progress. Then there are people who end up doing absolutely crazy things on a bike and even if I read how to I don't think that I'll try :)
Keep practicing and modelling until it starts to make sense or until you feel comfortable with the model. If I recall correctly Eric Evans mentions in the Blue Book that it may take a couple of designs to get the model closer to what we need.
Keep in mind that Mike Mogosanu is using a event sourcing approach but in any case (without ES) his approach is very good to avoid unwanted artifacts in mainstream OOP languages.
How can it be that the function is an AR, if there must be a globaly unique identifier to it, and there isn't, reason being that
its a function. what does have a globaly unique identifier is the
Domain Event that this function produces.
TransferNumber acts as natural unique ID; there is also a GUID to avoid the need a full Value Object in some cases.
There is no unique ID state in the computer memory because it is an argument but think about it; why you want a globaly unique ID? It is just to locate the root element and its (non unique ID) childrens for persistence purposes (find, modify or delete it).
Order A has 2 order lines (1 and 2) while Order B has 4 order lines (1,2,3,4); the unique identifier of order lines is a composition of its ID and the Order ID: A1, B3, etc. It is just like relational schemas in relational databases.
So you need that ID just for persistence and the element that goes to persistence is a domain event expressing the changes; all the changes needed to keep consistency, so if you persist the domain event using the global unique ID to find in persistence what you have to modify the system will be in a consistent state.
You could do
var newTransfer = New Transfer(TransferNumber); //newTransfer is now an AG with a global unique ID
var changes = t.RegisterTransfer(Debit debit, Credit credit)
persistence.applyChanges(changes);
but what is the point of instantiate a object to create state in the computer memory if you are not going to do more than one thing with this object? It is pointless and most of OOP detractors use this kind of bad OOP design to criticize OOP and lean to functional programming.
Following question - How does an Aggregate Root looks like in code? is it the event? is it the entity that is returned? is it the function
of the Aggregate class itself?
It is the function itself. You can read in the post:
AR is a role , and the function is the implementation.
An Aggregate represents a single unit of work, meaning it has to be consistent. You can see how the function honors this. It is a single unit of work that keeps the system in a consistent state.
In the case that the Domain Event that the function returns is the AR (As stated that it has to have that globaly unique identifier),
then how can we interact with this Aggregate? the first article
clearly stated that all interaction with an Aggregate is by the AR, if
the AR is an event, then we can do nothing but react on it.
Answered above because the domain event is not the AR.
4 Is it right to say that the aggregate has two main jobs: Apply the
needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be
raised in a Domain Event from the AR
Yes; again, you can see how the static function honors this.
You could try to contat Mike Mogosanu. I am sure he could explain his approach better than me.

DDD - Aggregates for read-only

If we are working on a sub-domain where we're only dealing with a read-only scenario, meaning that our entities and value objects will not be changed, does it make sense to create aggregates composed by roots and its children or should each entity of this context map to a single aggregate?
Imagine that we've entity A and entity B.
In a context where modifications are made, we create an aggregate composed by entity A and entity B, where A is the aggregate root (let's say that B can't live without A and there are some invariants involved).
If we move the same entities to a different context where no modifications are made, does it make sense to keep this aggregate or should we create an aggregate for entity A and a different one for entity B?
In 2019, there's fairly large support for the idea that in a read only scenario, you don't bother with the domain model at all.
Just load the data directly into whatever read only data structure makes sense to support the use case.
See also: cqrs.
The first thing is if B cant live without A and there are some invariants involved, to me A is an Aggregate root, with B being an entity that belongs to it.
Aggregate roots represent a real world concept and dont just exist for the convenience of modification. In many of our applications, we don't modify state of our aggregate roots once created - i.e. we in effect have immutable aggregate roots. These would have some logic for design by contract checks/invariant checks etc but they are in effect anaemic as there is no "Update" methods due to its immutability. Since the "blue book" was written by Eric Evans, alot of things have changed, e.g. the concept of NoSql database have become very popular, functional programming concepts have become very influential rising to more advanced DDD style architectures being recommended such as CQRS. So for example, rather than doing updates to a database I can append (i.e. insert) instead. This leads to aggregates no longer having to be "updated". This leads to leaner anaemic types but this is what we want in this context. The issue before with anaemic types was that "update logic" for a given type was put elsewhere in the codebase instead of being put into the type itself. However if you do not require "update logic" in the first place then you dont have that problem!
If for example there is an Order with many OrderItems, we would create an Order aggregate root and an OrderItem entity. Its a very important concept to distill your domain to properly identify what are aggregates, entities and value types.
Then creation of domain services, repositories etc just flows naturally. For example, aggregate roots and repositories are 1 to 1 i.e. in the example above we would have an Order repository and not have an OrderItem repository. That way your main domain concepts are spread throughout your code in a predictable and easy to understand way.
Finally, in your specific question I would not treat them as the same entities. In one context, you seem to need modification logic - in the other they you dont - they are separate domain concepts to me.
In context where modifications are made: A=agg root, B=entity.
In context without modifications: A=agg root (immutable), B=entity(immutable)

Association between aggregates, how to decide between holding reference to the object or only to its identity

For exemple, giving a performance having multiple performers...
First option:
Performance (1) ---> (*) Performer
Second option:
Performance
+PerformerIds[]
1st option Pros:
Easier access for query purpose (lets say I don't want to use CQRS)
When we look at the domain model it seems easier to understand, the relation between Performance and Performer is more visible
1st option Cons:
A Performance object is heavier to load (could possibly be fixed with lazy loading)
More coupling
2nd option pros and cons are obviously the opposite of the first option, harder access to performers from the performance, model diagram harder to understand, lighter to load and less coupling.
I kind of like the first option, because, there is no way a Performance object will ever use the Performer object. That relation is more like a data relation / query model.
But it also makes the domain model diagram less clear, in my opinion, so i'm not sure if I should which solution to use.
Could my problem here be that I'm trying to use the same class diagrams for domain experts and for developers ? and/or modeling for query primarily rather than for updating ?
how to decide between holding reference to the object or only to its identity
Holding a reference to an identity of related Aggregate Roots (ARs) makes their boundaries explicit.
Sure each AR still holds references to all its Entities but it becomes very explicit in your domain model whether you reference Aggregate Roots or Entities.
If you hold references to related Aggregate Roots (ARs) it's very easy to cross boundaries between them.
It can be very easy to change few aggregates at the same time, especially if you use ORM or have Unit of Work implemented. So your aggregate roots are not transaction boundaries any more.
If you hold only identifiers to related Aggregate Roots (ARs), in order to access a related AR you have to load it from repository first. It's a trade off. It becomes very explicit when you cross a boundary of one aggregate and query another one.
What I don't like of it is that when you look at a class diagram, all you see is separate aggregates that does not have any association between them, that does not look very useful to show the domain concepts that are related.
Your domain model is a "model" and it's up to you to make design decisions.
If all your Aggregate Roots are small, and you use an ORM that does a lot of magic for free (NOTE: There is always a price), and a value of seeing references on a class diagramme is bigger than value of seeing boundaries then try with holding a reference to related ARs, even if your code doesn't really need it.
And then evaluate your model over a time.

Should lookup values be modeled as aggregate roots?

As part of my domain model, lets say I have a WorkItem object. The WorkItem object has several relationships to lookup values such as:
WorkItemType:
UserStory
Bug
Enhancement
Priority:
High
Medium
Low
And there could possibly be more, such as Status, Severity, etc...
DDD states that if something exists within an aggregate root that you shouldn't attempt to access it outside of the aggregate root. So if I want to be able to add new WorkItemTypes like Task, or new Priorities like Critical, do those lookup values need to be aggregate roots with their own repositories? This seems a little overkill especially if they are only a key value pair. How should I allow a user to modify these values and still comply with the aggregate root encapsulation rule?
While the repository pattern as described in the blue book does emphasize its use being exclusive to aggregates, it does leave room open for exceptions. To quote the book:
Although most queries return an object or a collection of objects, it
also fits within the concept to return some types of summary
calculations, such as an object count, or a sum of a numerical
attribute that was intended by the model to be tallied.
(pg. 152)
This states that a repository can be used to return summary information, which is not an aggregate. This idea extends to using a repository to look up value objects, just as your use case requires.
Another thing to consider is the read-model pattern which essentially allows for a query-only type of repository which effectively decouples the behavior-rich domain model from query concerns.
Landon, I think that the only way is to make those value pairs aggregate roots. I know that is might look overkill, but that's DDD braking things into small components.
The reasons why I think using a repository is the right way are:
A user needs to be able to add those value pairs independently of a Work Item.
The value pairs don't have a local, unique identity
Remember that DDD is just a set of guidelines, not hard truths. If you think that this is overkill, you might want to create a lookup that returns the pairs as value objects. This might work out specially if you don't have a feature to add value pairs in the application, but rather through the database.
As a side note, good question! There are quite a few blog posts about this situations... But not all agree on the best way to do this.
Not everything should be modeled using DDD. The complexity of managing the reference data most likely wouldn't justify creating aggregate roots. A common solution would be to use CRUD to manage reference data, and have a Domain Service to interface with that data from the domain.
Do these lookups have ID's ? If not, you could consider making them Value Objects...

Resources