• We can express an association between two entities via an object reference (though a relationship between an aggregate root and its internal entity could also be expressed via a method defined on the root --> SomeRootEnt.BorrowMeIntEnt(...)) or via a Repository, which would retrieve related entities from a database.
• When a relationship is expressed via a Repository, the client calls the Repository directly to obtain the related entity or entities.
• The decision on how to express a relationship is based on whether the relationship is required for maintaining integrity (and perhaps on whether some behavior needs the association in order to express itself).
1) If a relationship (say 1:*) is expressed via a Repository, are child entities required to contain the ID of the parent entity? (Note: this question assumes that either at least the parent is a root or that they all reside within the same aggregate.)
2) The relationship between entities in different aggregates should only be expressed by IDs
a) Why?
b) Wouldn't such a relationship essentially be expressed via Repository?
c) If yes to b), this would also suggest that associations between entities in different aggregates in most cases aren't required for the purpose of supporting a particular behavior? If yes, why?
UPDATE:
2a)
Expressing relationship with an identity reference is different from
expressing the relationship with an object reference. The reason that
the relationship should be expressed with an identity is to maintain
transactional integrity - ie only a single aggregate will be modified
in any given transaction. If an object reference was used, two
aggregates could be affected by a transaction.
I - I understand the point you're trying to make: if the relationship were expressed with an object reference, then simply because aggregate A1 (which is being modified) is "physically" connected to another aggregate A2, one of A1's parts (ie A2 <-- in reality A2 is not really a part of A1, but hopefully you understand the point) would not be in sync with the other parts?!
II - But conceptually speaking, if we modify A1, then even if the relationship between the two is not expressed using an object reference, the two aggregates are still out of sync, so in practical terms, what difference does it make whether A1 and A2 are physically connected?
III - Anyway, why couldn't we simply use eventual consistency in BOTH cases?
SECOND UPDATE:
original_2)
To be clear, relationships between root entities (ARs) should be
expressed with identities. References between non-root entities in
distinct aggregates should be forbidden.
Can we have unidirectional associations traversable only from A1 non-root entity(ies) to A2 root entity ( ie non-root entity in A1 would contain ID of A2 root )?
2)
a)
I.
If an object reference was used, two aggregates could be affected by a
transaction.
It creates a possibility of A2 being modified in the transaction which
is modifying A1. Referencing A2 by identity eliminates this
possibility. Also, things such as database locking won't apply to A2
when loading A1.
Ignoring for a moment the fact that DB locking won't apply to A2 when updating A1 - why shouldn't we in certain situations allow both A1 and A2 to be modified within a single transaction?
I know aggregates define a consistency boundary, but if in rare cases we need both A1 and A2 to be synchronized within the same transaction, then the only other alternative may be to merge A1 and A2 into a single aggregate, which in that particular model may not be appropriate?!
Thank you
1) Yes. For example, suppose you have a customer with orders. To get all the orders for a customer, you can select all orders with a corresponding CustomerId via orders repository.
2) To be clear, relationships between root entities (ARs) should be expressed with identities. References between non-root entities in distinct aggregates should be forbidden.
2a) Expressing relationship with an identity reference is different from expressing the relationship with an object reference. The reason that the relationship should be expressed with an identity is to maintain transactional integrity - ie only a single aggregate will be modified in any given transaction. If an object reference was used, two aggregates could be affected by a transaction.
2b) Yes. A repository allows the traversal and the identity expresses the relationship.
2c) This is the whole point of an aggregate - it should define a consistency boundary. If changes to one aggregate should affect other aggregates, those changes should be capable of being applied in an eventually consistent manner.
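To make answers 1) and 2b) concrete, here is a minimal sketch of a relationship expressed through a repository and an identity. The class names (`Order`, `InMemoryOrderRepository`) and the method `find_by_customer_id` are hypothetical and chosen only for illustration; a real repository would query a database rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    customer_id: int  # identity of the Customer aggregate, not a Customer object
    total: float


class InMemoryOrderRepository:
    """Hypothetical stand-in for a database-backed orders repository."""

    def __init__(self):
        self._orders = {}

    def add(self, order):
        self._orders[order.order_id] = order

    def find_by_customer_id(self, customer_id):
        # The repository does the traversal; the ID expresses the relationship.
        return [o for o in self._orders.values() if o.customer_id == customer_id]


repo = InMemoryOrderRepository()
repo.add(Order(1, customer_id=10, total=25.0))
repo.add(Order(2, customer_id=11, total=40.0))
repo.add(Order(3, customer_id=10, total=5.0))

orders_for_customer_10 = repo.find_by_customer_id(10)
print([o.order_id for o in orders_for_customer_10])  # [1, 3]
```

Note that the customer itself never has to be loaded to answer "which orders belong to customer 10".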
UPDATE
2a1) It creates a possibility of A2 being modified in the transaction which is modifying A1. Referencing A2 by identity eliminates this possibility. Also, things such as database locking won't apply to A2 when loading A1.
2a2) If the transaction which modifies A1 results in changes that need to be applied to A2, those changes will be synchronized eventually. The reasons for keeping them physically separate are stated in 2a1.
2a3) You could, but again, reasons specified in 2a1 explain why it is better to use identity references.
UPDATE 2
2) I suppose if the integrity of the two ARs is maintained and the reference makes life a lot easier, it can be acceptable. The reference is not itself a problem, but it can lead to problems.
2a1) One reason for this is rooted in more modern document database architectures. Document databases usually only support atomic transactions on a single document. Another reason is that two ARs may be stored in different databases, in which case a distributed transaction would be required to maintain consistency. However, a relational database can certainly lock multiple tables, so a scenario where two ARs are modified in one transaction is possible. As a result, these constraints should be understood as guidelines. If all ramifications are understood and caution is used, then it can be acceptable to update two ARs.
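The "one aggregate per transaction, synchronize the other eventually" guideline from 2a2) could look roughly like the following sketch. Everything here is an assumption made for illustration: `Order` plays the role of A1, `Inventory` the role of A2, `StockReserved` is a made-up domain event, and the `outbox` deque stands in for whatever messaging or outbox mechanism a real system would use.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StockReserved:
    """Hypothetical domain event raised by A1 and consumed by A2."""
    product_id: str
    quantity: int


class Order:  # plays the role of A1
    def __init__(self, order_id):
        self.order_id = order_id
        self.lines = []
        self.pending_events = []

    def add_line(self, product_id, quantity):
        # Only this aggregate's own state changes here.
        self.lines.append((product_id, quantity))
        self.pending_events.append(StockReserved(product_id, quantity))


class Inventory:  # plays the role of A2, referenced only by product_id
    def __init__(self, product_id, available):
        self.product_id = product_id
        self.available = available

    def reserve(self, quantity):
        if quantity > self.available:
            raise ValueError("not enough stock")
        self.available -= quantity


outbox = deque()
inventories = {"p1": Inventory("p1", available=10)}

# "Transaction" 1: modify only the Order aggregate and record its events.
order = Order("o1")
order.add_line("p1", 3)
outbox.extend(order.pending_events)
order.pending_events.clear()

# "Transaction" 2, some time later: apply each event to the other aggregate.
while outbox:
    event = outbox.popleft()
    inventories[event.product_id].reserve(event.quantity)

print(inventories["p1"].available)  # 7
```

Between the two steps the aggregates are temporarily out of sync, which is exactly the trade-off discussed above.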
When I was reading Microservice Patterns, one of the paragraphs said that Domain-Driven Design requires aggregates to follow some rules. One of the rules is "inter-aggregate references must use primary keys".
For example, it basically means that a class Book may only have getOwnerUserId() and shouldn't have getOwnerUser().
However, in Eric Evans's Domain-Driven Design, it clearly says:
Objects within the AGGREGATE can hold references to other AGGREGATE roots.
I guess it means that Book can have getOwnerUser().
If my above understandings of these 2 books are correct, is the book "Microservice Patterns" wrong about aggregates? Or is there some variant of Domain-Driven Design that "Microservice Patterns" is referring to? Or, did I miss something?
Both books are saying roughly the same thing using different words. I'll add mine.
An aggregate can hold a reference to other aggregates in the same bounded context. This reference is through an identifier. In many cases an identifier is a primary key (relational artifact) or a document ID (e.g. from a document database like MongoDB). Regardless, in the domain, it's just an "identifier".
It is also possible for aggregates to refer to aggregates in another bounded context. In this case the reference is not just an identifier, but a projection of the "foreign" aggregate into the current bounded context.
Think of a library system. One bounded context could be the checkout system, and another could be about books themselves. A Library Patron aggregate could have references to books within its aggregate; these references would be small objects containing just a few of the books' properties: ID, title, and author perhaps, but not the number of pages, publisher, location in the library, etc.
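A minimal sketch of that library example, assuming hypothetical names `LibraryPatron` and `BookSummary`: the patron aggregate holds a small immutable projection of each book rather than the full book aggregate from the other bounded context.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BookSummary:
    """Projection of a book into the checkout context: just a few properties."""
    book_id: str
    title: str
    author: str


@dataclass
class LibraryPatron:
    patron_id: str
    checked_out: list = field(default_factory=list)  # list of BookSummary

    def check_out(self, book):
        self.checked_out.append(book)


patron = LibraryPatron("patron-1")
patron.check_out(BookSummary("b-42", "Domain-Driven Design", "Eric Evans"))
print(patron.checked_out[0].title)  # Domain-Driven Design
```

Page count, publisher, shelf location and so on stay in the context that owns the book.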
"Aggregate root" is essentially the DDD way of saying "primary key" (I suspect the reason for not saying "primary key" is that to do so would be bringing something that's more of an infrastructure concern into the domain).
If User is a separate aggregate from Book, Book can only hold a User's ID (assuming that that's the aggregate root for User), not a User.
Since anything outside of the User class can only access a user by ID, however, it's probably better naming to say getUser() vs. getUserId() and have getUser() return a user ID.
"inter-aggregate references must use primary keys"
"primary key" is very RDBMS-specific so identity would be more appropriate.
"Objects within the AGGREGATE can hold references to other AGGREGATE roots."
Can, but generally shouldn't.
Why reference through identity?
An Aggregate Root (AR) is a strong consistency boundary. The natural way for an AR to protect its invariants (including from violations through concurrency) is to encapsulate its data in a way that allows it to oversee/detect every change.
When you reference other ARs by object reference rather than by identity, the consistency boundary becomes blurry, which makes the design much harder to reason about.
Here's a (rather silly) example:
We can see that it's no longer enough to look at the AR's structure to know what's truly part of its boundary, and that could surely lead to issues.
Furthermore, would you know if persons will get deleted if you delete InviteList or if changes made to persons from within InviteList would get persisted when calling save(inviteList)? You'd have to inspect the persistence mappings (assuming an ORM) and the cascade options to know for sure.
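The following is a hypothetical sketch of the kind of design being criticized, not the answer's original example. It assumes a `Person` aggregate and two made-up variants of an invite list: one holding `Person` objects directly, one holding only person IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    person_id: int
    name: str


@dataclass
class InviteListWithObjects:
    invitees: list = field(default_factory=list)  # list of Person objects

    def rename_invitee(self, index, name):
        # Reaches into another aggregate: is Person part of this boundary or not?
        self.invitees[index].name = name


@dataclass
class InviteList:
    invitee_ids: list = field(default_factory=list)  # identities only

    def invite(self, person_id):
        if person_id in self.invitee_ids:
            raise ValueError("already invited")
        self.invitee_ids.append(person_id)


alice = Person(1, "Alice")

blurry = InviteListWithObjects([alice])
blurry.rename_invitee(0, "Bob")
print(alice.name)  # Bob: the Person was changed through the invite list

clear = InviteList()
clear.invite(alice.person_id)
print(clear.invitee_ids)  # [1]
```

With the ID-based variant, everything the invite list can change is visible in its own structure.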
Why have direct references?
I'd say the primary reason to allow a direct reference to another AR would be to be pragmatic about queries that are constructed from domain objects. It's generally harder to query without such relationships (e.g. find all InviteList that have an invitee named "Foo") or construct DTOs that must aggregate data from multiple ARs (e.g. InviteListDto with all the invitee names).
However, that's also one of the many reasons CQRS has become so popular these days. If you bypass the domain model for queries entirely (e.g. plain SQL) then you do not have to make concessions in your domain for querying needs.
References
Here's a sample from the IDDD book by Vaughn Vernon where he talks about that very quote from Evans.
I have some Entities and I am trying to follow Domain-Driven Design practices to identify Aggregates. I somehow can't do this because I either break the rule that Entities may not reference non-root Entities of other Aggregates, or I can't form Aggregates at all.
I have the following Entities: Organisation, JobOffer, Candidate, and JobApplication.
An Organisation creates JobOffers but may only have a limited amount of active JobOffers.
A Candidate creates JobApplications but may only have a limited amount of active JobApplications.
A JobApplication references a JobOffer that it is meant for.
Because I have to know how many JobOffers an Organisation has before I can create a new one (enforcing limits), I assume Organisation should be a root entity that owns JobOffers. The same applies to Candidates and JobApplications. Now I have two Aggregates: Organisation with JobOffers and Candidate with JobApplications. But... I need to reference JobOffer from JobApplication... and that breaks the rule that I can't reference non-root entities.
I have looked for and found similar questions on this forum but I somehow still can't figure it out, so sorry in advance - I appreciate any help.
In general, you should avoid holding object references to other aggregates and instead reference other aggregates by id. In some cases it can be valid to reference an entity within another aggregate, but again this should be done via id.
If you go this way you should reference a composite id. Aggregates are meant to depict logical boundaries and also transactional boundaries. Child entity ids which are modelled as part of the aggregate only need to be unique inside the boundaries of that aggregate. This makes it a lot easier to focus on stuff just inside those boundaries when performing actions in your system. Even if you are using UUIDs (or GUIDs), if you really need to reference a child entity of another aggregate - let's say you have good reasons for that - you should model the id graph via the aggregate root which means always knowing the id of the other aggregate in combination with the id of the entity you are interested in. That means referencing a composite id.
But: whenever I think I need to reference a child entity of another aggregate root at first I investigate this more deeply. This would mean that this child entity might be important as a stand-alone entity as well.
Did I miss to discover another aggregate root?
In your case, looking at your domain model diagram, I suspect JobOffer should be an aggregate of its own. Of course I don't know your domain, but I can at least guess that some transactions in your system mutate job offers on their own, without needing to consider organization-specific business invariants. If this is the case, you should rethink the domain model and consider making JobOffer an aggregate root of its own. In that case your initial problem gets resolved automatically. Also note that modelling job offers as aggregates can make actions performed on organizations simpler as well, as you do not need to load all the job offers for an organization when loading the organization aggregate. This might of course not be relevant in your case and really depends on the maximum number of job offers per organization.
So I think, depending on your business requirements and domain logic invariants, I would recommend one of the following two options:
1) Reference the foreign child entity only through a composite id made up of the id of the other aggregate plus the child entity id (e.g. by creating a value object that represents this reference as a strong type)
2) Make JobOffer an aggregate of its own if the considerations mentioned above hold true in your case
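The composite-id option might be sketched as follows. `JobOfferRef` is a hypothetical value object invented for this illustration; the field names are assumptions, not something taken from the question's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobOfferRef:
    """Composite reference: the owning aggregate's id plus the child entity's id."""
    organisation_id: str
    job_offer_id: str


@dataclass
class JobApplication:
    application_id: str
    candidate_id: str
    job_offer: JobOfferRef  # strongly typed reference, no object graph


ref = JobOfferRef("org-1", "offer-7")
application = JobApplication("app-1", "cand-3", ref)

# Value semantics: two references with the same ids are interchangeable.
print(application.job_offer == JobOfferRef("org-1", "offer-7"))  # True
```

Because the reference always carries the organisation id, the job offer can be resolved through its aggregate root even if its own id is only unique within that organisation.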
If we are working on a sub-domain where we're only dealing with a read-only scenario, meaning that our entities and value objects will not be changed, does it make sense to create aggregates composed by roots and its children or should each entity of this context map to a single aggregate?
Imagine that we've entity A and entity B.
In a context where modifications are made, we create an aggregate composed by entity A and entity B, where A is the aggregate root (let's say that B can't live without A and there are some invariants involved).
If we move the same entities to a different context where no modifications are made, does it make sense to keep this aggregate or should we create an aggregate for entity A and a different one for entity B?
In 2019, there's fairly large support for the idea that in a read only scenario, you don't bother with the domain model at all.
Just load the data directly into whatever read only data structure makes sense to support the use case.
See also: cqrs.
The first thing is: if B can't live without A and there are some invariants involved, then to me A is an aggregate root, with B being an entity that belongs to it.
Aggregate roots represent a real-world concept and don't exist just for the convenience of modification. In many of our applications, we don't modify the state of our aggregate roots once they are created - i.e. we in effect have immutable aggregate roots. These would have some logic for design-by-contract checks, invariant checks etc., but they are in effect anaemic, as there are no "Update" methods due to their immutability. Since Eric Evans wrote the "blue book", a lot of things have changed: NoSQL databases have become very popular, and functional programming concepts have become very influential, giving rise to recommendations of more advanced DDD-style architectures such as CQRS. So, for example, rather than doing updates to a database I can append (i.e. insert) instead. This leads to aggregates no longer having to be "updated", and so to leaner, anaemic types, but this is what we want in this context. The issue with anaemic types used to be that the "update logic" for a given type was put elsewhere in the codebase instead of in the type itself. However, if you do not require "update logic" in the first place, then you don't have that problem!
If, for example, there is an Order with many OrderItems, we would create an Order aggregate root and an OrderItem entity. It's very important to distill your domain so that you properly identify what are aggregates, entities and value types.
Then the creation of domain services, repositories etc. just flows naturally. For example, aggregate roots and repositories are 1 to 1, i.e. in the example above we would have an Order repository and not an OrderItem repository. That way your main domain concepts are spread throughout your code in a predictable and easy-to-understand way.
Finally, for your specific question, I would not treat them as the same entities. In one context you seem to need modification logic, in the other you don't - they are separate domain concepts to me.
In context where modifications are made: A=agg root, B=entity.
In context without modifications: A=agg root (immutable), B=entity(immutable)
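For the read-only context, an immutable aggregate could be sketched like this. The classes `A` and `B` follow the question's naming; the invariant "A must have at least one B" is an assumption made up for the example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class B:
    name: str


@dataclass(frozen=True)
class A:
    a_id: str
    children: tuple  # tuple of B, so the collection itself cannot be mutated

    def __post_init__(self):
        # Invariants are still checked at construction time (assumed rule).
        if not self.children:
            raise ValueError("A must have at least one B")


a = A("a-1", (B("first"),))

try:
    a.a_id = "other"
except FrozenInstanceError:
    print("A is immutable")
```

There is no "update" method to place anywhere, so the type stays lean without pushing logic elsewhere.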
I just started on DDD and encountered the term aggregate root.
My current understanding is that this is a kind of parent entity that holds references to other complementary entities. Example: the aggregate root would be Employee, which also contains position, shift, gender, and salary.
My first question is whether this understanding is correct.
Secondly, I get the impression that a repository is defined only per aggregate. Yet it puzzles me how we could then retrieve information about other entities (e.g. a list of positions or shift types).
Thank you,
Aggregates are consistency boundaries to enforce invariants. This means that the entities and objects inside the aggregate must remain consistent together with regards to the business rules.
http://dddcommunity.org/library/vernon_2011/
http://martinfowler.com/bliki/DDD_Aggregate.html
https://lostechies.com/gabrielschenker/2015/05/25/ddd-the-aggregate/
Secondly, I get the impression that a repository is defined only per aggregate. Yet it puzzles me how we could then retrieve information about other entities (e.g. a list of positions or shift types).
You can have a separate read model over your data if you choose to do so and it makes sense that the business wants to view the data in a different way. The consistencies you need to enforce when you are writing data do not apply on the read side. CQRS is the pattern to help with this - you separate your write side from your read side.
https://lostechies.com/gabrielschenker/2015/04/07/cqrs-revisited
Quotes are from DDD: Tackling Complexity in the Heart of Software ( pg. 150 )
a)
global search access to a VALUE is often meaningless, because finding a
VALUE by its properties would be equivalent to creating a new instance
with those properties. There are exceptions. For example, when I am
planning travel online, I sometimes save a few prospective itineraries
and return later to select one to book. Those itineraries are VALUES
(if there were two made up of the same flights, I would not care which
was which), but they have been associated with my user name and
retrieved for me intact.
I don't understand the author's reasoning as to why it would be more appropriate to make the Itinerary Value Object globally accessible instead of having clients globally search for the Customer root entity and then traverse from it to the Itinerary object?
b)
A subset of persistent objects must be globally accessible through a
search based on object attributes ... They are usually ENTITIES,
sometimes VALUE OBJECTS with complex internal structure ...
Why is it more common for Value Objects with complex internal structure to be globally accessible than for simpler Value Objects?
c) Anyways, are there some general guidelines on how to determine whether a particular Value Object should be made globally accessible?
UPDATE:
a)
There is no domain reason to make an itinerary traverse-able through
the customer entity. Why load the customer entity if it isn't needed
for any behavior? Queries are usually best handled without
complicating the behavioral domain.
I'm probably wrong about this, but isn't it common that when a user (ie the Customer root entity) logs in, the domain model retrieves the user's Customer Aggregate?
And if users have an option to book flights, then it would also be common for them to check from time to time the Itineraries they have selected or booked (though English isn't my first language, so the term Itinerary may actually mean something a bit different from what I think it means).
And since the Customer Aggregate has already been retrieved from the DB, why issue another global search for the Itinerary (which will probably search for it in the DB) when it was already retrieved together with the Customer Aggregate?
c)
The rule is quite simple IMO - if there is a need for it. It doesn't
depend on the structure of the VO itself but on whether an instance of
a particular VO is needed for a use case.
But this VO instance has to be related to some entity ( ie Itinerary is related to particular Customer ), else as the author pointed out, instead of searching for VO by its properties, we could simply create a new VO instance with those properties?
SECOND UPDATE:
a) From your link:
Another method for expressing relationships is with a repository.
When a relationship is expressed via a repository, do you implement a SalesOrder.LineItems property (which I doubt, since you advise against entities calling repositories directly) that in turn calls a repository, or do you implement something like SalesOrder.MyLineItems(IOrderRepository repo)? If the latter, then I assume there is no need for a SalesOrder.LineItems property?
b)
The important thing to remember is that aggregates aren't meant to be
used for displaying data.
True that domain model doesn't care what upper layers will do with the data, but if not using DTO's between Application and UI layers, then I'd assume UI will extract the data to display from an aggregate ( assuming we sent to UI whole aggregate and not just some entity residing within it )?
Thank you
a) There is no domain reason to make an itinerary traverse-able through the customer entity. Why load the customer entity if it isn't needed for any behavior? Queries are usually best handled without complicating the behavioral domain.
b) I assume that his reasoning is that complex value objects are those that you want to query since you can't easily recreate them. This issue and all query related issues can be addressed with the read-model pattern.
c) The rule is quite simple IMO - if there is a need for it. It doesn't depend on the structure of the VO itself but on whether an instance of a particular VO is needed for a use case.
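A small sketch of why value objects behave differently under search, using a hypothetical `Itinerary` value object and made-up flight numbers: recreating one from its properties yields something equal to the stored instance, yet the saved itineraries are still looked up by the user they were associated with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Itinerary:
    flights: tuple  # flight numbers; equality is by value


# Itineraries saved per user name (stand-in for persistent storage).
saved = {"alice": [Itinerary(("LH400", "UA90"))]}

rebuilt = Itinerary(("LH400", "UA90"))
print(rebuilt == saved["alice"][0])  # True: same value
print(rebuilt is saved["alice"][0])  # False: different instance, which does not matter
```

What the lookup adds is not the value itself but the fact that this user saved it.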
UPDATE
a) It is unlikely that a customer aggregate would have references to the customer's itineraries. The reason is that I don't see how an itinerary would be related to behaviors that would exist in the customer aggregate. It is also unnecessary to load the customer aggregate at all if all that is needed is some data to display. However, if you do load the aggregate and it does contain reference data that you need you may as well display it. The important thing to remember is that aggregates aren't meant to be used for displaying data.
c) The relationship between customer and itinerary could be expressed by a shared ID - each itinerary would have a customerId. This would allow lookup as required. However, just because these two things are related it does not mean that you need to traverse customer to get to the related entities or value objects for viewing purposes. More generally, associations can be implemented either as direct references or via repository search. There are trade-offs either way.
UPDATE 2
a) If implemented with a repository, there is no LineItems property - no direct references. Instead, to obtain a list of line items a repository is called.
b) Or you can create a DTO-like object, a read-model, which would be returned directly from the repository. The repository can in turn execute a simple SQL query to get all required data. This allows you to get to data that isn't part of the aggregate but is related. If an aggregate does have all the data needed for a view, then use that aggregate. But as soon as you have a need for more data that doesn't concern the aggregate, switch to a read-model.
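A read-model along these lines could be sketched with Python's built-in `sqlite3` module. The table layout, the `OrderSummary` type and the function `load_order_summaries` are all assumptions for illustration; the point is only that the query returns a flat, display-oriented object without loading any aggregate.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    """DTO-like read-model: data for a view, no behavior."""
    order_id: int
    customer_name: str
    line_count: int


def load_order_summaries(conn):
    # Plain SQL that joins across what would be separate aggregates.
    rows = conn.execute(
        """
        SELECT o.id, c.name, COUNT(l.id)
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        LEFT JOIN line_items l ON l.order_id = o.id
        GROUP BY o.id, c.name
        ORDER BY o.id
        """
    ).fetchall()
    return [OrderSummary(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (100, 1), (101, 1);
    INSERT INTO line_items VALUES (1, 100), (2, 100);
    """
)

summaries = load_order_summaries(conn)
for summary in summaries:
    print(summary)
```

The write side can keep its aggregates small, since views like this one no longer depend on them.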