Definition of a set of aggregates? - domain-driven-design

Definition of a set of aggregates? - domain-driven-design

I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Another approach would be to avoid this thinking completely and not allow/define a set of aggregates and not allow enumerating aggregates but only load/save on aggregate-id. Using this option if would be necessary to reference aggregates from other aggregates and by doing this build up an interconnected graph of aggregates.
The approaches are similar to having aggregates in a folder on disk or having an "internet" of aggregates where the references between them are defining the bigger set of aggregates. In any case I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much? I'm not sure I explain this very good but my question is if there are any definitions of the "set of aggregates" or if we should think of aggregates as totally isolated/on its own and with only a unique aggregate-id (UUID)?
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.

Aggregates are connected
In any application with sufficient complexity, Aggregates end up referencing one-another. And it is perfectly reasonable to use their unique identifiers as reference IDs to refer to each other.
But take care to load and persist aggregates outside the domain layer, typically in repositories. If you want to traverse links across aggregates and load them into memory, you will be doing that upfront before handing over control to the domain layer for the actual processing.
Traversing the graph to get all related aggregates is correct, but this rarely spans across too many aggregate boundaries. You rarely find a single change or rule to be applied throughout the application. If you do have such a transaction, it is probably a sign of the domain design needing improvement, simply because you are spreading one responsibility/change amongst many aggregates.
The connectivity is so usual that you should watch out for aggregates that have no linkages with the rest of the system. They are either standalone libraries, or they probably belong to a different bounded context.
Aggregates can morph into different forms
They are aggregates because they form a clear invariant boundary, with their primary responsibility being to enforce invariants across state changes for all the entities within themselves. But they can morph into different kinds of DDD objects based on the requirement.
A good example is of a single Currency note. In most applications, they are value objects. But for the federal bank, they are aggregates with clear cut invariant rules. They are aggregates when they are created and referenced, but in a transaction that ships printed notes to banks, they may become value objects.
So you may have to evaluate whether you are talking about a domain entity in its aggregate form, or as a value object when you consider each linkage.
Aggregates are invariant boundaries
It is wrong to validate domain rules across aggregates.
Your aggregate boundary is an invariant boundary, meaning all the domain rules within it should be satisfied at all time. By that logic, you are going to incorrectly build up a structure that will need to ensure that all domain rules across aggregates are valid at all time. Doing so will impose considerable performance burden, not to mention the complexity in business logic.
But this is not to say that there may be domain rules that span across aggregates. The correct way to accomplish this would be using eventual consistency and an Event-driven approach.
The primary changing aggregate would validate and persist the data, and bubble up an event containing the state change. Other aggregates would then act on the event and bring themselves up-to-date. If an aggregate's domain rules break because of the change, there is usually a supplementary mechanism that allows correction of the problem (a preferred mechanism) or a rollback of the first state change (happens very rarely).

Perhaps you can find a common denominator the agg sets have in common and use that to work with?
A simplified example; there is a set of Books and a set of Users that have nothing in common except you want to know whenever they were first registered? What might be an option is to have an interface FirstRegistration and then you can choose to either expand Books/Users or create a specific entity instead.

I'm struggling with how/if to define "a set of aggregates". Aggregates
are supposed to be stand alone and isolated but it's easy to think of
a bigger set of aggregates that belong together. But is this a trap?
I think you're struggling because indeed the idea of a set of aggregates (instances) is very generic, and the uses of such things are contextual and domain-specific. People don't talk specifically about it because of course you may have behaviors that operate on a collection of multiple aggregates, but that doesn't give such collections any particular common properties or requirements that would allow you, from a general DDD perspective, to characterize such collections more specifically than "a set of aggregates", "a list of distinct aggregates", or similar.
Using this "set of aggregates" it would be possible to for instance
enumerate and index aggregates on a unique property within the set and
have other domain rules that could be validated across all aggregates
in the set. It's tempting but also feels a bit wrong.
Tempting why? You've couched the question in very abstract terms, so it's pretty much impossible to contradict you about the "it would be possible", but just because something may be possible doesn't mean it would be useful. In practice, I think you'll find that rules or behaviors that operate on collections of aggregates most naturally belong not to collections of aggregates in an abstract sense, but rather to other aggregate types in your domain model, to domain repositories, or to domain services.
It is entirely plausible that your domain model might want to handle particular sets of aggregates characterized by some rule. For example, if you're an airline, then one of the aggregates in your domain model might a single seat on a flight, since that's the unit you sell. It makes sense in that case that there would be operations on all the seats on a particular flight, for example, but whatever rules and behaviors you might have about that are specifically about that kind of aggregate, selected in that particular way.
Another approach would be to avoid this thinking completely and not
allow/define a set of aggregates and not allow enumerating aggregates
but only load/save on aggregate-id.
It's surely counterproductive to forbid working with sets of aggregates. Just don't attribute more significance to it than is warranted. There is nothing particularly special about sets of aggregates in general.
Using this option if would be
necessary to reference aggregates from other aggregates and by doing
this build up an interconnected graph of aggregates.
I don't follow that. One certainly must be able to retrieve and store individual aggregates from persistence, as that's more or less the defining property of aggregates -- they are the unit of persistence. But that doesn't mean that you must reject the ability to work with collections of aggregates. However, sets of aggregates do not have identity in the same way that individual aggregates do, so yes, relationships between aggregates need to be modeled in terms of individual aggregates. Nevertheless, that does not inherently preclude 1:m or n:m relationships among aggregates.
I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much?
You'll find all sorts of uses of various sets of aggregates in applications built and maintained based on DDD ideas, but there's not much to talk about at the level of abstraction of your question, and what there is is already summed up in the words "set" and "aggregate".
The set of aggregates could for instance be the database being used
under the surface. But what I'm wondering is if this database as in
the information about what aggregates it contains has any definition
in DDD
Not to my knowledge. I suspect most DDD practitioners would just call it "the data", or something similar.
or if we should think about a set of aggregates as an
interconnected graph where only traversal of this graph can be used to
enumerate all "associated" aggregates.
I'm still not seeing why you set that up as a thing. Sure, depending on the domain model, you might be able to traverse all or substantial chunks of the data by traversing associations between aggregates, and that might be appropriate for some purposes, but DDD doesn't have to give a special name or special rules for sets of aggregates for you to work with them.
Like any useful methodology, DDD exists to solve problems. Its bread & butter is complex applications with complex data and evolving requirements. It is not to be interpreted as a straight jacket preventing designers and developers from (thoughtfully) writing designs and code that incorporate aspects of other design approaches, much less designs and code that provide for the application's idiosyncratic needs.

Related

Modelling aggregates with subtypes (DDD/CQRS)

I have a subdomain which involves tracking user financial data across different financial account types.
For example, users can input data for their:
bank accounts,
credit cards,
loans,
lines of credit,
real estate,
and more...
Now within each individual type, there are more subtypes.
For instance, under loans:
personal loans,
business loans,
mortgages,
car loans,
and more...
They would each have their own particular invariants, with some unique properties and functionality, and some shared properties and functionality.
I've been approaching this using composition, creating an aggregate for each subtype, and using interfaces and helper interface implementations to share similar logic between aggregates.
However, it appears as though I'm going to end up with dozens of different aggregates when modelling all these different account types. This doesn't feel right.
Alternatives I've considered:
have a type property on the loan aggregate, and conditional logic based off the type.
create different bounded contexts for each of these types: This feels like overkill, I believe this is all part of the same business subdomain.
create aggregates based off shared functionality - eg SecuredLoan and UnsecuredLoan aggregates
creating subclasses in the general aggregates to hold the subtype's unique functionality. get some encapsulation of subtype specific logic, with some conditional logic still (eg conditional properties on the aggregate). Not really sure the difference between this and just creating a separate aggregate for each subtype
Tradeoffs seem to be, the more general the implementation, there will end up being a ton of conditional logic, and conditional properties based off the subtype.
Versus building specific aggregates for each subtype, the logic per aggregate is simplified, but there ends up being hundreds of commands in the application layer, a lot of them which are basically the same thing but to a different subtype. Additionally, there end up being dozens of repositories.
It feels like I either get an explosion of conditional logic complexity in a general aggregate, or an explosion of the number of aggregates (or contexts) if building one per subtype.
Question - is there a known pattern for dealing with this type of modelling problem? Or is it really just dealing with the above tradeoffs, and finding something which fits best? In that case, is there some precedent I can apply to the decision-making process, as I'm struggling to decide between the above approaches. And is it problematic if there end up being many dozens of aggregates within a given context?

Rather than starting with the data at rest to get your aggregates, consider instead what operations/changes ("commands" one might say) will be performed and how the results of those operations affect future operations is what leads to what the aggregates want to be. Event storming style approaches can be helpful for figuring out these relationships between state changes.
For instance, each of these kinds of loans might have AccrueInterest, DrawPrincipal, and RecordPayment commands which operate on the balances identically (given perhaps configurable rate parameters etc.) and which don't affect and aren't affected by other commands. In that scenario, you can have a Loan aggregate which models the idea that there's a loan with interest and principal balances on which interest accrues and payments are made. An AutoLoan aggregate might then just be managing the collateralization of Loan ABC123 with VIN 1G1234567890.

Sorry for the simple starting answer, but …
Build your aggregates based on your actual use cases.
Aggregate A - Scenario A
Aggregate B - Scenario B
…
Avoid building aggregates for general conditions, DDD and ubiquitous language is about developing language, aggregates and systems around the use case not general purpose.
General purpose has its use cases and isn’t necessary anti DDD; but the focus is on creating the specifics then abstracting to generality.

DDD: creating multiple aggregates with a shared life-cycle in a single transaction

I'm aware of the general rule that only a single aggregate should be modified per transaction, mostly for concurrency and transactional consistency issues, as far as I'm aware.
I have a use case where I want to create multiple aggregates in a single transaction: a RestaurantManager, a Restaurant, and a Menu. They seem like a single aggregate because their life-cycles begin and end together: it doesn't make sense within the domain to create a RestaurantManager without a Restaurant, or vice versa; the same goes for a Restaurant and a Menu. Further, if the Restaurant or the RestaurantManager is deleted (unregistered), they should all be deleted together.
However, I've split them into separate aggregates because, once created, they are updated separately, maintain their own invariants, and I don't want to load them all into memory just to update one property on the Restaurant, for example.
The only thing that ties them together is their life-cycle.
My question is whether this represents a case where it is okay to go against the "rule" that each transaction should only operate on a single aggregate.
I'd also like to know if I should enforce their shared life-cycle in the domain model by having each aggregate root hold the identifier of the aggregate root it depends on, i.e. by having Restaurant require a MenuId as a constructor parameter, and likewise for Menu and RestaurantId, so that neither can be created without the other. However, this still wouldn't enforce that they should be saved together by the application service anyway, since it could create them all in memory, then only save the Menu, for example.

Your requirement is a pretty normal use case in DDD, IMHO. There are always multiple aggregates working in tandem to support the application, and they are interlinked in their lifecycles. But the modeling concepts still stand true. Let me attempt to explain what your model would look like with the help of a few DDD rules:
Aggregates are transaction boundaries
Aggregates ensure that no business invariants are broken at any point. This means that if you have multiple aggregates strung together as part of one transaction, you have to load all of them into memory for the validation.
This is especially a problem when your application is data-rich and stores data in a database cluster - partitioned, distributed (think Mongo or Elasticsearch). You will have the problem of loaded up data from potentially different clusters as part of a single transaction.
Aggregates are loaded in entirety
Aggregates and their associated data objects are loaded in entirety into memory. This means that unnecessary objects (say the restaurant's schedule for the upcoming month, for example) for the transaction may be loaded into memory. By itself, this is not a problem. But when multiple aggregates get together, the amount of data loaded into memory needs to be considered.
Aggregates refer to each other by their unique identifiers
This one is straightforward and means that each aggregate stores its referenced aggregates by their identifiers instead of enclosing the other aggregate's data within it.
State changes across Aggregates are handled through Domain Events
In cases where you want a state change in one aggregate to have side-effects on other aggregates, you publish a domain event, and a subscriber handles the change on other aggregates in the background. This is how you would want to handle your requirement for cascade deletes.
By following these rules, you are essentially zooming in one single aggregate at a time and ensuring that the complexity remains low. When you string up multiple aggregates, though it is clear and understandable on day 1, eventually, the application tends towards becoming a big ball of mud, as dependencies and invariants start crisscrossing each other.

"only a single aggregate should be modified per transaction"
Contention at creation doesn't matter as much. You can create many ARs in a single transaction without problem because the only other operation that could conflict is another duplicate creation process.
Another reason to avoid involving many ARs in a single transaction is coupling between modules though, but you could always keep things loosely coupled using synchronously dispatched domain events.
As for the deletion, it's probably less problematic to make it eventually consistent. Does it really matter that Restaurant is closed while RestaurantManager remains registered for a short period of time?
The fact you are asking this question tells me your system is not distributed? If your system is running with a single DB server and used by a few people it may be that eventual consistency make things more complex for scalability you don't actually need.
Start simple and refactor as needed, but crossing AR boundaries is not something that should be done consistently or else your boundaries are clearly wrong.
Furthermore, if you want to communicate that a RestaurantManager can't be spawned from nowhere and associated with an invalid RestaurantId by mistake you may want to look at your ubiquitous language for guidance.
e.g.
"A RestaurantManager is registered for a given Restaurant": not sure it truly aligns with your UL, but it's just for the sake of the example.
RestaurantManager manager = restaurant.registerManager(...);
This obviously increases coupling and could affect performance, but it aligns well with the UL and makes it more difficult to misuse the model. Also note that with a single DB, you could enforce referential integrity which takes cares of these uninteresting referential constraints.

As pointed out by #plalx, contention doesn't matter as much when creating aggregates in terms of transactions, since they don't yet exist so can't be involved in contention.
As for enforcing the mutual life cycle of multiple aggregates in the domain, I've come to think that this is the responsibility of the application layer (i.e. an application service, or use case).
Maybe my thinking is closer to Clean or Hexagonal architecture, but I don't think it's possible or even sensible to try and push every single business rule down into the "domain model". The point of the domain model for me is to partition the problem domain into small chunks (aggregates), which encapsulate common business data/operations that change together, but it's the application layer's responsibility to use these aggregates properly in order to achieve the business' end goal (which is the application as a whole), including mediating operations between the aggregates and controlling their life cycles.
As such, I think this stuff belongs in an application service. That being said, frequently updating multiple aggregates in each use case could be a sign of incorrect domain boundaries.

DDD - Aggregates for read-only

If we are working on a sub-domain where we're only dealing with a read-only scenario, meaning that our entities and value objects will not be changed, does it make sense to create aggregates composed by roots and its children or should each entity of this context map to a single aggregate?
Imagine that we've entity A and entity B.
In a context where modifications are made, we create an aggregate composed by entity A and entity B, where A is the aggregate root (let's say that B can't live without A and there are some invariants involved).
If we move the same entities to a different context where no modifications are made, does it make sense to keep this aggregate or should we create an aggregate for entity A and a different one for entity B?

In 2019, there's fairly large support for the idea that in a read only scenario, you don't bother with the domain model at all.
Just load the data directly into whatever read only data structure makes sense to support the use case.
See also: cqrs.

The first thing is if B cant live without A and there are some invariants involved, to me A is an Aggregate root, with B being an entity that belongs to it.
Aggregate roots represent a real world concept and dont just exist for the convenience of modification. In many of our applications, we don't modify state of our aggregate roots once created - i.e. we in effect have immutable aggregate roots. These would have some logic for design by contract checks/invariant checks etc but they are in effect anaemic as there is no "Update" methods due to its immutability. Since the "blue book" was written by Eric Evans, alot of things have changed, e.g. the concept of NoSql database have become very popular, functional programming concepts have become very influential rising to more advanced DDD style architectures being recommended such as CQRS. So for example, rather than doing updates to a database I can append (i.e. insert) instead. This leads to aggregates no longer having to be "updated". This leads to leaner anaemic types but this is what we want in this context. The issue before with anaemic types was that "update logic" for a given type was put elsewhere in the codebase instead of being put into the type itself. However if you do not require "update logic" in the first place then you dont have that problem!
If for example there is an Order with many OrderItems, we would create an Order aggregate root and an OrderItem entity. Its a very important concept to distill your domain to properly identify what are aggregates, entities and value types.
Then creation of domain services, repositories etc just flows naturally. For example, aggregate roots and repositories are 1 to 1 i.e. in the example above we would have an Order repository and not have an OrderItem repository. That way your main domain concepts are spread throughout your code in a predictable and easy to understand way.
Finally, in your specific question I would not treat them as the same entities. In one context, you seem to need modification logic - in the other they you dont - they are separate domain concepts to me.
In context where modifications are made: A=agg root, B=entity.
In context without modifications: A=agg root (immutable), B=entity(immutable)

Association between aggregates, how to decide between holding reference to the object or only to its identity

For exemple, giving a performance having multiple performers...
First option:
Performance (1) ---> (*) Performer
Second option:
Performance
+PerformerIds[]
1st option Pros:
Easier access for query purpose (lets say I don't want to use CQRS)
When we look at the domain model it seems easier to understand, the relation between Performance and Performer is more visible
1st option Cons:
A Performance object is heavier to load (could possibly be fixed with lazy loading)
More coupling
2nd option pros and cons are obviously the opposite of the first option, harder access to performers from the performance, model diagram harder to understand, lighter to load and less coupling.
I kind of like the first option, because, there is no way a Performance object will ever use the Performer object. That relation is more like a data relation / query model.
But it also makes the domain model diagram less clear, in my opinion, so i'm not sure if I should which solution to use.
Could my problem here be that I'm trying to use the same class diagrams for domain experts and for developers ? and/or modeling for query primarily rather than for updating ?

how to decide between holding reference to the object or only to its identity
Holding a reference to an identity of related Aggregate Roots (ARs) makes their boundaries explicit.
Sure each AR still holds references to all its Entities but it becomes very explicit in your domain model whether you reference Aggregate Roots or Entities.
If you hold references to related Aggregate Roots (ARs) it's very easy to cross boundaries between them.
It can be very easy to change few aggregates at the same time, especially if you use ORM or have Unit of Work implemented. So your aggregate roots are not transaction boundaries any more.
If you hold only identifiers to related Aggregate Roots (ARs), in order to access a related AR you have to load it from repository first. It's a trade off. It becomes very explicit when you cross a boundary of one aggregate and query another one.
What I don't like of it is that when you look at a class diagram, all you see is separate aggregates that does not have any association between them, that does not look very useful to show the domain concepts that are related.
Your domain model is a "model" and it's up to you to make design decisions.
If all your Aggregate Roots are small, and you use an ORM that does a lot of magic for free (NOTE: There is always a price), and a value of seeing references on a class diagramme is bigger than value of seeing boundaries then try with holding a reference to related ARs, even if your code doesn't really need it.
And then evaluate your model over a time.

Am I allowed to have "incomplete" aggregates in DDD?

DDD states that you should only ever access entities through their aggregate root. So say for instance that you have an aggregate root X which potentially has a lot of child Y entities. Now, for some scenario, you only really care about a subset of these Y entities at a time (maybe you're displaying them in a paged list or whatever).
Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate? Ie. an X object who'se Ys collection only contains the Y instances we're interested in and not all of them? This could for instance cause methods on X which perform some calculation involving the Ys to not behave as expected.
Is this perhaps an indication that the Y entity in question should be considered promoted to an aggregate root?
My current idea (in C#) is to leverage the delayed execution of LINQ, so that my X object has an IQueryable to represent its relationship with Y. This way, I can have transparent lazy loading with filtering... But getting this to work with an ORM (Linq to Sql in my case) might be a bit tricky.
Any other clever ideas?

I consider an aggregate root with a lot of child entities to be a code smell, or a DDD smell if you will. :-) Generally I look at two options.
Split your aggregate into many smaller aggregates. This means that my original design was not optimal and I need to identify some new entities.
Split your domain into multiple bounded contexts. This means that there are specific sets of scenarios that use a common subset of the entities in the aggregate, while there are other sets of scenarios that use a different subset.

Jimmy Nilsson hints in his book that instead of reading a complete aggregate you can read a snapshot of parts of it. But you are not supposed to be able to save changes in the snapshot classes to the database.
Jimmy Nilsson's book Chapter 6: Preparing for infrastructure - Querying. Page 226.
Snapshot pattern

You're really asking two overlapping questions.
The title and first half of your question are philosophical/theoretical. I think the reason for accessing entities only through their "aggregate root" is to abstract away the kinds of implementation details you're describing. Access through the aggregate root is a way to reduce complexity by having a trusted point of access. You're eliminating friction/ambiguity/uncertainty by adhering to a convention. It doesn't matter how it's implemented within the root, you just know that when you ask for an entity it will be there. I don't think this perspective rules out a "filtered repository" as you describe. But to provide a pit of success for devs to fall into, it should be impossible instantiate the repository without being explicit about its "filteredness;" likewise, if shared access to a repository instance is possible, the "filteredness" should be explicit when coding in the caller.
The second half of your question is about implementation on a specific platform. Not sure why you mention delayed execution, I think that's really orthogonal to the filtering question. The filtering itself could be a bit tricky to implement with LINQ. Maybe rather than inlining the Where lambdas, you set up a collection of them and select one depending on the filter you need.

You are allowed since the code will compile anyway, but if you're going for a pure DDD design you should not have incomplete instances of objects.
You should look into LazyLoading if you're afraid to load a huge object of which you will only use a small portion of its child entities.
LazyLoading delays the loading of whatever you decide to lazy-load until the moment they are accessed. They make use of callbacks to call the loading method once the code calls for them.

Is it OK to implement a repository then, so that in such scenarios it
returns an incomplete aggregate?
Not at all. Aggregate is a transnational boundary to change the state of your system. Never use aggregates for querying data. Split the system into Write and Read sides. (read about CQR & CQRS). When we think "CRUD" based, we implement our system, based on some resource. Lets say you have "Appointment" aggregate. Thinking "Crudish" means we should implement usecases Create, Update, Delete, GetAll appointments. That means Appointment[] should be returned for GetAll. When you think usecase based, (HexagonalArchitecture) your usecases would be ScheduleAppointment, RescheduleAppointment, CancelAppointment. But for query side it can be: /myCalendar. We return back all appointments for a specific user in a ClientCalendar object. Create separate DTO's for Query sides. Never use aggregates for this purpose.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string