DDD performance issue eager loading AR child entities

DDD performance issue eager loading AR child entities - domain-driven-design

In the last days i'm making a sample application to apply/study DDD.
One of the principles of DDD(Please correct me if I'm wrong) is that all the changes to an entity should be made through the Aggregate Root(AR) and an AR should be loaded with his child entities. In this way is eaiser to validate Aggregate consistency.
There is only a little big detail that is bothering me. I'm not able to understand how DDD is dealing with performance issues. Imagine that i have an Order(AR) that have, let say, 20000, 30000 of OrderLine. Performance issues would exist eager loading a lot of child records. Saying Order as AR you can imagine another scenarios where this can happen.
I'm looking forward to read you opinion about this subject.

DDD isn't always free from technical considerations. If you have an AR that can contain a very large number of child entities, consider if you can make the child entities ARs in their own right. This decision has to be made while taking eventual consistency into account.
In your provided example, consider whether the Order AR really needs to reference OrderLine entities in the first place in order to maintain integrity. If it does, consider making OrderLine an AR on its own, in which case you may have to deal with eventual consistency. Of course, if you make OrderLine an AR, you application logic will change, because an operations that need to be performed on the OrderLine will have to go through the OrderLineRepository to access the OrderLine instead of going through the Order AR.
For more on this, check out Effective Aggregate Design by Vaughn Vernon.

Related

Multiple aggregates handle with one table?

we are modeling an order system and we have the Order concept. The Order has a life cycle from it is created to it is delivered and between them the order can be in other states. Some states have particular business logic, and sometimes share other business logic such as when an order can be expire in a concrete date if it has not finished on time.
Well, the team is doubting if
Use the state pattern (one aggregate, one repository), or
Use one aggregate/repository for handle each state of the order.
Within of the second approach, we are considering to use the same table for each repository, to have a table order to persist/load each aggregate. It is well seen from DDD perspective?
What do you think about?

In general, DDD is all about not polluting the domain with infrastructure concerns; whether different aggregates are stored in the same table is an infrastructure concern. As long as the repository/repositories are able to meet their obligations, go for it.
That said, having an Order have a lot of variation in terms of what operations are legal and what information is available from state to state might be a sign that the states might make sense being apportioned to different bounded contexts (e.g. a context where items are added to an order (e.g. a cart context), a checkout/payment context, an assembly for delivery context, and a being delivered context).

Definition of a set of aggregates?

I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Another approach would be to avoid this thinking completely and not allow/define a set of aggregates and not allow enumerating aggregates but only load/save on aggregate-id. Using this option if would be necessary to reference aggregates from other aggregates and by doing this build up an interconnected graph of aggregates.
The approaches are similar to having aggregates in a folder on disk or having an "internet" of aggregates where the references between them are defining the bigger set of aggregates. In any case I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much? I'm not sure I explain this very good but my question is if there are any definitions of the "set of aggregates" or if we should think of aggregates as totally isolated/on its own and with only a unique aggregate-id (UUID)?
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.

Aggregates are connected
In any application with sufficient complexity, Aggregates end up referencing one-another. And it is perfectly reasonable to use their unique identifiers as reference IDs to refer to each other.
But take care to load and persist aggregates outside the domain layer, typically in repositories. If you want to traverse links across aggregates and load them into memory, you will be doing that upfront before handing over control to the domain layer for the actual processing.
Traversing the graph to get all related aggregates is correct, but this rarely spans across too many aggregate boundaries. You rarely find a single change or rule to be applied throughout the application. If you do have such a transaction, it is probably a sign of the domain design needing improvement, simply because you are spreading one responsibility/change amongst many aggregates.
The connectivity is so usual that you should watch out for aggregates that have no linkages with the rest of the system. They are either standalone libraries, or they probably belong to a different bounded context.
Aggregates can morph into different forms
They are aggregates because they form a clear invariant boundary, with their primary responsibility being to enforce invariants across state changes for all the entities within themselves. But they can morph into different kinds of DDD objects based on the requirement.
A good example is of a single Currency note. In most applications, they are value objects. But for the federal bank, they are aggregates with clear cut invariant rules. They are aggregates when they are created and referenced, but in a transaction that ships printed notes to banks, they may become value objects.
So you may have to evaluate whether you are talking about a domain entity in its aggregate form, or as a value object when you consider each linkage.
Aggregates are invariant boundaries
It is wrong to validate domain rules across aggregates.
Your aggregate boundary is an invariant boundary, meaning all the domain rules within it should be satisfied at all time. By that logic, you are going to incorrectly build up a structure that will need to ensure that all domain rules across aggregates are valid at all time. Doing so will impose considerable performance burden, not to mention the complexity in business logic.
But this is not to say that there may be domain rules that span across aggregates. The correct way to accomplish this would be using eventual consistency and an Event-driven approach.
The primary changing aggregate would validate and persist the data, and bubble up an event containing the state change. Other aggregates would then act on the event and bring themselves up-to-date. If an aggregate's domain rules break because of the change, there is usually a supplementary mechanism that allows correction of the problem (a preferred mechanism) or a rollback of the first state change (happens very rarely).

Perhaps you can find a common denominator the agg sets have in common and use that to work with?
A simplified example; there is a set of Books and a set of Users that have nothing in common except you want to know whenever they were first registered? What might be an option is to have an interface FirstRegistration and then you can choose to either expand Books/Users or create a specific entity instead.

I'm struggling with how/if to define "a set of aggregates". Aggregates
are supposed to be stand alone and isolated but it's easy to think of
a bigger set of aggregates that belong together. But is this a trap?
I think you're struggling because indeed the idea of a set of aggregates (instances) is very generic, and the uses of such things are contextual and domain-specific. People don't talk specifically about it because of course you may have behaviors that operate on a collection of multiple aggregates, but that doesn't give such collections any particular common properties or requirements that would allow you, from a general DDD perspective, to characterize such collections more specifically than "a set of aggregates", "a list of distinct aggregates", or similar.
Using this "set of aggregates" it would be possible to for instance
enumerate and index aggregates on a unique property within the set and
have other domain rules that could be validated across all aggregates
in the set. It's tempting but also feels a bit wrong.
Tempting why? You've couched the question in very abstract terms, so it's pretty much impossible to contradict you about the "it would be possible", but just because something may be possible doesn't mean it would be useful. In practice, I think you'll find that rules or behaviors that operate on collections of aggregates most naturally belong not to collections of aggregates in an abstract sense, but rather to other aggregate types in your domain model, to domain repositories, or to domain services.
It is entirely plausible that your domain model might want to handle particular sets of aggregates characterized by some rule. For example, if you're an airline, then one of the aggregates in your domain model might a single seat on a flight, since that's the unit you sell. It makes sense in that case that there would be operations on all the seats on a particular flight, for example, but whatever rules and behaviors you might have about that are specifically about that kind of aggregate, selected in that particular way.
Another approach would be to avoid this thinking completely and not
allow/define a set of aggregates and not allow enumerating aggregates
but only load/save on aggregate-id.
It's surely counterproductive to forbid working with sets of aggregates. Just don't attribute more significance to it than is warranted. There is nothing particularly special about sets of aggregates in general.
Using this option if would be
necessary to reference aggregates from other aggregates and by doing
this build up an interconnected graph of aggregates.
I don't follow that. One certainly must be able to retrieve and store individual aggregates from persistence, as that's more or less the defining property of aggregates -- they are the unit of persistence. But that doesn't mean that you must reject the ability to work with collections of aggregates. However, sets of aggregates do not have identity in the same way that individual aggregates do, so yes, relationships between aggregates need to be modeled in terms of individual aggregates. Nevertheless, that does not inherently preclude 1:m or n:m relationships among aggregates.
I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much?
You'll find all sorts of uses of various sets of aggregates in applications built and maintained based on DDD ideas, but there's not much to talk about at the level of abstraction of your question, and what there is is already summed up in the words "set" and "aggregate".
The set of aggregates could for instance be the database being used
under the surface. But what I'm wondering is if this database as in
the information about what aggregates it contains has any definition
in DDD
Not to my knowledge. I suspect most DDD practitioners would just call it "the data", or something similar.
or if we should think about a set of aggregates as an
interconnected graph where only traversal of this graph can be used to
enumerate all "associated" aggregates.
I'm still not seeing why you set that up as a thing. Sure, depending on the domain model, you might be able to traverse all or substantial chunks of the data by traversing associations between aggregates, and that might be appropriate for some purposes, but DDD doesn't have to give a special name or special rules for sets of aggregates for you to work with them.
Like any useful methodology, DDD exists to solve problems. Its bread & butter is complex applications with complex data and evolving requirements. It is not to be interpreted as a straight jacket preventing designers and developers from (thoughtfully) writing designs and code that incorporate aspects of other design approaches, much less designs and code that provide for the application's idiosyncratic needs.

DDD: do I really need to load all objects in an aggregate? (Performance concerns)

In DDD, a repository loads an entire aggregate - we either load all of it or none of it. This also means that should avoid lazy loading.
My concern is performance-wise. What if this results in loading into memory thousands of objects? For example, an aggregate for Customer comes back with ten thousand Orders.
In this sort of cases, could it mean that I need to redesign and re-think my aggregates? Does DDD offer suggestions regarding this issue?

Take a look at this Effective Aggregate Design series of three articles from Vernon. I found them quite useful to understand when and how you can design smaller aggregates rather than a large-cluster aggregate.
EDIT
I would like to give a couple of examples to improve my previous answer, feel free to share your thoughts about them.
First, a quick definition about an Aggregate (took from Patterns, Principles and Practices of Domain Driven Design book by Scott Millet)
Entities and Value Objects collaborate to form complex relationships that meet invariants within the domain model. When dealing with large interconnected associations of objects, it is often difficult to ensure consistency and concurrency when performing actions against domain objects. Domain-Driven Design has the Aggregate pattern to ensure consistency and to define transactional concurrency boundaries for object graphs. Large models are split by invariants and grouped into aggregates of entities and value objects that are treated as conceptual whole.
Let's go with an example to see the definition in practice.
Simple Example
The first example shows how defining an Aggregate Root helps to ensure consistency when performing actions against domain objects.
Given the next business rule:
Winning auction bids must always be placed before the auction ends. If a winning bid is placed after an auction ends, the domain is in an invalid state because an invariant has been broken and the model has failed to correctly apply domain rules.
Here there is an aggregate consisting of Auction and Bids where the Auction is the Aggregate Root.
If we say that Bid is also a separated Aggregate Root you would have have a BidsRepository, and you could easily do:
var newBid = new Bid(money);
BidsRepository->save(auctionId, newBid);
And you were saving a Bid without passing the defined business rule. However, having the Auction as the only Aggregate Root you are enforcing your design because you need to do something like:
var newBid = new Bid(money);
auction.placeBid(newBid);
auctionRepository.save(auction);
Therefore, you can check your invariant within the method placeBid and nobody can skip it if they want to place a new Bid.
Here it is pretty clear that the state of a Bid depends on the state of an Auction.
Complex Example
Back to your example of Orders being associated to a Customer, looks like there are not invariants that make us define a huge aggregate consisting of a Customer and all her Orders, we can just keep the relation between both entities thru an identifier reference. By doing this, we avoid loading all the Orders when fetching a Customer as well as we mitigate concurrency problems.
But, say that now business defines the next invariant:
We want to provide Customers with a pocket so they can charge it with money to buy products. Therefore, if a Customer now wants to buy a product, it needs to have enough money to do it.
Said so, pocket is a VO inside the Customer Aggregate Root. It seems now that having two separated Aggregate Roots, one for Customer and another one for Order is not the best to satisfy the new invariant because we could save a new order without checking the rule. Looks like we are forced to consider Customer as the root. That is going to affect our performance, scalaibility and concurrency issues, etc.
Solution? Eventual Consistency. What if we allow the customer to buy the product? that is, having an Aggregate Root for Orders so we create the order and save it:
var newOrder = new Order(customerId, ...);
orderRepository.save(newOrder);
we publish an event when the order is created and then we check asynchronously if the customer has enough funds:
class OrderWasCreatedListener:
var customer = customerRepository.findOfId(event.customerId);
var order = orderRepository.findOfId(event.orderId);
customer.placeOrder(order); //Check business rules
customerRepository.save(customer);
If everything was good, we have satisfied our invariants while keeping our design as we wanted at the beginning modifying just one Aggregate Root per request. Otherwise, we will send an email to the customer telling her about the insufficient funds issue. We can take advance of it by adding to the email alternatives options she can purchase with her current budget as well as encourage her to charge the pocket.
Take into account that the UI can help us to avoid having customers paying without enough money, but we cannot blindly trust on the UI.
Hope you find both examples useful, and let me know if you find better solutions for the exposed scenarios :-)

In this sort of cases, could it mean that I need to redesign and re-think my aggregates?
Almost certainly.
The driver for aggregate design isn't structure, but behavior. We don't care that "a user has thousands of orders". What we care about are what pieces of state need to be checked when you try to process a change - what data do you need to load to know if a change is valid.
Typically, you'll come to realize that changing an order doesn't (or shouldn't) depend on the state of other orders in the system, which is a good indication that two different orders should not be part of the same aggregate.

How do you handle an aggregate root with a collection of child entities whose update frequency is different than the root?

We have an aggregate root in our system and is has child entities in a collection. The problem is that the container needs to be updated very frequently, on a transaction basis, and the children entities don't, they in fact hardly ever change, they are more configuration like in nature.
My first reflex was to separate them into two different aggregate roots because our of application requirements. But I was reminded of the cascade delete rule, if we delete the one then the delete should cascade, so their lifetimes are linked.
We stumbled over this problem when we discovered that we have a caching problem. Changes to the children entities (configuration) were not being reflected in the system at runtime because the parent was unaware of the changes (we had them as one aggregate root but someone had created a repository for its children).

The main driver for aggregate boundaries are the invariants of your domain - or in other terms, aggregate boundaries should be consistency boundaries. Things that must change together atomically must be in the same aggregate.
The cascading delete is (with regards to aggregate boundaries) rather a nice-to-have than a rule. You can always enforce the fact that a Parent still lives by requiring one at the place where you load Child entities. With this design, you can make Parent and Child different aggregates, while still enforcing the rule that no "free floating" Child aggregates can be requested. And deleting Child aggregates in response to a deleted Parent is easy if you have domain events in place.
Note: All this is under the assumption that your domain invariants allow separating the aggregates in the first place.

This might be better in a discussion format, rather than a Q&A format. I'd recommend trying the audience at DomainDrivenDesign or DDDCQRS
Are you sure that you have a business requirement to delete data in your domain model? That's really unusual -- in most domain models I've seen, an aggregate will reach an "end of life" state, (example: AccountClosed), but doesn't actually get removed from the system.
A common trap in aggregate design is to think about the structure of the entities. "A has a B" does not necessarily mean that they are part of the same aggregate; the key idea is "A needs to keep B and C consistent". You can think about it like a graph; state B and state C are nodes in the graph, the consistency rules are the edges. If you can't traverse the graph from B to C, then they don't need to be part of the same aggregate, and probably shouldn't be.
My instinct is that caching should be the right answer here. If you are processing millions of transactions per day, and the collection only changes once per month, then simply using a cached value of the collection should produce the right answer most of the time.
In this, I'm influenced by Udi Dahan's essay Race Conditions Don't Exist; by coupling this configuration collection with the rest of the aggregate, you are essentially asserting that changes to the configuration (which are rare) are understood by the business to be happening precisely between two other changes to the aggregate. 3M transactions per day averages 1 per 30ms; are you really scheduling your configuration changes that precisely?
The usual pattern here would be that the consistency rule is removed from the domain model; instead, you monitor for changes that introduce an inconsistency, and mitigate them. That depends upon there being a reasonable way to detect the errors, an efficient way to mitigate them, and a mechanic for keeping the rate under control.
The latter of these would normally be done by having the clients/the application check their local copy of the collection, and making sure the command sent is consistent with that before dispatching the command to the domain model. (Possible questions for your domain experts: how quickly do the configuration changes need to be applied? Do the configuration changes happen when the aggregate is changing frequently or when it is quiet?)
Another possibility might be to change your persistence strategy; if the collection doesn't change often, then there are not a lot of change events related to it. So maybe instead of persisting the aggregate, you look into persisting its history - in other words, using event-sourcing here. Maybe if this aggregate lived in a micro service, you could limit the risk of the change? Hard to say, at a million transactions per day, this aggregate sounds pretty important.

Do we need another repo for each entity?

For example take an order entity. It's obvious that order lines don't exist without order. So we have to get them with the help of OrderRepository(throw an order entity). Ok. But what about other things that are loosely coupled with order? Should the customer info be available only from CustomerRepo and bank requisites of the seller available from BankRequisitesRepo, etc.? If it is correct, we should pass all these repositories to our Create Factory method I think.

Yes. In general, each major entity (aggregate root in domain driven design terminology) should have their own repositories. Child entities *like order lines) will generally not need one.
And yes. Define each repository as a service then inject them where needed.
You can even design things such that there is no direct coupling between Order and Customer in terms of an actual database link. This in turn allows customers and orders to live in completely independent databases. Which may or may not be useful for your applications.

You correctly understood that aggregate roots's (AR) child entities shall not have their own repository, unless they are themselves AR's. Having repositories for non-ARs would leave your invariants unprotected.
However you must also understand that entities should usually not be clustered together for convenience or just because the business states that some entity has one or many some other entity.
I strongly recommend that you read Effective Aggregate Design by Vaughn Vernon and this other blog post that Vaughn kindly wrote for a question I asked.
One of the aggregate design rule of thumb stated in Effective Aggregate Design is that you should usually reference other aggregates by identity only.
Therefore, you greatly reduce the number of AR instances needed in other AR's creationnal process since new Order(customer, ...) could become new Order(customerId, ...).
If you still find the need to query other AR's in one AR's creationnal process, then there's nothing wrong in injecting repositories as dependencies, but you should not depend on more than you need (e.g. let the client resolve the real dependencies and pass them directly rather than passing in a strategy allowing to resolve a dependency).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string