DDD: do I really need to load all objects in an aggregate? (Performance concerns) - domain-driven-design

In DDD, a repository loads an entire aggregate - we either load all of it or none of it. This also means that should avoid lazy loading.
My concern is performance-wise. What if this results in loading into memory thousands of objects? For example, an aggregate for Customer comes back with ten thousand Orders.
In this sort of cases, could it mean that I need to redesign and re-think my aggregates? Does DDD offer suggestions regarding this issue?

Take a look at this Effective Aggregate Design series of three articles from Vernon. I found them quite useful to understand when and how you can design smaller aggregates rather than a large-cluster aggregate.
EDIT
I would like to give a couple of examples to improve my previous answer, feel free to share your thoughts about them.
First, a quick definition about an Aggregate (took from Patterns, Principles and Practices of Domain Driven Design book by Scott Millet)
Entities and Value Objects collaborate to form complex relationships that meet invariants within the domain model. When dealing with large interconnected associations of objects, it is often difficult to ensure consistency and concurrency when performing actions against domain objects. Domain-Driven Design has the Aggregate pattern to ensure consistency and to define transactional concurrency boundaries for object graphs. Large models are split by invariants and grouped into aggregates of entities and value objects that are treated as conceptual whole.
Let's go with an example to see the definition in practice.
Simple Example
The first example shows how defining an Aggregate Root helps to ensure consistency when performing actions against domain objects.
Given the next business rule:
Winning auction bids must always be placed before the auction ends. If a winning bid is placed after an auction ends, the domain is in an invalid state because an invariant has been broken and the model has failed to correctly apply domain rules.
Here there is an aggregate consisting of Auction and Bids where the Auction is the Aggregate Root.
If we say that Bid is also a separated Aggregate Root you would have have a BidsRepository, and you could easily do:
var newBid = new Bid(money);
BidsRepository->save(auctionId, newBid);
And you were saving a Bid without passing the defined business rule. However, having the Auction as the only Aggregate Root you are enforcing your design because you need to do something like:
var newBid = new Bid(money);
auction.placeBid(newBid);
auctionRepository.save(auction);
Therefore, you can check your invariant within the method placeBid and nobody can skip it if they want to place a new Bid.
Here it is pretty clear that the state of a Bid depends on the state of an Auction.
Complex Example
Back to your example of Orders being associated to a Customer, looks like there are not invariants that make us define a huge aggregate consisting of a Customer and all her Orders, we can just keep the relation between both entities thru an identifier reference. By doing this, we avoid loading all the Orders when fetching a Customer as well as we mitigate concurrency problems.
But, say that now business defines the next invariant:
We want to provide Customers with a pocket so they can charge it with money to buy products. Therefore, if a Customer now wants to buy a product, it needs to have enough money to do it.
Said so, pocket is a VO inside the Customer Aggregate Root. It seems now that having two separated Aggregate Roots, one for Customer and another one for Order is not the best to satisfy the new invariant because we could save a new order without checking the rule. Looks like we are forced to consider Customer as the root. That is going to affect our performance, scalaibility and concurrency issues, etc.
Solution? Eventual Consistency. What if we allow the customer to buy the product? that is, having an Aggregate Root for Orders so we create the order and save it:
var newOrder = new Order(customerId, ...);
orderRepository.save(newOrder);
we publish an event when the order is created and then we check asynchronously if the customer has enough funds:
class OrderWasCreatedListener:
var customer = customerRepository.findOfId(event.customerId);
var order = orderRepository.findOfId(event.orderId);
customer.placeOrder(order); //Check business rules
customerRepository.save(customer);
If everything was good, we have satisfied our invariants while keeping our design as we wanted at the beginning modifying just one Aggregate Root per request. Otherwise, we will send an email to the customer telling her about the insufficient funds issue. We can take advance of it by adding to the email alternatives options she can purchase with her current budget as well as encourage her to charge the pocket.
Take into account that the UI can help us to avoid having customers paying without enough money, but we cannot blindly trust on the UI.
Hope you find both examples useful, and let me know if you find better solutions for the exposed scenarios :-)

In this sort of cases, could it mean that I need to redesign and re-think my aggregates?
Almost certainly.
The driver for aggregate design isn't structure, but behavior. We don't care that "a user has thousands of orders". What we care about are what pieces of state need to be checked when you try to process a change - what data do you need to load to know if a change is valid.
Typically, you'll come to realize that changing an order doesn't (or shouldn't) depend on the state of other orders in the system, which is a good indication that two different orders should not be part of the same aggregate.

Related

Relationship between concepts in DDD

I'm developing a budgeting app using Domain Driven Design. I'm new to DDD and therefore need a validation of my design.
Here are the concepts I came up with:
Transaction - which is either income or expense, on annual or monthly or one-off etc. basis.
Budget - which is the calculated income, expenses and balance projection, divided into occurrences (say e.g. 12 months over the next year, based on the Transactions).
I made the Transaction the Entity and Aggregate Root. In my mind it has identity, it's a concrete planned expense or income that I know I'll receive, for a concrete thing, and I also need to persist it, so I can calculate the budget based on all my transactions.
Now, I have an issue with the Budget. It depends on my concrete list of Transactions. If one of the Transactions gets deleted, the budget will need to be re-calculated (seems like a good candidate for a domain event?). It's a function of my identifiable transactions at any given time.
Nothing outside the Aggregate boundary can hold a reference to anything inside, except to the root Entity. Which makes me think the budget is the Aggregate Root as it cannot be a ValueObject or Entity within the Transaction.
What's confusing is that I don't necessarily need to persist the budget (unless I want to cache it). I could calculate it from scratch on request, and send it over to the client app. 2 different budgets could have the same number of occurrences, incomes, expenses and balances (but not Transactions). Perhaps an argument for making it a ValueObject?
So, my questions is - what is the Budget?
Domain context vs Aggregate
First element you get wrong is a point of details about DDD semantics. If there is only one object in your "aggregate", then it is not an aggregate. An aggregate is a structure made of multiple (2+) objects, with at least one being an entity and called the aggregate root. If a TransactionRpository returns a Transaction object that has no value object or entity, then Transaction is an entity but not an aggregate nor an aggregate root. If a BudgetRepository returns a Budget entity that includes a Transaction object, then Budget and Transaction form an aggregate, Budget being the aggregate root. If Budget and Transaction are returned from different repositories, then they form different contexts.
Context being the generic concept that can either be an aggregate or an entity.
Contexts are linked to use cases
Second element you get wrong is that you are trying to design your domain model outside of your use cases context. Your application clearly manipulates both concepts of Budget and Transactions, but does your application handles uses cases for both (budget management and transaction management) ? If yes, are these uses case different in a way that implies different domain constraints ?
If your application only handles Budget management, or both but they share their business constraints, then you only need a single context, that manipulates both concepts in a single aggregate. In that situation, Budget is probably your root aggregate, and it's up to your mode and use cases to tell whether the Transaction is a value object or you need to access them by Id.
If your application handles uses cases for both, with different business constraints, then you should split your domain in two contexts, with two different models, one for the Budget management use cases, the other for the Transaction management use cases.
Polysemic domain model
The third element you get wrong, is that you are trying to build a single, unified, normalized domain model. This is wrong because it introduces very complex structures, and a lot of business rules that are irrelevant to your business cases. Why would you need to manipulate the Budget domain model when the use case does not need knowledge of the Budget concept or linked business rules ?
If your application has use cases for both concepts, you need two models. The Budget management model should not use the Transaction management model. However, that does not implies that the Budget model is not allowed to manipulate the Transaction concept and vice versa. It only means you must write another model for that. You could have a Budget context that manipulates Budget and BudgetTransaction models, and Transaction context that manipulates Transaction and TransactionBudget models. These models can map to the same RDBMS tables with different columns, relevant to their use cases, implementing relevant business rules.
This is called writing a polysemic domain model.
Conclusion
So, my questions is - what is the Budget?
It is not possible to answer definitely your last question, as the answer depends on the use cases your application handles. However, you mention the following constraint:
If one of the Transactions gets deleted, the budget will need to be re-calculated
This seems a very good argument in favor of making your application as a single context application, based on an aggregate with Budget being the aggregate root and Transaction being an entity in the aggregate.
If you don't need to, try to refrain from splitting these two concepts in different contexts, unless you have very good reasons to do so: they manipulate excluding columns, they manipulate excluding business rules, you are interested in deploying these two in different bounded contexts, different services, as they would scale differently, etc ...
Having business constraints that span accross multiple contexts implies a complex implementation based on domain events, 2-phase commits, saga pattern, etc ... It's a lot of work, you should balance that work with the benefits you expect in return.

Where to place this invariant?

I'm working on a side project to learn and apply DDD within the "Daily Deal' domain. In my purchasing context, i have an invariant where a user can only purchase 'x' amount of deals per deal.
so it seems wasteful for my deal aggregate to load all purchases from all users just to check and see how many times (if any) the user has purchased this deal. I see two ways i could go about this.
Put this logic within a domain service which would allow a pre-condition to already have been met when the Purchase method on the Deal aggregate is invoked.
My repository implementation could always populate the purchases collection of the deal for the purchasing user. hmm...not sure about this one.
any guidance would be great!
I would take the second approach, but with one important change. I would instead create a value object called PurchasedDeal, that consists of just a DealID and Quantity field. The User aggregate could instead load a collection of this more lightweight purchase history object. Performance should be good with this approach, since I'm guessing that the average user will only have a few dozen purchase records.
Also remember that with DDD, you can and probably should have different models per bounded context. So you might design your User aggregate like this in the context of deals/purchasing. However, your User aggregate in another context would look different and not have a purchase history if it's not needed.

DDD: How to handle large collections

I'm currently designing a backend for a social networking-related application in REST. I'm very intrigued by the DDD principle. Now let's assume I have a User object who has a Collection of Friends. These can be thousands if the app and the user would become very successful. Every Friend would have some properties as well, it is basically a User.
Looking at the DDD Cargo application example, the fully expanded Cargo-object is stored and retrieved from the CargoRepository from time to time. WOW, if there is a list in the aggregate-root, over time this would trigger a OOM eventually. This is why there is pagination, and lazy-loading if you approach the problem from a data-centric point of view. But how could you cope with these large collections in a persistence-unaware DDD?
As #JefClaes mentioned in the comments: You need to determine whether your User AR indeed requires a collection of Friends.
Ownership does not necessarily imply that a collection is necessary.
Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) iro active orders. Keeping a collection of historical orders would be unnecessary.
I suspect the large collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs but the order may much more likely be simply split into smaller orders.
In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.
Something to keep in mind is that as soon as you start using you domain model for querying your start running into weird sorts of problems. So always try to think in terms of some read/query model with a simple query interface that can access your data directly without using your domain model. This may simplify things.
So perhaps a Relationship AR may assist in this regard.
If some paging or optimization techniques are the part of your domain, it's nothing wrong to design domain classes with this ability.
Some solutions I've thought about
If User is aggregate root, you can populate your UserRepository with method GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) encapsulating specific user object construction. In same way you can also populate user model with some counters and etc.
On the other side, it is possible to implement lazy loading for User instance's _friends field. Thus, User instance can itself decide which "part" of friends list to load.
Finally, you can use UserRepository to get all friends of certain user with respect to paging or other filtering conditions. It doesn't violate any DDD principles.
DDD is too big to talk that it's not for CRUD. Programming in a DDD way you should always take into account some technical limitations and adapt your domain to satisfy them.
Do not prematurely optimize. If you are afraid of large stress, then you have to benchmark your application and perform stress tests.
You need to have a table like so:
friends
id, user_id1, user_id2
to handle the n-m relation. Index your fields there.
Also, you need to be aware whether friends if symmetrical. If so, then you need a single row for two people if they are friends. If not, then you might have one row, showing that a user is friends with the other user. If the other person considers the first a friend as well, you need another row.
Lazy-loading can be achieved by hidden (AJAX) requests so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure which is unkown now due to the infinite possible evolutions of your project.
Your aggregate root can have a collection of different objects that will only contain a small subset of the information, as reference to the actual business objects. Then when needed, items can be used to fetch the entire information from the underlying repository.

How should I enforce relationships and constraints between aggregate roots?

I have a couple questions regarding the relationship between references between two aggregate roots in a DDD model. Refer to the typical Customer/Order model diagrammed below.
First, should references between the actual object implementation of aggregates always be done through ID values and not object references? For example if I want details on the customer of an Order I would need to take the CustomerId and pass it to a ICustomerRepository to get a Customer rather then setting up the Order object to return a Customer directly correct? I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to setup if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots? For example say I want all the associated orders to be deleted when a customer is deleted. The ICustomerRepository.DeleteCustomer() method should not be referencing the IOrderRepostiory should it? That seems like that would be breaking the boundaries between the aggregates/repositories? Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders which would references both a IOrderRepository and ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers. Is that just down to educating them on how to use the model correctly?
First, should references between aggregates always be done through ID values and not actual object references?
Not really - though some would make that change for performance reasons.
For example if I want details on the customer of an Order I would need to take the CustomerId and pass it to a ICustomerRepository to get a Customer rather then setting up the Order object to return a Customer directly correct?
Generally, you'd model 1 side of the relationship (eg., Customer.Orders or Order.Customer) for traversal. The other can be fetched from the appropriate Repository (eg., CustomerRepository.GetCustomerFor(Order) or OrderRepository.GetOrdersFor(Customer)).
Wouldn't that mean that the OrderRepository would have to know something about how to create a Customer? Wouldn't that be beyond what OrderRepository should be responsible for...
The OrderRepository would know how to use an ICustomerRepository.FindById(int). You can inject the ICustomerRepository. Some may be uncomfortable with that, and choose to put it into a service layer - but I think that's overkill. There's no particular reason repositories can't know about and use each other.
I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to setup if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Aggregate roots are allowed to hold references to other aggregate roots. In fact, anything is allowed to hold a reference to an aggregate root. An aggregate root cannot hold a reference to a non-aggregate root entity that doesn't belong to it, though.
Eg., Customer cannot hold a reference to OrderLines - since OrderLines properly belongs as an entity on the Order aggregate root.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots?
If (and I stress if, because it's a peculiar requirement) that's actually a use case, it's an indication that Customer should be your sole aggregate root. In most real-world systems, however, we wouldn't actually delete a Customer that has associated Orders - we may deactivate them, move their Orders to a merged Customer, etc. - but not out and out delete the Orders.
That being said, while I don't think it's pure-DDD, most folks will allow some leniency in following a unit of work pattern where you delete the Orders and then the Customer (which would fail if Orders still existed). You could even have the CustomerRepository do the work, if you like (though I'd prefer to make it more explicit myself). It's also acceptable to allow the orphaned Orders to be cleaned up later (or not). The use case makes all the difference here.
Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders which would references both a IOrderRepository and ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers. Is that just down to educating them on how to use the model correctly?
I probably wouldn't go a service route for something so intimately tied to the repository. As for how to make sure a service is used...you just don't put a public Delete on the CustomerRepository. Or, you throw an error if deleting a Customer would leave orphaned Orders.
Another option would be to have a ValueObject describing the association between the Order and the Customer ARs, VO which will contain the CustomerId and additional information you might need - name,address etc (something like ClientInfo or CustomerData).
This has several advantages:
Your ARs are decoupled - and now can be partitioned, stored as event streams etc.
In the Order ARs you usually need to keep the information you had about the customer at the time of the order creation and not reflect on it any future changes made to the customer.
In almost all the cases the information in the value object will be enough to perform the read operations ( display customer info with the order ).
To handle the Deletion/deactivation of a Customer you have the freedom to chose any behavior you like. You can use DomainEvents and publish a CustomerDeleted event for which you can have a handler that moves the Orders to an archive, or deletes them or whatever you need. You can also perform more than one operation on that event.
If for whatever reason DomainEvents are not your choice you can have the Delete operation implemented as a service operation and not as a repository operation and use a UOW to perform the operations on both ARs.
I have seen a lot of problems like this when trying to do DDD and i think that the source of the problems is that developers/modelers have a tendency to think in DB terms. You ( we :) ) have a natural tendency to remove redundancy and normalize the domain model. Once you get over it and allow your model to evolve and implicate the domain expert(s) in it's evolution you will see that it's not that complicated and it's quite natural.
UPDATE: and a similar VO - OrderInfo can be placed inside the Customer AR if needed, with only the needed information - order total, order items count etc.

What Belongs to the Aggregate Root

This is a practical Domain Driven Design question:
Conceptually, I think I get Aggregate roots until I go to define one.
I have an Employee entity, which has surfaced as an Aggregate root. In the Business, some employees can have work-related Violations logged against them:
Employee-----*Violations
Since not all Employees are subject to this, I would think that Violations would not be a part of the Employee Aggregate, correct?
So when I want to work with Employees and their related violations, is this two separate Repository interactions by some Service?
Lastly, when I add a Violation, is that method on the Employee Entity?
Thanks for the help!
After doing even MORE research, I think I have the answer to my question.
Paul Stovell had this slightly edited response to a similar question on the DDD messageboard. Substitute "Customer" for "Employee", and "Order" for "Violation" and you get the idea.
Just because Customer references Order
doesn't necessarily mean Order falls
within the Customer aggregate root.
The customer's addresses might, but
the orders can be independent (for
example, you might have a service that
processes all new orders no matter who
the customer is. Having to go
Customer->Orders makes no sense in
this scenario).
From a domain point of view, you can
even question the validity of those
references (Customer has reference to
a list of Orders). How often will you
actually need all orders for a
customer? In some systems it makes
sense, but in others, one customer
might make many orders. Chances are
you want orders for a customer between
a date range, or orders for a customer
that aren't processed yet, or orders
which have not been paid, and so on.
The scenario in which you'll need all
of them might be relatively uncommon.
However, it's much more likely that
when dealing with an Order, you will
want the customer information. So in
code, Order.Customer.Name is useful,
but Customer.Orders[0].LineItem.SKU -
probably not so useful. Of course,
that totally depends on your business
domain.
In other words, Updating Customer has nothing to do with updating Orders. And orders, or violations in my case, could conceivable be dealt with independently of Customers/Employees.
If Violations had detail lines, then Violation and Violation line would then be a part of the same aggregate because changing a violation line would likely affect a Violation.
EDIT**
The wrinkle here in my Domain is that Violations have no behavior. They are basically records of an event that happened. Not sure yet about the implications that has.
Eric Evan states in his book, Domain-Driven Design: Tackling the Complexity in the Heart of Software,
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes.
There are 2 important points here:
These objects should be treated as a "unit".
For the purpose of "data change".
I believe in your scenario, Employee and Violation are not necessarily a unit together, whereas in the example of Order and OrderItem, they are part of a single unit.
Another thing that is important when modeling the agggregate boundaries is whether you have any invariants in your aggregate. Invariants are business rules that should be valid within the "whole" aggregate. For example, as for the Order and OrderItem example, you might have an invariant that states the total cost of the order should be less than a predefined amount. In this case, anytime you want to add an OrderItem to the Order, this invariant should be enforced to make sure that your Order is valid. However, in your problem, I don't see any invariants between your entities: Employee and Violation.
So short answer:
I believe Employee and Violation each belong to 2 separate aggregates. Each of these entities are also their own aggregate roots. So you need 2 repositories: EmployeeRepository and ViolationRepository.
I also believe you should have an unidirectional association from Violation to Employee. This way, each Violation object knows who it belongs to. But if you want to get the list of all Violations for a particular Employee, then you can ask the ViolationRepository:
var list = repository.FindAllViolationsByEmployee(someEmployee);
You say that you have employee entity and violations and each violation does not have any behavior itself. From what I can read above, it seems to me that you may have two aggregate roots:
Employee
EmployeeViolations (call it EmployeeViolationCard or EmployeeViolationRecords)
EmployeeViolations is identified by the same employee ID and it holds a collection of violation objects. You get behavior for employee and violations separated this way and you don't get Violation entity without behavior.
Whether violation is entity or value object you should decide based on its properties.
I generally agree with Mosh on this one. However, keep in mind the notion of transactions in the business point of view. So I actually take "for the purpose of data changes" to mean "for the purpose of transaction(s)".
Repositories are views of the domain model. In a domain environment, these "views" really support or represent a business function or capability - a transaction. Case in point, the Employee may have one or more violations, and if so, are aspects of a transaction(s) in a point in time. Consider your use cases.
Scenario: "An employee commits an act that is a violation of the workplace." This is a type of business event (i.e. transaction, or part of a larger, perhaps distributed transaction) that occurred. The root affected domain object actually can be seen from more than one perspective, which is why it is confusing. But the thing to remember is behavior as it pertains to a business transaction, since you want your business processes to model the real-world as accurate as possible. In terms of relationships, just like in a relational database, your conceptual domain model should actually indicate this already (i.e. the associativity), which often can be read in either direction:
Employee <----commits a -------committed by ----> Violation
So for this use case, it would be fair that to say that it is a transaction dealing with violations, and that the root - or "primary" entity - is a Violation. That, then would be your aggregate root you would reference for that particular business activity or business process. But that is not to say that, for a different activity or process, that you cannot have an Employee aggregate root, such as the "new employee process". If you take care, there should be no negative impact of cyclic references, or being able to traverse your domain model multiple ways. I will warn, however, that governing of this should be thought about and handled by your controller piece of your business domain, or whatever equivalent you have.
Aside: Thinking in terms of patterns (i.e. MVC), the repository is a view, the domain objects are the model, and thus one should also employ some form of controller pattern. Typically, the controller declares the concrete implementation of and access to the repositories (collections of aggregate roots).
In the data access world...
Using LINQ-To-SQL as an example, the DataContext would be the controller exposing a view of Customer and Order entities. The view is a non-declarative, framework-oriented Table type (rough equivalent to Repository). Note that the view keeps a reference to its parent controller, and often goes through the controller to control how/when the view gets materialized. Thus, the controller is your provider, taking care of mapping, translation, object hydration, etc. The model is then your data POCOs. Pretty much a typical MVC pattern.
Using N/Hibernate as an example, the ISession would be the controller exposing a view of Customer and Order entities by way of the session.Enumerable(string query) or session.Get(object id) or session.CreateCriteria(typeof(Customer)).List()
In the business logic world...
Customer { /*...*/ }
Employee { /*...*/ }
Repository<T> : IRepository<T>
, IEnumerable<T>
//, IQueryable<T>, IQueryProvider //optional
{ /**/ }
BusinessController {
Repository<Customer> Customers { get{ /*...*/ }} //aggregate root
Repository<Order> Orders { get{ /*...*/ }} // aggregate root
}
In a nutshell, let your business processes and transactions be the guide, and let your business infrastructure naturally evolve as processes/activities are implemented or refactored. Moreover, prefer composability over traditional black box design. When you get to service-oriented or cloud computing, you will be glad you did. :)
I was wondering what the conclusion would be?
'Violations' become a root entity. And 'violations' would be referenced by 'employee' root entity. ie violations repository <-> employee repository
But you are consfused about making violations a root entity becuase it has no behavior.
But is 'behaviour' a criteria to qualify as a root entity? I dont think so.
a slightly orthogonal question to test understanding here, going back to Order...OrderItem example, there might be an analytics module in the system that wants to look into OrderItems directly i.e get all orderItems for a particular product, or all order items greater than some given value etc, does having a lot of usecases like that and driving "aggregate root" to extreme could we argue that OrderItem is a different aggregate root in itself ??
It depends. Does any change/add/delete of a vioation change any part of employee - e.g. are you storing violation count, or violation count within past 3 years against employee?

Resources