Can DDD repositories return data from other aggregate roots? - domain-driven-design

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, and so on.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email, etc.).

But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in-memory information (i.e., how often you write information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.

Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are used only for update operations. For query operations, implement Queries that don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
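As a rough illustration of the CQRS "lite" option, here is a minimal Python sketch of a query that bypasses the repositories and joins across tables owned by several aggregates. The schema, the table and column names, `EnrolledClassSummary`, `enrolled_classes_for` and `build_demo_db` are all assumptions invented for this example; none of them come from the question.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrolledClassSummary:
    """Read-only view row for the student details page; not a domain entity."""
    class_id: int
    course_name: str
    instructor_name: str


def enrolled_classes_for(conn, student_id):
    """Query side: one joined SELECT instead of one repository call per class."""
    rows = conn.execute(
        """
        SELECT c.id, co.name, i.name
        FROM enrollments e
        JOIN classes c     ON c.id = e.class_id
        JOIN courses co    ON co.id = c.course_id
        JOIN instructors i ON i.id = c.instructor_id
        WHERE e.student_id = ?
        ORDER BY c.id
        """,
        (student_id,),
    ).fetchall()
    return [EnrolledClassSummary(*row) for row in rows]


def build_demo_db():
    """Hypothetical schema and data, invented only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE instructors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE classes (id INTEGER PRIMARY KEY, course_id INTEGER, instructor_id INTEGER);
        CREATE TABLE enrollments (student_id INTEGER, class_id INTEGER);
        INSERT INTO courses VALUES (1, 'Algebra'), (2, 'History');
        INSERT INTO instructors VALUES (1, 'Ada'), (2, 'Bob');
        INSERT INTO classes VALUES (10, 1, 1), (11, 2, 2);
        INSERT INTO enrollments VALUES (7, 10), (7, 11), (8, 11);
        """
    )
    return conn
```

The aggregates and their repositories stay untouched by this; they remain the only path for updates, while the page reads through the query function.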

Related

DDD : one aggregate root , multiple persistent datasources

The guide/eBook .NET Microservices: Architecture for Containerized .NET Applications (related to eShopOnContainers) explains in general, in the chapter "Designing the infrastructure persistence layer" (page 213), how an aggregate root can perform CUD operations against a persistent data source.
Two important starting points are mentioned:
An aggregate is ignorant of methods of persistency and infrastructure following the Persistence Ignorance and the Infrastructure Ignorance principles (page 218). An aggregate is determined by the business and not by the infrastructure.
One should only define one repository per aggregate root to maintain transactional consistency between the objects within the aggregate (page 213)
Unfortunately, in all further examples that are mentioned the aggregate root and all underlying objects that fall under it are within one and the same persistent data source.
The pattern then is as follows:
A repository is created containing that aggregate
In this repository a Unit of Work is injected during creation. This Unit of Work contains methods such as SaveChangesAsync, SaveEntitiesAsync, Update and so on.
In a command, the Unit of Work manages the transactions to this one data source such as a database or similar.
I want to expand this pattern so that the aggregate can write its data to 2 or more physical data sources depending on the underlying object type.
Starting from starting point 1, it is perfectly justified to have an aggregate root and its underlying objects persisted to different data sources depending on the type of underlying object. Examples mentioned are: a database and an XML file, a database and a NoSQL 'database', a database and a service, a database and an IoT device. Because an aggregate must be ignorant of methods of persistence and infrastructure, in my opinion there is no need to argue about the design of the aggregate. I think nowhere in the book is it written that an aggregate root should persist within one data source.
At the same time, starting point 2 also seems perfectly justified, because the complete set of objects within the aggregate root is edited, and the successful persistence of the entire package is coordinated from one repository and (preferably) from one Unit of Work.
The question is:
How does Domain-Driven Design deal with an aggregate that, depending on the type of the underlying object, is hydrated from different data sources?
Should I use one custom Unit of Work and make the decision where to write to within this UoW?
I'm aware of the next question, but having studied the code I think it only deals with inheritance of repositories that handle different data sources while still serving one data source at a time, and that is not what I'm after.
I want to expand this pattern so that the aggregate can write its data to 2 or more physical data sources depending on the underlying object type.
Why do you want to do that on purpose?
In most cases, the persistence implementation is chosen to serve the domain, rather than the other way around. So the happy path typically involves choosing a persistence solution that can record the state of the entire aggregate, and storing the entire thing within a single transaction.
So if you find yourself trying to store an aggregate in two different places, you should take a hard, careful look at why.
One common answer is that you want to be able to query the aggregate state efficiently. CQRS is a common solution here - rather than persisting the aggregate in two different data stores, you persist it to one and replicate it to another. The queries can run very efficiently against the replica (although there is of course some additional latency between a change to the aggregate and the reflection of that change in the query results).
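A minimal sketch of "persist to one store, replicate to another" might look like the following. The names `Order`, `OrderRepository` and `OrderSummaryReplica` are hypothetical, and the replication here is a synchronous callback purely for brevity; in a real system it would more likely be asynchronous, which is where the latency mentioned above comes from.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical aggregate: all of its state lives in one primary store."""
    order_id: str
    lines: dict = field(default_factory=dict)  # sku -> quantity

    def add_line(self, sku, quantity):
        self.lines[sku] = self.lines.get(sku, 0) + quantity


class OrderSummaryReplica:
    """Query-side copy, shaped for reads rather than for consistency."""

    def __init__(self):
        self.rows = {}

    def project(self, order):
        self.rows[order.order_id] = {
            "line_count": len(order.lines),
            "total_quantity": sum(order.lines.values()),
        }


class OrderRepository:
    """The single place where the aggregate is persisted."""

    def __init__(self, on_saved):
        self._store = {}
        self._on_saved = on_saved

    def save(self, order):
        # The one authoritative write for the whole aggregate.
        self._store[order.order_id] = dict(order.lines)
        # Replication step; synchronous here only to keep the sketch short.
        self._on_saved(order)
```

The aggregate itself never knows about the replica; only the persistence side does.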
Another common answer is that you really have two aggregates that reference each other. Nothing wrong with storing two aggregates in different places. You may be better served by making the distinction between the two explicit in your code.
Dan Pritchett
Jimmy Bogard
How does Domain-Driven Design deal with an aggregate that, depending on the type of the underlying object, is hydrated from different data sources?
Badly, just like everybody else.

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, let's say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc.): This is a compact representation that might have some aggregates, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity itself, saved in the DB.
DetailsView: here you wouldn't have aggregates but real values for each Author, as Book has a property AuthorsList.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB, that is, if you don't want to get all the actual Authors and Reviews from the DB but just 2 ints with the count of each?
The fully normalized solution would be 2, but 1 seems to require either some denormalization or creating 2 different entities, BookDetails and BookCompact, within the Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.
For me it sounds like multiple Query Models (QM).
I used DDD with CQRS/ES style, so aggregate roots produce events based on the commands being passed in. Multiple QMs subscribe to those events. So I create multiple "views" based on requirements.
ES (event sourcing) has huge power - I can introduce more QMs later by replaying stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of a "shared data model". I see a huge evil in the "RDBMS" one-for-all approach. You will always get lost in the complexity of managing a shared model - like you do.
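To make the idea of several query models fed by the same events concrete, here is a small Python sketch. The event types `AuthorAdded` and `ReviewAdded`, the two views and the `replay` helper are assumptions made up for this illustration, loosely following the book store example from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorAdded:
    book_id: str
    author: str


@dataclass(frozen=True)
class ReviewAdded:
    book_id: str
    rating: int


class BookListView:
    """Compact query model for lists: counts only."""

    def __init__(self):
        self.rows = {}

    def apply(self, event):
        row = self.rows.setdefault(event.book_id, {"authors": 0, "reviews": 0})
        if isinstance(event, AuthorAdded):
            row["authors"] += 1
        elif isinstance(event, ReviewAdded):
            row["reviews"] += 1


class BookDetailsView:
    """Detailed query model: keeps the actual author names."""

    def __init__(self):
        self.authors = {}

    def apply(self, event):
        if isinstance(event, AuthorAdded):
            self.authors.setdefault(event.book_id, []).append(event.author)


def replay(events, view):
    """Build (or rebuild) a query model from the stored event history."""
    for event in events:
        view.apply(event)
    return view
```

A new view introduced later can be populated by calling `replay` with the full stored history, without touching the write side.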
I had a very good result with the following design:
domain package, which contains the @Entity classes holding all the necessary data stored in the database
dto package, which contains the view/views of an entity that will be returned from the service
The DTO should have a constructor that takes the entity as a parameter. To copy data more easily you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you are sharing only minimal amount of information and it is returned in object which does not have any functionality.
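The same shape can be sketched in Python, with a factory method standing in for the constructor that takes the entity. `Book`, `BookDto` and `from_entity` are hypothetical names chosen for this example, and the field-copy loop is only a rough analogue of a property copier, not a port of any particular library.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    """Stand-in for a persisted entity; carries data the view should not see."""
    id: int
    title: str
    internal_cost: float

    def apply_markup(self, factor):
        return self.internal_cost * factor


@dataclass(frozen=True)
class BookDto:
    """Behavior-free view of Book exposing only what callers need."""
    id: int
    title: str

    @classmethod
    def from_entity(cls, entity):
        # Copy only the fields this DTO declares; everything else is dropped.
        return cls(**{f.name: getattr(entity, f.name) for f in fields(cls)})
```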

DDD: How to handle large collections

I'm currently designing a backend for a social networking-related application in REST. I'm very intrigued by the DDD principle. Now let's assume I have a User object who has a Collection of Friends. These can be thousands if the app and the user would become very successful. Every Friend would have some properties as well, it is basically a User.
Looking at the DDD Cargo application example, the fully expanded Cargo object is stored and retrieved from the CargoRepository from time to time. WOW, if there is a list in the aggregate root, over time this would eventually trigger an OOM. This is why there is pagination and lazy loading if you approach the problem from a data-centric point of view. But how could you cope with these large collections in a persistence-unaware DDD?
As @JefClaes mentioned in the comments: You need to determine whether your User AR indeed requires a collection of Friends.
Ownership does not necessarily imply that a collection is necessary.
Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) in respect of active orders. Keeping a collection of historical orders would be unnecessary.
I suspect the large collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs but the order may much more likely be simply split into smaller orders.
In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.
Something to keep in mind is that as soon as you start using your domain model for querying, you start running into weird sorts of problems. So always try to think in terms of some read/query model with a simple query interface that can access your data directly without using your domain model. This may simplify things.
So perhaps a Relationship AR may assist in this regard.
If some paging or optimization techniques are part of your domain, there is nothing wrong with designing domain classes with this ability.
Some solutions I've thought about
If User is an aggregate root, you can add a method GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) to your UserRepository that encapsulates construction of that specific user object. In the same way you can also populate the user model with counters and similar data.
On the other hand, it is possible to implement lazy loading for the User instance's _friends field. Thus, the User instance can itself decide which "part" of the friends list to load.
Finally, you can use UserRepository to get all friends of a certain user with respect to paging or other filtering conditions. It doesn't violate any DDD principles.
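The last option above could be sketched roughly as follows. `FriendSummary`, `InMemoryUserRepository` and `get_friends` are hypothetical names, and an in-memory dictionary stands in for whatever store is actually used; the point is only that the page is requested from the repository rather than held on the User object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendSummary:
    """Lightweight read result; deliberately not a full User aggregate."""
    user_id: int
    name: str


class InMemoryUserRepository:
    def __init__(self, names, friend_ids):
        self._names = names            # user_id -> display name
        self._friend_ids = friend_ids  # user_id -> set of friend user_ids

    def get_friends(self, user_id, offset=0, limit=20):
        """Return one page of friends, ordered by id for stable paging."""
        ordered = sorted(self._friend_ids.get(user_id, ()))
        page = ordered[offset:offset + limit]
        return [FriendSummary(fid, self._names[fid]) for fid in page]
```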
DDD is too broad to claim that it's not for CRUD. When programming in a DDD way you should always take technical limitations into account and adapt your domain to satisfy them.
Do not prematurely optimize. If you are worried about heavy load, then benchmark your application and perform stress tests.
You need to have a table like so:
friends
id, user_id1, user_id2
to handle the n-m relation. Index your fields there.
Also, you need to be aware of whether friendship is symmetrical. If so, then you need a single row for two people if they are friends. If not, then you might have one row, showing that a user is friends with the other user. If the other person considers the first a friend as well, you need another row.
Lazy loading can be achieved by hidden (AJAX) requests so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure which is unknown now due to the infinite possible evolutions of your project.
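For the symmetrical case, a sketch using SQLite might look like this. The `friends` table and its columns follow the layout given above; the uniqueness constraint, the index name, the "smaller id first" convention and the helper functions are assumptions added for illustration.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE friends (
            id INTEGER PRIMARY KEY,
            user_id1 INTEGER NOT NULL,
            user_id2 INTEGER NOT NULL,
            UNIQUE (user_id1, user_id2)  -- also serves as an index on user_id1
        );
        CREATE INDEX idx_friends_user_id2 ON friends (user_id2);
        """
    )
    return conn


def add_friendship(conn, a, b):
    # Symmetrical friendship: store a single row, smaller id first,
    # so (1, 2) and (2, 1) map to the same record.
    low, high = min(a, b), max(a, b)
    conn.execute(
        "INSERT OR IGNORE INTO friends (user_id1, user_id2) VALUES (?, ?)",
        (low, high),
    )


def friends_of(conn, user_id, limit, offset):
    # The user may appear in either column, so look on both sides.
    rows = conn.execute(
        """
        SELECT user_id2 AS friend FROM friends WHERE user_id1 = ?
        UNION
        SELECT user_id1 AS friend FROM friends WHERE user_id2 = ?
        ORDER BY friend
        LIMIT ? OFFSET ?
        """,
        (user_id, user_id, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]
```

For an asymmetrical relation you would drop the min/max normalization and query only one column.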
Your aggregate root can have a collection of different objects that will only contain a small subset of the information, as reference to the actual business objects. Then when needed, items can be used to fetch the entire information from the underlying repository.

How should I enforce relationships and constraints between aggregate roots?

I have a couple questions regarding the relationship between references between two aggregate roots in a DDD model. Refer to the typical Customer/Order model diagrammed below.
First, should references between the actual object implementation of aggregates always be done through ID values and not object references? For example, if I want details on the customer of an Order I would need to take the CustomerId and pass it to an ICustomerRepository to get a Customer rather than setting up the Order object to return a Customer directly, correct? I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to set up if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots? For example, say I want all the associated orders to be deleted when a customer is deleted. The ICustomerRepository.DeleteCustomer() method should not be referencing the IOrderRepository, should it? That seems like it would be breaking the boundaries between the aggregates/repositories. Should I instead have a CustomerManagement service which handles deleting Customers and their associated Orders and which would reference both an IOrderRepository and an ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers? Is that just down to educating them on how to use the model correctly?
First, should references between aggregates always be done through ID values and not actual object references?
Not really - though some would make that change for performance reasons.
For example, if I want details on the customer of an Order I would need to take the CustomerId and pass it to an ICustomerRepository to get a Customer rather than setting up the Order object to return a Customer directly, correct?
Generally, you'd model 1 side of the relationship (e.g., Customer.Orders or Order.Customer) for traversal. The other can be fetched from the appropriate Repository (e.g., CustomerRepository.GetCustomerFor(Order) or OrderRepository.GetOrdersFor(Customer)).
Wouldn't that mean that the OrderRepository would have to know something about how to create a Customer? Wouldn't that be beyond what OrderRepository should be responsible for...
The OrderRepository would know how to use an ICustomerRepository.FindById(int). You can inject the ICustomerRepository. Some may be uncomfortable with that, and choose to put it into a service layer - but I think that's overkill. There's no particular reason repositories can't know about and use each other.
I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to setup if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Aggregate roots are allowed to hold references to other aggregate roots. In fact, anything is allowed to hold a reference to an aggregate root. An aggregate root cannot hold a reference to a non-aggregate root entity that doesn't belong to it, though.
E.g., Customer cannot hold a reference to OrderLines - since OrderLines properly belong as entities on the Order aggregate root.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots?
If (and I stress if, because it's a peculiar requirement) that's actually a use case, it's an indication that Customer should be your sole aggregate root. In most real-world systems, however, we wouldn't actually delete a Customer that has associated Orders - we may deactivate them, move their Orders to a merged Customer, etc. - but not out and out delete the Orders.
That being said, while I don't think it's pure-DDD, most folks will allow some leniency in following a unit of work pattern where you delete the Orders and then the Customer (which would fail if Orders still existed). You could even have the CustomerRepository do the work, if you like (though I'd prefer to make it more explicit myself). It's also acceptable to allow the orphaned Orders to be cleaned up later (or not). The use case makes all the difference here.
Should I instead have a CustomerManagement service which handles deleting Customers and their associated Orders and which would reference both an IOrderRepository and an ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers? Is that just down to educating them on how to use the model correctly?
I probably wouldn't go a service route for something so intimately tied to the repository. As for how to make sure a service is used...you just don't put a public Delete on the CustomerRepository. Or, you throw an error if deleting a Customer would leave orphaned Orders.
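The "throw an error if deleting a Customer would leave orphaned Orders" variant could be sketched like this, with the customer repository consulting the order repository before deleting. All class and method names here are hypothetical in-memory stand-ins, not the interfaces from the question.

```python
class CustomerHasOrdersError(Exception):
    """Raised when deleting a customer would leave orphaned orders."""


class InMemoryOrderRepository:
    def __init__(self):
        self._customer_by_order = {}  # order_id -> customer_id

    def add(self, order_id, customer_id):
        self._customer_by_order[order_id] = customer_id

    def count_for_customer(self, customer_id):
        return sum(1 for c in self._customer_by_order.values() if c == customer_id)


class InMemoryCustomerRepository:
    def __init__(self, orders):
        self._orders = orders  # injected order repository
        self._customers = {}   # customer_id -> name

    def add(self, customer_id, name):
        self._customers[customer_id] = name

    def exists(self, customer_id):
        return customer_id in self._customers

    def delete(self, customer_id):
        # Refuse the delete instead of silently cascading or orphaning.
        if self._orders.count_for_customer(customer_id) > 0:
            raise CustomerHasOrdersError(
                f"customer {customer_id} still has orders"
            )
        self._customers.pop(customer_id, None)
```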
Another option would be to have a ValueObject describing the association between the Order and the Customer ARs: a VO that contains the CustomerId and any additional information you might need, such as name and address (something like ClientInfo or CustomerData).
This has several advantages:
Your ARs are decoupled - and now can be partitioned, stored as event streams etc.
In the Order AR you usually need to keep the customer information as it was at the time the order was created, without reflecting any later changes made to the customer.
In almost all the cases the information in the value object will be enough to perform the read operations ( display customer info with the order ).
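A minimal sketch of such a value object follows. `CustomerInfo`, the `snapshot` method and the field choices are assumptions for illustration; the essential properties are that the object is immutable and is copied into the Order at creation time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerInfo:
    """Value object: immutable snapshot of the customer as seen by an order."""
    customer_id: int
    name: str
    address: str


@dataclass
class Customer:
    customer_id: int
    name: str
    address: str

    def snapshot(self):
        # Copy the current values; later changes will not leak into orders.
        return CustomerInfo(self.customer_id, self.name, self.address)


@dataclass
class Order:
    order_id: int
    customer: CustomerInfo  # holds the snapshot, not the Customer aggregate
```

The Order still carries the CustomerId inside the snapshot, so the full Customer can be loaded from its repository in the rare cases where current data is required.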
To handle the deletion/deactivation of a Customer you have the freedom to choose any behavior you like. You can use DomainEvents and publish a CustomerDeleted event for which you can have a handler that moves the Orders to an archive, or deletes them, or whatever you need. You can also perform more than one operation on that event.
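As a sketch of the event-based approach, the following uses a tiny in-process publisher with two handlers reacting to the same CustomerDeleted event. `EventBus` and `OrderArchiver` are hypothetical names, and a real system might dispatch events asynchronously or through a message broker instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerDeleted:
    customer_id: int


class EventBus:
    """Minimal synchronous publisher; handlers are keyed by event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


class OrderArchiver:
    """Handler that moves a deleted customer's orders to an archive."""

    def __init__(self, active_orders):
        self.active = active_orders  # order_id -> customer_id
        self.archived = {}

    def on_customer_deleted(self, event):
        for order_id, customer_id in list(self.active.items()):
            if customer_id == event.customer_id:
                self.archived[order_id] = self.active.pop(order_id)
```

Because any number of handlers can subscribe, the Customer side does not need to know what happens to the Orders.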
If for whatever reason DomainEvents are not your choice you can have the Delete operation implemented as a service operation and not as a repository operation and use a UOW to perform the operations on both ARs.
I have seen a lot of problems like this when trying to do DDD, and I think that the source of the problems is that developers/modelers have a tendency to think in DB terms. You ( we :) ) have a natural tendency to remove redundancy and normalize the domain model. Once you get over it, allow your model to evolve, and involve the domain expert(s) in its evolution, you will see that it's not that complicated and it's quite natural.
UPDATE: and a similar VO - OrderInfo can be placed inside the Customer AR if needed, with only the needed information - order total, order items count etc.

How do you handle associations between aggregates in DDD?

I'm still wrapping my head around DDD, and one of the stumbling blocks I've encountered is in how to handle associations between separate aggregates. Say I've got one aggregate encapsulating Customers and another encapsulating Shipments.
For business reasons Shipments are their own aggregates, and yet they need to be explicitly tied to Customers. Should my Customer domain entity have a list of Shipments? If so, how do I populate this list at the repository level - given I'll have a CustomerRepository and a ShipmentRepository (one repo per aggregate)?
I'm saying 'association' rather than 'relationship' because I want to stress that this is a domain decision, not an infrastructure one - I'm designing the system from the model first.
Edit: I know I don't need to model tables directly to objects - that's the reason I'm designing the model first. At this point I don't care about the database at all - just the associations between these two aggregates.
There's no reason your ShipmentRepository can't aggregate customer data into your shipment models. Repositories do not have to have a 1-to-1 mapping with tables.
I have several repositories which combine multiple tables into a single domain model.
I think there are two levels of answering this question. At one level, the question is how to populate the relationship between customer and shipment. I really like the "fill" semantics where your shipment repository can have a fillOrders( List customers, ....).
The other level is "how do I handle the denormalized domain models that are a part of DDD". "Customer" is probably the best example of them all, because it shows up in so many different contexts; almost all your processes have a customer in them, and the context of the customer is usually extremely varied. At most half the time you are interested in the "orders". If my understanding of the domain were perfect when starting, I'd never make a customer domain concept. But it's not, so I always end up making the Customer object. I still remember the project where, after 3 years, I felt that I was able to make the proper "Customer" domain model. I would be looking for the alternate and more detailed concepts that also represent the customer: PotentialCustomer, OrderingCustomer, CustomerWithOrders and probably a few others; sorry the names aren't better. I'll need some more time for that ;)
Shipment has a many-to-one relationship with Customer.
If you are looking for the shipments of a client, add a query to your shipment repository that takes a client parameter.
In general, I don't create one-to-many associations between entities when the many side is not limited.
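The "query on the shipment repository" suggestion might be sketched as below. `Shipment`, `InMemoryShipmentRepository` and `find_by_customer` are hypothetical names; the Shipment refers to its Customer by identity only, and the Customer holds no list of Shipments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    shipment_id: int
    customer_id: int  # reference to the Customer aggregate by identity only
    destination: str


class InMemoryShipmentRepository:
    def __init__(self):
        self._by_id = {}

    def add(self, shipment):
        self._by_id[shipment.shipment_id] = shipment

    def find_by_customer(self, customer_id):
        """Traverse the many side through a query instead of a collection."""
        matches = (s for s in self._by_id.values() if s.customer_id == customer_id)
        return sorted(matches, key=lambda s: s.shipment_id)
```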
