DDD: Confusion about repository/domain boundaries

My domain consists of Products, Departments, Classes, Manufacturers, DailySales, HourlySales.
I have a ProductRepository which facilitates storing/retrieving products from storage.
I have a DepartmentAndClass repository which facilitates storing/retrieving of departments and classes, as well as adding and removing products from those departments and classes.
I also have a DailySales repository which I use to retrieve statistics about daily sales across multiple groupings, i.e.:
DailySales.GetSalesByDepartment(dateTime)
DailySales.GetSalesByClass(dateTime)
DailySales.GetSalesByHour(dateTime)
Is it correct to have these sales tracking methods in their own repository like this? Am I on the right track?

Since domains are so dependent on context, some answers are harder than others. I would, however, place statistics on the Query side of things. You probably do not want to be calculating those stats on the fly, as that would put some heavy processing on your database. Typically the stats should be denormalized for quick access, so that only filtering is required at query time.
You may want to take a look at CQRS if you haven't done so.
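To illustrate the "denormalized for quick access" idea, here is a minimal sketch in Python. The SaleRecorded event and the DailySalesReadModel class are hypothetical names, not something from the question; the assumption is that the write side can notify the read side whenever a sale completes, so totals are maintained at write time and queries only filter.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class SaleRecorded:
    """Hypothetical event emitted by the write side when a sale completes."""
    day: date
    department: str
    amount: Decimal


class DailySalesReadModel:
    """Keeps pre-aggregated totals so that queries filter instead of recomputing."""

    def __init__(self) -> None:
        # Missing keys start at Decimal("0").
        self._totals = defaultdict(Decimal)

    def apply(self, event: SaleRecorded) -> None:
        # Update the denormalized total when the sale is recorded.
        self._totals[(event.day, event.department)] += event.amount

    def get_sales_by_department(self, day: date) -> dict:
        # Cheap lookup: pick the precomputed totals for the requested day.
        return {
            dept: total
            for (d, dept), total in self._totals.items()
            if d == day
        }
```

In a real system the totals would live in a table or cache rather than in memory, but the shape of the query side stays the same.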

Although most queries return an object or a collection of objects, it also fits within the concept to return some types of summary calculations, such as an object count, or a sum of a numerical attribute that was intended by the model to be tallied.
Eric Evans - Domain-Driven Design
This might be considered a read model. Are these daily sales objects being used in any domain model behaviour? Does any business logic depend on them? If not, it might be a good idea to separate this out into a distinct read model - at which point you're taking your first steps into CQRS.

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, and so on.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email, etc.).

But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in-memory information (i.e., how often you are writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.

Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are used only for update operations. For query operations, implement Queries that don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
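As a rough illustration of the CQRS "lite" option, the sketch below uses Python's standard sqlite3 module. The table names, columns and the EnrolledClassRow shape are assumptions made up for the example; the point is only that the query joins tables belonging to different aggregates and returns a flat, read-only row without going through any repository.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrolledClassRow:
    """Flat, read-only row for the student details page (hypothetical shape)."""
    student_name: str
    course_name: str
    instructor_name: str


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables; a real schema would be richer.
    conn.executescript(
        """
        CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE class (
            id INTEGER PRIMARY KEY, course_name TEXT, instructor_name TEXT
        );
        CREATE TABLE enrollment (student_id INTEGER, class_id INTEGER);
        """
    )


def student_details_query(conn: sqlite3.Connection, student_id: int) -> list:
    # One join across tables owned by different aggregates, no repositories.
    rows = conn.execute(
        """
        SELECT s.name, c.course_name, c.instructor_name
        FROM student s
        JOIN enrollment e ON e.student_id = s.id
        JOIN class c ON c.id = e.class_id
        WHERE s.id = ?
        ORDER BY c.course_name
        """,
        (student_id,),
    ).fetchall()
    return [EnrolledClassRow(*row) for row in rows]
```

Updates would still go through StudentRepository and ClassRepository; only the presentation query bypasses them.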

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, let's say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc.): This is a compact representation that might have some aggregate values, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity in its own right, saved in the db.
DetailsView: here you wouldn't have aggregate values but the real values for each Author, as Book has a property AuthorsList.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB, that is, if you don't want to fetch all the actual Authors and Reviews but just 2 ints holding their counts?
The fully normalized solution would be 2, but 1 seems to require either some denormalization or creating 2 different entities, BookDetails and BookCompact, within the Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.

To me it sounds like multiple Query Models (QMs).
I have used DDD in the CQRS/ES style, where aggregate roots produce events in response to the commands passed in. Multiple QMs subscribe to those events, so I create multiple "views" based on requirements.
ES (event sourcing) is very powerful here: I can introduce more QMs later by replaying the stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of the "shared data model". I see a huge evil in the one-for-all "RDBMS" approach. You will always get lost in the complexity of managing a shared model, as you have.
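A small Python sketch of what this could look like for the book store. All event and class names here (BookAdded, AuthorAssigned, ReviewPosted, BookListQueryModel, BookDetailsQueryModel, replay) are invented for illustration. Each query model consumes the same event stream but keeps only what its view needs, and a model introduced later can be built by replaying the stored events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookAdded:
    book_id: str
    title: str


@dataclass(frozen=True)
class AuthorAssigned:
    book_id: str
    author: str


@dataclass(frozen=True)
class ReviewPosted:
    book_id: str
    text: str


class BookListQueryModel:
    """Compact view for lists: stores counts only."""

    def __init__(self):
        self.rows = {}

    def handle(self, event):
        if isinstance(event, BookAdded):
            self.rows[event.book_id] = {
                "title": event.title, "authors": 0, "reviews": 0
            }
        elif isinstance(event, AuthorAssigned):
            self.rows[event.book_id]["authors"] += 1
        elif isinstance(event, ReviewPosted):
            self.rows[event.book_id]["reviews"] += 1


class BookDetailsQueryModel:
    """Details view: stores the actual author names, ignores reviews."""

    def __init__(self):
        self.rows = {}

    def handle(self, event):
        if isinstance(event, BookAdded):
            self.rows[event.book_id] = {"title": event.title, "authors": []}
        elif isinstance(event, AuthorAssigned):
            self.rows[event.book_id]["authors"].append(event.author)


def replay(events, query_model):
    """Build (or rebuild) a query model from the stored event history."""
    for event in events:
        query_model.handle(event)
    return query_model
```

The list page then reads from BookListQueryModel and never touches Author or Review rows at all.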

I had a very good result with the following design:
A domain package contains the #Entity classes, which hold all the data stored in the database.
A dto package contains the view(s) of each entity that will be returned from the service.
Each DTO should have a constructor which takes the entity as a parameter. To copy data more easily you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you share only the minimal amount of information, and it is returned in an object which has no behaviour.
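The answer above is phrased in Java terms; the same shape in Python might look like the sketch below. Book, BookCompactDto and from_entity are hypothetical names, and the classmethod plays the role of the "constructor which takes the entity". Note that this approach controls what leaves the service, but the full entity is still loaded first, so on its own it does not reduce the database payload the question asks about.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Domain entity holding everything that is persisted (hypothetical fields)."""
    id: int
    title: str
    authors: list = field(default_factory=list)
    reviews: list = field(default_factory=list)


@dataclass(frozen=True)
class BookCompactDto:
    """Behaviour-free view of a Book for list pages."""
    id: int
    title: str
    number_of_authors: int
    number_of_reviews: int

    @classmethod
    def from_entity(cls, book: Book) -> "BookCompactDto":
        # Copy only what the list view needs; collections become counts.
        return cls(book.id, book.title, len(book.authors), len(book.reviews))
```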

DDD modeling 1:1...N relationships with query performance in mind

I'm a DDD beginner, and I have a legacy project which would surely benefit from a proper domain layer. The application has to be modified to support multiple application and UI layers. The domain logic is at the moment implemented using the transaction script pattern. Basically I inherited a DB structure which is not allowed to be altered, the new application should be a drop in replacement of the old one.
I stumbled upon an interesting modelling problem in a small part of the domain, which I'm sure experienced DDD practitioners will find interesting. I can't be too specific about the problem, so I'll describe a problem which closely matches mine.
Problem description
Let's suppose we have to manage a collection of products. Products are identified by ids, they contain some description, and every product has a few images associated with it. Here comes the tricky part: the contents of the images are physically stored in the DB, so they are huge chunks of data (let's ignore for now how good or bad it is to store images in a DB; it's just an example). There are some invariants that must be enforced on adding/editing/removing products.
Adding products
A product is only valid if it has images associated with it; a new product must not be accepted without images
Every product must be associated with exactly 5 images, no more, no less.
The order of images associated with the product must be maintained
Editing products
Images of existing products can be replaced, but the number and order of the associated images should be maintained
Removing products
When a product is removed, all of the images associated with it should also be removed
Considered solutions
The class diagrams of various solutions
Solution 1:
The simplest way to model these concepts would be the following.
The Product is the AR. The Images associated with the Product can be accessed and modified through the Product, so Product is responsible for enforcing the 5 Images rule. The advantage of this approach is that invalid Products can't be created or edited in a way to make them invalid, and no Images will be left behind when a Product is removed. So the aggregate is formed around the transaction boundary. The problem with this approach is that in the vast majority of cases the UI would just need to present the list of products, and maybe to modify their description. The UI would very rarely need to display or modify the Images associated with the product. So in 95% of the cases huge amounts of unnecessary data would be loaded into memory.
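For concreteness, Solution 1 could be sketched like this in Python (the question does not name an implementation language, and REQUIRED_IMAGE_COUNT, Image and replace_image are names made up for the example). The aggregate root checks the "exactly 5 images" rule on creation and only allows replacing an image in place, so count and order cannot change afterwards.

```python
from dataclasses import dataclass

REQUIRED_IMAGE_COUNT = 5


@dataclass(frozen=True)
class Image:
    content: bytes


class Product:
    """Aggregate root: images are reachable and modifiable only through it."""

    def __init__(self, product_id: str, description: str, images: list):
        if len(images) != REQUIRED_IMAGE_COUNT:
            raise ValueError(
                f"a product needs exactly {REQUIRED_IMAGE_COUNT} images, "
                f"got {len(images)}"
            )
        self.product_id = product_id
        self.description = description
        self._images = list(images)  # private copy, order preserved

    @property
    def images(self) -> tuple:
        # Read-only view so callers cannot add or remove images.
        return tuple(self._images)

    def replace_image(self, position: int, image: Image) -> None:
        # Replacing in place keeps both the count and the order intact.
        if not 0 <= position < REQUIRED_IMAGE_COUNT:
            raise IndexError(f"no image slot {position}")
        self._images[position] = image
```

The cost described in the question is visible here too: constructing a Product always requires all five image payloads to be in memory.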
Lazy loading? The domain model should be implemented in a language which doesn't have ORM tools with lazy loading support. Implement my own lazy loading mechanism? The domain objects shouldn't be aware of the way they're persisted or if they're persisted at all. Instead Solution 2 is recommended by Vaughn Vernon.
Solution 2:
The querying performance problems can be solved with this approach by favoring small aggregates and following the reference other aggregates by identity rule. Vaughn Vernon has a great series of articles describing how to achieve this.
The aggregate is split into two parts Product and ImageSet. Both of them are referencing ProductId as a value object. The Product would be responsible for enforcing the no product without Images rule, and the ImageSet would enforce the no ImageSet without 5 images rule. Querying is not a problem anymore, the ImageSet would be retrieved only when it's needed by a service.
However, this problem is a lot more complex than what Vernon describes in his articles (0...N association). The problem is that the creation of a Product would lead to modifying or creating 2 aggregates, which defeats the purpose of modelling aggregates around transaction boundaries. The service which adds the new Product would be responsible for transaction management.
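A sketch of what "the service would be responsible for transaction management" might look like, using Python's sqlite3 module purely as a stand-in for the real database. The table layout, the ImageSet class and the add_product function are assumptions for illustration. ImageSet enforces its own 5-image rule, and the application service writes both aggregates inside one transaction so that neither can exist without the other.

```python
import sqlite3

REQUIRED_IMAGE_COUNT = 5


class ImageSet:
    """Separate aggregate that references the product by identity only."""

    def __init__(self, product_id: str, images: list):
        if len(images) != REQUIRED_IMAGE_COUNT:
            raise ValueError(
                f"an ImageSet needs exactly {REQUIRED_IMAGE_COUNT} images"
            )
        self.product_id = product_id
        self.images = tuple(images)  # order is preserved


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout; the real (unalterable) schema is not shown.
    conn.executescript(
        """
        CREATE TABLE product (id TEXT PRIMARY KEY, description TEXT NOT NULL);
        CREATE TABLE product_image (
            product_id TEXT NOT NULL,
            position INTEGER NOT NULL,
            content BLOB NOT NULL,
            PRIMARY KEY (product_id, position)
        );
        """
    )


def add_product(conn, product_id: str, description: str, images: list) -> None:
    """Application service: owns the transaction spanning both aggregates."""
    image_set = ImageSet(product_id, images)  # validate before touching the DB
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO product (id, description) VALUES (?, ?)",
            (product_id, description),
        )
        conn.executemany(
            "INSERT INTO product_image (product_id, position, content) "
            "VALUES (?, ?, ?)",
            [
                (image_set.product_id, pos, img)
                for pos, img in enumerate(image_set.images)
            ],
        )
```

Listing or editing descriptions then only ever touches the product table, which is the querying benefit Solution 2 is after.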
Solution 3:
The final solution would be the use of bounded contexts. For simplicity we name them BC1 and BC2. In BC1 a Product would just contain the ProductDetails. Services interested in querying Products for their details and maybe modifying them would use BC1 (ProductRepository in BC1 wouldn't allow adding or removing products, just querying/modifying existing ones). In BC2 a Product would contain the ProductDetails and the Images associated with it. So services interested in adding/removing products, and modifying/retrieving their images, would use BC2. Common value objects and entities would be shared between these 2 BCs.
This solution would solve all the transactional consistency and querying performance problems. However, I'm not sure based on their definition BCs should be created in response to these kinds of problems.
I'm sorry for the long question, but I feel I should really point out which kinds of solutions I've already considered. And sorry for the linked images, I'm not allowed to upload images yet.

An important observation in your use-case is that the problems of the 1st solution are isolated to the query side of the application. There is no reason to use the same model for processing commands and enforcing constraints as the model used for queries. The read-model pattern can be used to separate the reads from the writes which would allow you to create specific read-models for specific UI requirements and the read-model won't affect your domain model. While it is tempting to utilize the same model for reading as the one for writing, especially given that most ORMs support intricate queries and given the DRY principle, in practice it is much easier to separate the read model from the executable domain model.
Also, the series of articles by Vaughn Vernon is a great resource for understanding the intricacies of aggregate design; however, the central focus of the articles is on how to partition aggregates based on behavioral requirements, not query requirements.

In DDD, are collection properties of entities allowed to have partial values?

In Domain Driven Design are collection properties of entities allowed to have partial values?
For example, should properties such as Customer.Orders, Post.Comments, Graph.Vertices always contain all orders, comments, vertices or it is allowed to have today's orders, recent comments, orphaned vertices?
Correspondingly, should Repositories provide methods like
GetCustomerWithOrdersBySpecification
GetPostWithCommentsBefore
etc.?

I don't think that DDD tells you to do or not to do this. It strongly depends on the system you are building and the specific problems you need to solve.
I haven't even heard of any patterns for this.
From a subjective point of view I would say that entities should be complete by definition (allowing for lazy loading), and could be completely or partially loaded into DTOs to optimize the amount of data sent to clients. But I wouldn't mind loading partial entities from the database if it solved some problem.

Remember that Domain-Driven Design also has a concept of services. For performing certain database queries, it's better to model the problem as a service than as a collection of child objects attached to a parent object.
A good example of this might be creating a report by accepting several user-entered parameters. It would be easier to model this as:
CustomerReportService.GetOrdersByOrderDate(Customer theCustomer, Date cutoff);
Than like this:
myCustomer.OrdersCollection.SelectMatching(Date cutoff);
Or to put it another way, the DDD model you use for data entry does not have to be the same as the DDD model you use for reporting.
In highly scalable systems, it's common to separate these two concerns.
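A minimal Python sketch of the service approach. The Order fields, the in-memory list standing in for the database, and the reading of "cutoff" as "on or after this date" are all assumptions, since the answer only gives the method signature.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int
    order_date: date


class CustomerReportService:
    """Answers reporting questions without loading Customer.Orders."""

    def __init__(self, orders):
        # Stand-in for a database; a real service would issue a query instead.
        self._orders = list(orders)

    def get_orders_by_order_date(self, customer_id: int, cutoff: date) -> list:
        # Assumption: "cutoff" means orders placed on or after that date.
        return [
            order
            for order in self._orders
            if order.customer_id == customer_id and order.order_date >= cutoff
        ]
```

The Customer entity stays free of a partially filled Orders collection; the filtering lives in the service, where it can be pushed down to the data store.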

How do you handle associations between aggregates in DDD?

I'm still wrapping my head around DDD, and one of the stumbling blocks I've encountered is in how to handle associations between separate aggregates. Say I've got one aggregate encapsulating Customers and another encapsulating Shipments.
For business reasons Shipments are their own aggregates, and yet they need to be explicitly tied to Customers. Should my Customer domain entity have a list of Shipments? If so, how do I populate this list at the repository level - given I'll have a CustomerRepository and a ShipmentRepository (one repo per aggregate)?
I'm saying 'association' rather than 'relationship' because I want to stress that this is a domain decision, not an infrastructure one - I'm designing the system from the model first.
Edit: I know I don't need to model tables directly to objects - that's the reason I'm designing the model first. At this point I don't care about the database at all - just the associations between these two aggregates.

There's no reason your ShipmentRepository can't aggregate customer data into your shipment models. Repositories do not have to have a 1-to-1 mapping with tables.
I have several repositories which combine multiple tables into a single domain model.

I think there are two levels of answering this question. At one level, the question is how do I populate the relationship between customer and shipment. I really like the "fill" semantics where your shipment repository can have a fillOrders( List customers, ....).
The other level is "how do I handle the denormalized domain models that are a part of DDD". And "Customer" is probably the best example of them all, because it simply shows up in so many different contexts; almost all your processes have customer in them and the context of the customer is usually extremely varied. At most half the time you are interested in the "orders". If my understanding of the domain was perfect when starting, I'd never make a customer domain concept. But it's not, so I always end up making the Customer object. I still remember the project where, after 3 years, I felt that I was able to make the proper "Customer" domain model. I would be looking for the alternate and more detailed concepts that also represent the customer; PotentialCustomer, OrderingCustomer, CustomerWithOrders and probably a few others; sorry the names aren't better. I'll need some more time for that ;)

Shipment has a many-to-one relationship with Customer.
If you are looking for the shipments of a client, add a query to your shipment repository that takes a client parameter.
In general, I don't create one-to-many associations between entities when the many side is not limited.
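In Python this could be sketched as follows. Shipment, InMemoryShipmentRepository and find_by_customer are hypothetical names, and the in-memory dictionary stands in for real storage. The Shipment refers to its Customer by identity, and the Customer entity holds no shipment list at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    shipment_id: int
    customer_id: int  # reference to the Customer aggregate by identity


class InMemoryShipmentRepository:
    """Stand-in repository; a real one would query the data store."""

    def __init__(self):
        self._by_id = {}

    def add(self, shipment: Shipment) -> None:
        self._by_id[shipment.shipment_id] = shipment

    def find_by_customer(self, customer_id: int) -> list:
        # The "many" side is looked up on demand instead of being held
        # as an unbounded collection on Customer.
        return [
            s for s in self._by_id.values() if s.customer_id == customer_id
        ]
```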
