DDD modeling 1:1...N relationships with query performance in mind

DDD modeling 1:1...N relationships with query performance in mind - domain-driven-design

I'm a DDD beginner, and I have a legacy project which would surely benefit from a proper domain layer. The application has to be modified to support multiple application and UI layers. The domain logic is at the moment implemented using the transaction script pattern. Basically I inherited a DB structure which is not allowed to be altered, the new application should be a drop in replacement of the old one.
I stumbled upon an interesting modelling problem in a small part of the domain, which I'm sure experienced DDD practitioners will find interesting. I can't be too specific about the problem, so I'll describe a problem which closely matches mine.
Problem description
Let's suppose we should manage a collection of products. Products are identified by ids, they contain some description, and every product has a few images associated with it. Here comes the tricky part. The images, their contents, are physically stored in the DB, so they are huge chunks of data. (let's just ignore now how good or bad is storing images in a DB, it's just an example). There are some invariants that must be enforced on adding/editing/removing products.
Adding products
A product is only valid if it has images associated with it, without adding images a new product should not be allowed to be entered
Every product must be associated with exactly 5 images, no more, no less.
The order of images associated with the product must be maintained
Editing products
Images of existing products can be replaced, but the number and order of the associated images should be maintained
Removing products
When a product is removed, all of the images associated with it should also be removed
Considered solutions
The class diagrams of various solutions
Solution 1:
The simplest way to model these concepts would be the following.
The Product is the AR. The Images associated with the Product can be accessed and modified through the Product, so Product is responsible for enforcing the 5 Images rule. The advantage of this approach is that invalid Products can't be created or edited in a way to make them invalid, and no Images will be left behind when a Product is removed. So the aggregate if formed around the transaction boundary. The problems with this approach is that in the vast majority of cases the UI would just need to present the list of products, and maybe to modify their description. The UI would very rarely need to display or modify the Images associated with the product. So 95% of the cases huge amounts of unnecessary data would be loaded into the memory.
Lazy loading? The domain model should be implemented in a language which doesn't have ORM tools with lazy loading support. Implement my own lazy loading mechanism? The domain objects shouldn't be aware of the way they're persisted or if they're persisted at all. Instead Solution 2 is recommended by Vaughn Vernon.
Solution 2:
The querying performance problems can be solved with this approach by favoring small aggregates and following the reference other aggregates by identity rule. Vaughn Vernon has a great series of articles describing how to achieve this.
The aggregate is split into two parts Product and ImageSet. Both of them are referencing ProductId as a value object. The Product would be responsible for enforcing the no product without Images rule, and the ImageSet would enforce the no ImageSet without 5 images rule. Querying is not a problem anymore, the ImageSet would be retrieved only when it's needed by a service.
However, this problem is a lot more complex then what Vernon describes in his articles (0...N association). The problem is that the creation of a Product would lead to modifying or creating 2 aggregates, which eliminates the purpose of modelling aggregates around transaction boundaries. The service which adds the new Product would be responsible for transaction management.
Solution 3:
The final solution would be the use of bounded contexts. So for simplicity we name them BC1 and BC2. In BC1 a Product would just contain the ProductDetails. Services interested in querying Products for their details and maybe modifiyng them would use BC1 (ProductRepository in BC1 wouldn't allow adding or removing products, just querying/modifying existing ones). In BC2 a Product would contain the ProductDetails and the Images associated with it. So services interested in adding/removing products, and modifying/retrieving their images would use BC2. Commmon value objects and entities would be shared between these 2 BCs.
This solution would solve all the transactional consistency and querying performance problems. However, I'm not sure based on their definition BCs should be created in response to these kinds of problems.
I'm sorry for the long question, but I feel I should really point out which kinds of solutions I've already considered. And sorry for the linked images, I'm not allowed to upload images yet.

An important observation in your use-case is that the problems of the 1st solution are isolated to the query side of the application. There is no reason to use the same model for processing commands and enforcing constraints as the model used for queries. The read-model pattern can be used to separate the reads from the writes which would allow you to create specific read-models for specific UI requirements and the read-model won't affect your domain model. While it is tempting to utilize the same model for reading as the one for writing, especially given that most ORMs support intricate queries and given the DRY principle, in practice it is much easier to separate the read model from the executable domain model.
Also, the series of articles by Vaughn Vernon are a great resource for understanding intricacies of aggregate design, however the central focus of the articles is on how to partition aggregates based on behavioral requirements not query requirements.

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Course are associated with a department etc.. etc..
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email etc..).

But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry to much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.

Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Atendees (that's just an example, note that modeling many to many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and respositories will only be used for update operations and for query operations implement Queries, which won't use repositories, but access the DB directly, therefore you can join tables from multiple aggregates (Student->Enrollments->Atendees->Classes)
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.

Service Layer DTOs - Large Complex Interactive Report-Like Objects

I have Meeting objects that form the basis of a scheduling system, of which gridviews are used to display the important information. This is for the purpose of scheduling employees to meetings, and for employees to view what has been scheduled.
I have been trying to follow DDD principles, but I'm having difficulty knowing what to pass from my service layer down to presentation area of system. This is because the schedule can be LARGE, and actually consists of many different elements of the system. Eg. Client Name, Address, Case Info, Group,etc, all of which are needed for the meeting scheduler to make a decision.
In addition to this, the scheduler needs to change values within this schedule and pass it back up to the service layer (eg. assign employees from dropdowns, maybe change group, etc). So, the information isn't really "readonly" - it needs to be interacted with. ie. It's not just a report.
Our current approach is to populate a flattened "Schedule Object" from SQL, which is constructed from small parts of different domain objects. It's quite a complex query. When changes have been made, this is then passed back up to the service layer, and the service will retrieve the domain objects in question, and fire business methods on the domain objects using information from the DTOs.
My question is, is this the correct approach? ie. Continue to generate large custom objects from SQL, and then pass down from Service Layer to Presentation Layer objects that feel a lot like View Models?
UPDATE due to an answer
To give a idea of the amount entities / aggregates relationships involved. (this is an obfuscated examples, so relationships are the important things here)
Client is in one default group
Client has one open case but many closed
Cases have many Meetings
Meeting have many assigned Employees
Meeting have many reasons
Meeting can get scheduled to different groups
Employees can be associated with many groups.
The schedule need to loads all meetings in open cases that belong to patients who are in the same groups as the employee.
Scheduler can see Client Name, Client Address, Case Info, MeetingTime, MeetingType, MeetingReasons, scheduledGroup(s) (showstrail), Assigned Employees (also has hidden employee ids).
Editable fields are assign employee dropdowns and scheduled group.
Schedule may be up to two hundred rows.
DTO is coming down from WCF, so domain model is accessed above this service layer, and not below.
Domain model business calls leveraged by service based on DTO values passed back, and repositories deal with inserts/updates.
So, I suppose to update, is using a query to populate an object which contains all of the above acceptable to pass down as one merged DTO? And if not, how would you approach it? ( giving some example calls to service layer, and explaining a little bit about how you conceive the ORM fetching the data keeping in mind performance)

In the service layer and below, I would treat each entity (see aggregate roots in DDD) separate with respect to it's transactional boundary. I.e. even if you could update a client and a case in the same UI view, it would be best to transactionally modify the client and then modify the case. The more you try to modify in one transaction, the more you can conflict with other users.
Although your schedule is large and can contain lots of objects, the service layer should again deal with each entity (aggregate root) separately and then bundle them together into a new view model. Sadly, on brown-field projects, a lot of logic might be in the SQL and the massive multi-table joins might make this harder to refactor into more atomic queries that do exactly what is needed. The old-school data-centric view of 'do everything you can in the database' goes against everything DDD.
Because DDD is a collection of design ideas and patterns and not particularly a methodology or an architecture, it sounds that it might be too late to try shoe-horn your current application into a DDD application-centric design. It sounds as though your current app is very entrenched in the data-centric view.
If everything is currently being passed up through the layers in one monolithic chunk, it might be best to keep with this style and just expose these monolithic chunks to the people in the other team who wish to consume them, for use in their new app. You might be able to put some sort of view model caching in place (a bit like the caching view model element in CQRS).
In my personal opinion, data-centric, normalised data apps have had their day (they made sense in the 1970s when hard disk space was expensive) and all apps should be moving toward more modern practices. In reality, only when legacy systems are crawling on their knees, will stakeholders usually put up the cash to look for alternatives (usually after stuffing every last server with RAM). It might be possible or best to convince them to refactor small sections at a time.

DDD: Confusion about repository/domain boundaries

My domain consists of Products, Departments, Classes, Manufacturers, DailySales, HourlySales.
I have a ProductRepository which facilitates storing/retrieving products from storage.
I have a DepartmentAndClass repository which facilitates storing/retrieving of departments and classes, as well as adding and removing products from those departments and classes.
I also have a DailySales repository which I use to retrieve statistics about daily sales from multiple groupings. ie..
DailySales.GetSalesByDepartment(dateTime)
DailySales.GetSalesByClass(dateTime)
DailySales.GetSalesByHour(dateTime)
Is it correct to have these sales tracking methods in their own repository like this? Am I on the right track?

Since domains are so dependent on context some answers are harder than others. I would, however, place statistics on the Query side of things. You probably do not want to be calculating those stats on the fly as you will be placing some heavy processing on your database. Typically the stats should be denormalized for quick access where only filtering is required.
You may want to take a look at CQRS if you haven't done so.

Although most queries return an object or a collection of objects, it also fits within the concept to return some types of summary calculations, such as an object count, or a sum of a numerical attribute that was intended by the model to be tallied.
Eric Evans - Domain-Driven Design
This might be considered a read model. Are these daily sales objects being used in any domain model behaviour? Does any business logic depend on them? If not, it might be a good idea to separate this out into a distinct read model - at which point you're taking your first steps into CQRS.

Can't help but see Domain entities as wasteful. Why?

I've got a question on my mind that has been stirring for months as I've read about DDD, patterns and many other topics of application architecture. I'm going to frame this in terms of an MVC web application but the question is, I'm sure, much broader. and it is this:  Does the adherence to domain entities  create rigidity and inefficiency in an application? 
The DDD approach makes complete sense for managing the business logic of an application and as a way of working with stakeholders. But to me it falls apart in the context of a multi-tiered application. Namely there are very few scenarios when a view needs all the data of an entity or when even two repositories have it all. In and of itself that's not bad but it means I make multiple queries returning a bunch of properties I don't need to get a few that I do. And once that is done the extraneous information either gets passed to the view or there is the overhead of discarding, merging and mapping data to a DTO or view model. I have need to generate a lot of reports and the problem seems magnified there. Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities. It seems wasteful, honestly, and I don't want to pound a database and generate unneeded network traffic on a matter of principle. From questions like this Should the repository layer return data-transfer-objects (DTO)? it seems I'm not the only one to struggle with this question. So what's the answer to the limitations it seems to impose? 
Thanks from a new and confounded DDD-er.

What's the real problem here? Processing business rules and querying for data are 2 very different concerns. That realization leads us to CQRS - Command-Query Responsibility Segregation. What's that? You just don't use the same model for both tasks: Domain Model is about behavior, performing business processes, handling command. And there is a separate Reporting Model used for display. In general, it can contain a table per view. These tables contains only relevant information so you can get rid of DTO, AutoMapper, etc.
How these two models synchronize? It can be done in many ways:
Reporting model can be built just on top of database views
Database replication
Domain model can issue events containing information about each change and they can be handled by denormalizers updating proper tables in Reporting Model

as I've read about DDD, patterns and many other topics of application architecture
Domain driven design is not about patterns and architecture but about designing your code according to business domain. Instead of thinking about repositories and layers, think about problem you are trying to solve. Simplest way to "start rehabilitation" would be to rename ProductRepository to just Products.
Does the adherence to domain entities create rigidity and inefficiency in an application?
Inefficiency comes from bad modeling. [citation needed]
The DDD approach makes complete sense for managing the business logic of an application and as a way of working with stakeholders. But to me it falls apart in the context of a multi-tiered application.
Tiers aren't layers
Namely there are very few scenarios when a view needs all the data of an entity or when even two repositories have it all. In and of itself that's not bad but it means I make multiple queries returning a bunch of properties I don't need to get a few that I do.
Query that data as you wish. Do not try to box your problems into some "ready-made solutions". Instead - learn from them and apply only what's necessary to solve them.
Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities.
http://ayende.com/blog/3955/repository-is-the-new-singleton
So what's the answer to the limitations it seems to impose?
"seems"
Btw, internet is full of things like this (I mean that sample app).
To understand what DDD is, read blue book slowly and carefully. Twice.

If you think that fully fledged DDD is too much effort for your scenario then maybe you need to take a step down and look at something closer to Active Record.
I use DDD but in my scenario I have to support multiple front-ends; a couple web sites and a WinForms app, as well as a set of services that allow interaction with other automated processes. In this case, the extra complexity is worth it. I use DTO's to transfer a representation of my data to the various presentation layers. The CPU overhead in mapping domain entities to DTO's is small - a rounding error when compared to net work calls and database calls. There is also the overhead in managing this complexity. I have mitigated this to some extent by using AutoMapper. My Repositories return fully populated domain objects. My service layer will map to/from DTO's. Here we can flatten out the domain objects, combine domain objects, etc. to produce a more tabulated representation of the data.
Dino Esposito wrote an MSDN Magazine article on this subject here - you may find this interesting.
So, I guess to answer your "Why" question - as usual, it depends on your context. DDD maybe too much effort. In which case do something simpler.

Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities.
Add methods to your repository to return ONLY what you want e.g. IOrderRepository.GetByCustomer
It's completely OK in DDD.
You may also use Query object pattern or Specification to make your repositories more generic; only remember not to use anything which is ORM-specific in interfaces of the repositories(e.g. ICriteria of NHibernate)

In DDD, are collection properties of entities allowed to have partial values?

In Domain Driven Design are collection properties of entities allowed to have partial values?
For example, should properties such as Customer.Orders, Post.Comments, Graph.Vertices always contain all orders, comments, vertices or it is allowed to have today's orders, recent comments, orphaned vertices?
Correspondingly, should Repositories provide methods like
GetCustomerWithOrdersBySpecification
GetPostWithCommentsBefore
etc.?

I don't think that DDD tells you to do or not to do this. It strongly depends on the system you are building and the specific problems you need to solve.
I not even heard about patterns about this.
From a subjective point of view I would say that entities should be complete by definitions (considering lazy loading), and could completely or partially be loaded to DTO's, to optimized the amount of data sent to clients. But I wouldn't mind to load partial entities from the database if it would solve some problem.

Remember that Domain-Driven Design also has a concept of services. For performing certain database queries, it's better to model the problem as a service than as a collection of child objects attached to a parent object.
A good example of this might be creating a report by accepting several user-entered parameters. It be easier to model this as:
CustomerReportService.GetOrdersByOrderDate(Customer theCustomer, Date cutoff);
Than like this:
myCustomer.OrdersCollection.SelectMatching(Date cutoff);
Or to put it another way, the DDD model you use for data entry does not have to be the same as the DDD model you use for reporting.
In highly scalable systems, it's common to separate these two concerns.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string