In DDD, are collection properties of entities allowed to have partial values? - domain-driven-design

In Domain Driven Design are collection properties of entities allowed to have partial values?
For example, should properties such as Customer.Orders, Post.Comments, Graph.Vertices always contain all orders, comments, vertices or it is allowed to have today's orders, recent comments, orphaned vertices?
Correspondingly, should Repositories provide methods like
GetCustomerWithOrdersBySpecification
GetPostWithCommentsBefore
etc.?

I don't think that DDD tells you to do or not to do this. It strongly depends on the system you are building and the specific problems you need to solve.
I not even heard about patterns about this.
From a subjective point of view I would say that entities should be complete by definitions (considering lazy loading), and could completely or partially be loaded to DTO's, to optimized the amount of data sent to clients. But I wouldn't mind to load partial entities from the database if it would solve some problem.

Remember that Domain-Driven Design also has a concept of services. For performing certain database queries, it's better to model the problem as a service than as a collection of child objects attached to a parent object.
A good example of this might be creating a report by accepting several user-entered parameters. It be easier to model this as:
CustomerReportService.GetOrdersByOrderDate(Customer theCustomer, Date cutoff);
Than like this:
myCustomer.OrdersCollection.SelectMatching(Date cutoff);
Or to put it another way, the DDD model you use for data entry does not have to be the same as the DDD model you use for reporting.
In highly scalable systems, it's common to separate these two concerns.

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Course are associated with a department etc.. etc..
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email etc..).
But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry to much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Atendees (that's just an example, note that modeling many to many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and respositories will only be used for update operations and for query operations implement Queries, which won't use repositories, but access the DB directly, therefore you can join tables from multiple aggregates (Student->Enrollments->Atendees->Classes)
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.

Separating business rules from entities in domain driven design

While i am practicing DDD in my software projects, i have always faced the question of "Why should i implement my business rules in the entities? aren't they supposed to be pure data models?"
Note that, from my understanding of DDD, domain models could be consist of persistent models as well as value objects.
I have come up with a solution in which i separate my persistent models from my domain models. On the other hand we have data transfer objects (DTO), so we have 3 layers of data mapping. Database to persistence model, persistence model to domain models and domain models to DTOs. In my opinion, my solution is not an efficient one as too much hard effort must be put into it.
Therefore is there any better practice to achieve this goal?
Disclaimer: this answer is a little larger that the question but it is needed to understand the problem; also is 100% based on my experience.
What you are feeling is normal, I had the same feeling some time ago. This is because of a combination of architecture, programming language and used framework. You should try to choose the above tools as such that they give the code that is easiest to change. If you have to change 3 classes for each field added to an entity then this would be nightmare in a large project (i.e. 50+ entity types).
The problem is that you have multiple DTOs per entity/concept.
The heaviest architecture that I used was the Classic layered architecture; the strict version was the hardest (in the strict version a layer may access only the layer that is just before it; i.e. the User interface may access only the Application). It involved a lot of DTOs and translations as the data moved from the Infrastructure to the UI. The testing was also hard as I had to use a lot of mocking.
Then I inverted the dependency, the Domain will not depend on the Infrastructure. For this I defined interfaces in the Domain layer that were implemented in the Infrastructure. But I still needed to use mocking for them. Also, the Aggregates were not pure and they had side effects (because they called the Infrastructure, even it was abstracted by interfaces).
Then I moved the Domain to the very bottom. This made my Aggregates pure. I no longer needed to use mocking. But I still needed DTOs (returned by the Application layer to the UI and those used by the ORM).
Then I made the first leap: CQRS. This splits the models in two: the write model and the read model. The important thing is that you don't need to use DTOs for models anymore. The Aggregate (the write model) can be serialized as it is or converted to JSON and stored in almost any database. Vaughn Vernon has a blog post about this.
But the nicest are the Read models. You can create a read model for each use case. Being a model used only for read/query, it can be as simple/dump as possible. The read entities contain only query related behavior. With the right persistence they can be persisted as they are. For example, if you use MongoDB (or any document database), with a simple reflection based serializer you can have a very thin architecture. Thanks to the domain events, you won't need to use JOINS, you can have full data denormalization (the read entities include all the data they need).
The second leap is Event sourcing. With this you don't need a flat persistence for the Aggregates. They are rehydrated from the Event store each time they handle a command.
You still have DTOs (commands, events, read models) but there is only one DTO per entity/concept.
Regarding the elimination of DTOs used by the Presentation: you can use something like GraphSQL.
All the above can be made worse by the programming language and framework. Strong typed programming languages force you to create a type for each custom returned value. Some frameworks force you to return a custom serializable type in order to return them to REST over HTTP requests (in this way you could have self-described REST endpoints using reflection). In PHP you can simply use arrays with string keys as value to be returned by a REST controller.
P.S.
By DTO I mean a class with data and no behavior.
I'm not saying that we all should use CQRS, just that you should know that it exists.
Why should i implement my business rules in the entities? aren't they supposed to be pure data models?
Your persistence entities should be pure data models. Your domain entities describe behaviors. They aren't the same thing; it is a common pattern to have a bit of logic with in the repository to change one to the other.
The cleanest way I know of to manage things is to treat the persistent entity as a value object to be managed by the domain entity, and to use something like a data mapper for transitions between domain and persistence.
On the other hand we have data transfer objects (DTO), so we have 3 layers of data mapping. Database to persistence model, persistence model to domain models and domain models to DTOs. In my opinion, my solution is not an efficient one as too much hard effort must be put into it.
cqrs offers some simplification here, based on the idea that if you are implementing a query, you don't really need the "domain model" because you aren't actually going to change the supporting data. In which case, you can take the "domain model" out of the loop altogether.
DDD and data are very different things. The aggregate's data (an outcome) will be persisted somehow depending on what you're using. Personally I think in domain events so the resulting Domain Event is the DTO (technically it is) that can be stored directly in an Event Store (if you're using Event Sourcing) or act as a data source for your persistence model.
A domain model represents relevant domain behaviour with the domain state being the 'result'. An entity is concept which has an id, compared to a Value Object which represents a business semantic value only. An entity usually groups related value objects and consistency rules. Not all business rules are here , some of them make sense as a service.
Now, there is the case of a CRUD domain or CRUD modelling where basically all you have is some data structures plus some validation rules. No need to complicate your life here if the modeling is correct. Implement things as simple as possible.
Always think of DDD as a methodology to gather requirements and to structure information. Implementation as in code (design) is something different.

DDD: Confusion about repository/domain boundaries

My domain consists of Products, Departments, Classes, Manufacturers, DailySales, HourlySales.
I have a ProductRepository which facilitates storing/retrieving products from storage.
I have a DepartmentAndClass repository which facilitates storing/retrieving of departments and classes, as well as adding and removing products from those departments and classes.
I also have a DailySales repository which I use to retrieve statistics about daily sales from multiple groupings. ie..
DailySales.GetSalesByDepartment(dateTime)
DailySales.GetSalesByClass(dateTime)
DailySales.GetSalesByHour(dateTime)
Is it correct to have these sales tracking methods in their own repository like this? Am I on the right track?
Since domains are so dependent on context some answers are harder than others. I would, however, place statistics on the Query side of things. You probably do not want to be calculating those stats on the fly as you will be placing some heavy processing on your database. Typically the stats should be denormalized for quick access where only filtering is required.
You may want to take a look at CQRS if you haven't done so.
Although most queries return an object or a collection of objects, it also fits within the concept to return some types of summary calculations, such as an object count, or a sum of a numerical attribute that was intended by the model to be tallied.
Eric Evans - Domain-Driven Design
This might be considered a read model. Are these daily sales objects being used in any domain model behaviour? Does any business logic depend on them? If not, it might be a good idea to separate this out into a distinct read model - at which point you're taking your first steps into CQRS.

Can't help but see Domain entities as wasteful. Why?

I've got a question on my mind that has been stirring for months as I've read about DDD, patterns and many other topics of application architecture. I'm going to frame this in terms of an MVC web application but the question is, I'm sure, much broader. and it is this:  Does the adherence to domain entities  create rigidity and inefficiency in an application? 
The DDD approach makes complete sense for managing the business logic of an application and as a way of working with stakeholders. But to me it falls apart in the context of a multi-tiered application. Namely there are very few scenarios when a view needs all the data of an entity or when even two repositories have it all. In and of itself that's not bad but it means I make multiple queries returning a bunch of properties I don't need to get a few that I do. And once that is done the extraneous information either gets passed to the view or there is the overhead of discarding, merging and mapping data to a DTO or view model. I have need to generate a lot of reports and the problem seems magnified there. Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities. It seems wasteful, honestly, and I don't want to pound a database and generate unneeded network traffic on a matter of principle. From questions like this Should the repository layer return data-transfer-objects (DTO)? it seems I'm not the only one to struggle with this question. So what's the answer to the limitations it seems to impose? 
Thanks from a new and confounded DDD-er.  
What's the real problem here? Processing business rules and querying for data are 2 very different concerns. That realization leads us to CQRS - Command-Query Responsibility Segregation. What's that? You just don't use the same model for both tasks: Domain Model is about behavior, performing business processes, handling command. And there is a separate Reporting Model used for display. In general, it can contain a table per view. These tables contains only relevant information so you can get rid of DTO, AutoMapper, etc.
How these two models synchronize? It can be done in many ways:
Reporting model can be built just on top of database views
Database replication
Domain model can issue events containing information about each change and they can be handled by denormalizers updating proper tables in Reporting Model
as I've read about DDD, patterns and many other topics of application architecture
Domain driven design is not about patterns and architecture but about designing your code according to business domain. Instead of thinking about repositories and layers, think about problem you are trying to solve. Simplest way to "start rehabilitation" would be to rename ProductRepository to just Products.
Does the adherence to domain entities create rigidity and inefficiency in an application?
Inefficiency comes from bad modeling. [citation needed]
The DDD approach makes complete sense for managing the business logic of an application and as a way of working with stakeholders. But to me it falls apart in the context of a multi-tiered application.
Tiers aren't layers
Namely there are very few scenarios when a view needs all the data of an entity or when even two repositories have it all. In and of itself that's not bad but it means I make multiple queries returning a bunch of properties I don't need to get a few that I do.
Query that data as you wish. Do not try to box your problems into some "ready-made solutions". Instead - learn from them and apply only what's necessary to solve them.
Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities.
http://ayende.com/blog/3955/repository-is-the-new-singleton
So what's the answer to the limitations it seems to impose?
"seems"
Btw, internet is full of things like this (I mean that sample app).
To understand what DDD is, read blue book slowly and carefully. Twice.
If you think that fully fledged DDD is too much effort for your scenario then maybe you need to take a step down and look at something closer to Active Record.
I use DDD but in my scenario I have to support multiple front-ends; a couple web sites and a WinForms app, as well as a set of services that allow interaction with other automated processes. In this case, the extra complexity is worth it. I use DTO's to transfer a representation of my data to the various presentation layers. The CPU overhead in mapping domain entities to DTO's is small - a rounding error when compared to net work calls and database calls. There is also the overhead in managing this complexity. I have mitigated this to some extent by using AutoMapper. My Repositories return fully populated domain objects. My service layer will map to/from DTO's. Here we can flatten out the domain objects, combine domain objects, etc. to produce a more tabulated representation of the data.
Dino Esposito wrote an MSDN Magazine article on this subject here - you may find this interesting.
So, I guess to answer your "Why" question - as usual, it depends on your context. DDD maybe too much effort. In which case do something simpler.
Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities.
Add methods to your repository to return ONLY what you want e.g. IOrderRepository.GetByCustomer
It's completely OK in DDD.
You may also use Query object pattern or Specification to make your repositories more generic; only remember not to use anything which is ORM-specific in interfaces of the repositories(e.g. ICriteria of NHibernate)

Breaking up Entities into smaller Entities in DDD?

Does it make sense to make subsets of an Entity if you consider their usage in the application differently? IE. I take my entity and define a new entity with only some of the attributes of the first. Now I have 2 Entities that overlap but are used differently but ultimately persist in same datatable. These Entities will be accessed through different repositories...
I am only starting to learn about DDD myself, so if I am wrong please comment and let me know. Here are my thoughts though:
If the entity is going to be accessed through a different repository, I think it deserves its own class. Additionally, the bits that overlap now may not overlap in the future, and if you use a shared base class, you will probably be more likely to try adapting things at that point, which will dirty up your domain.
If the two classes are part of separate sub-domains, they probably should be separate. My thoughts are based around parts of an example I remember hearing in Rob Connery's interview on Hanselminutes. A product has several properties that are important to consumers (pricing, description, etc), and several properties that are important to warehouse personnel (location in the warehouse, weight, dimensions, etc). The implication to me in that episode was that the two products should be defined separately in the domain, instead of being defined once and shared.
If by "usage in the application" you mean you'll display different parts in different views, then I'd suggest you use a presentation pattern like Fowler describes in his Presentation Model (or if your developing a WPF app you can use the more WPF-specialized version called Model-View-ViewModel (MVVM)).
But if you by "usage" mean that you'll use different attributes of the entity in different sub-domains or part of your domain, then I agree with Chris; You'd probably be better off breaking them into different entities. The reason being that in your domain model you should reflect how the entity is used in that specific (sub)domain. And if you're using different parts of the entity under different circumstances, they probably have different meanings in these settings which again should be reflected in the naming of the entities. And if it were me, I'd probably make one repository for each of the entities. Having a 1:1 mapping between entities and repositories seems to make sense in most cases as far as I've experienced. But then again; just like Chris, Rob Conery and 90% of the developers trying to do DDD; I'm fairly new to the DDD-game and so my experiences might be overruled by someone more experienced :)

Resources