Pros and cons of DDD Repositories - domain-driven-design

Pros:
Repositories hide complex queries.
Repository methods can be used as transaction boundaries.
ORM can easily be mocked
Cons:
ORM frameworks already offer a collection-like interface to persistent objects, which is exactly what repositories are meant to provide. So repositories add extra complexity to the system.
Combinatorial explosion when using findBy methods. These methods can be avoided with Criteria objects, queries or example objects. But no repository is needed for that, because an ORM already supports these ways of finding objects.
Since repositories are collections of aggregate roots (in the DDD sense), one has to create and pass around aggregate roots even if only a child object is modified.
Questions:
What pros and cons do you know?
Would you recommend using repositories? (Why or why not?)

The main point of a repository (as in Single Responsibility Principle) is to abstract the concept of getting objects that have identity. As I've become more comfortable with DDD, I haven't found it useful to think about repositories as being mainly focused on data persistence but instead as factories that instantiate objects and persist their identity.
When you're using an ORM, you should use its API in as limited a way as possible, perhaps giving yourself a domain-specific facade. Either way, your domain would still just see a repository. The fact that it has an ORM on the other side is an "implementation detail".

A repository brings the domain model into focus by hiding data access details behind an interface that is based on the ubiquitous language. When designing a repository you concentrate on domain concepts, not on data access. From the DDD perspective, using an ORM API directly is equivalent to using SQL directly.
This is what a repository might look like in an order processing application:
List<Order> myOrders = Orders.FindPending()
Note that there are no data access terms like 'Criteria' or 'Query'. Internally, the 'FindPending' method may be implemented using Hibernate Criteria or HQL, but this has nothing to do with DDD.
Method explosion is a valid concern. For example you may end up with multiple methods like:
Orders.FindPending()
Orders.FindPendingByDate(DateTime from, DateTime to)
Orders.FindPendingByAmount(Money amount)
Orders.FindShipped()
Orders.FindShippedOn(DateTime shippedDate)
etc
This can be improved by using the Specification pattern. For example, you can have a class:
class PendingOrderSpecification {
    PendingOrderSpecification WithAmount(Money amount);
    PendingOrderSpecification WithDate(DateTime from, DateTime to);
    ...
}
so that the repository looks like this:
Orders.FindSatisfying(PendingOrderSpecification pendingSpec)
Orders.FindSatisfying(ShippedOrderSpecification shippedSpec)
Another option is to have separate repositories for Pending and Shipped orders.
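To make the Specification idea a bit more concrete, here is a minimal sketch, written in Java rather than the C#-style pseudocode above. All names are hypothetical (Order, OrderSpecification, withMinAmount, the in-memory Orders class), and the assumption that the amount acts as a lower bound is mine, not the answer's. A real repository would translate the specification into a query rather than filter a list in memory.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical domain type, kept minimal for the example.
final class Order {
    enum Status { PENDING, SHIPPED }

    final String id;
    final Status status;
    final BigDecimal amount;

    Order(String id, Status status, BigDecimal amount) {
        this.id = id;
        this.status = status;
        this.amount = amount;
    }
}

// A specification names a domain rule; technically it is a predicate over orders.
interface OrderSpecification extends Predicate<Order> { }

// Fluent specification in the spirit of PendingOrderSpecification.WithAmount(...).
final class PendingOrderSpecification implements OrderSpecification {
    private final BigDecimal minAmount; // null means "any amount"

    PendingOrderSpecification() {
        this(null);
    }

    private PendingOrderSpecification(BigDecimal minAmount) {
        this.minAmount = minAmount;
    }

    // Assumption: the amount is treated as a lower bound.
    PendingOrderSpecification withMinAmount(BigDecimal amount) {
        return new PendingOrderSpecification(amount);
    }

    @Override
    public boolean test(Order order) {
        return order.status == Order.Status.PENDING
            && (minAmount == null || order.amount.compareTo(minAmount) >= 0);
    }
}

// In-memory stand-in for the repository: one finder instead of many FindXxx methods.
final class Orders {
    private final List<Order> store = new ArrayList<>();

    void add(Order order) {
        store.add(order);
    }

    List<Order> findSatisfying(OrderSpecification spec) {
        return store.stream().filter(spec).collect(Collectors.toList());
    }
}
```

A caller would then write something like orders.findSatisfying(new PendingOrderSpecification().withMinAmount(...)), and the repository interface stays at a single finder no matter how many criteria are added.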

A repository is really just a layer of abstraction, like an interface. You use it when you want to decouple your data persistence implementation (i.e. your database).
I suppose if you don't want to decouple your DAL, then you don't need a repository. But there are many benefits to doing so, such as testability.
Regarding the combinatorial explosion of "Find" methods: in .NET you can return an IQueryable instead of an IEnumerable, and allow the calling client to run a Linq query on it, instead of using a Find method. This provides flexibility for the client, but sacrifices the ability to provide a well-defined, testable interface. Essentially, you trade off one set of benefits for another.

Related

Separating business rules from entities in domain driven design

While practicing DDD in my software projects, I have always faced the question: "Why should I implement my business rules in the entities? Aren't they supposed to be pure data models?"
Note that, from my understanding of DDD, domain models can consist of persistent models as well as value objects.
I have come up with a solution in which I separate my persistent models from my domain models. On top of that we have data transfer objects (DTOs), so we have 3 layers of data mapping: database to persistence model, persistence model to domain model, and domain model to DTO. In my opinion, my solution is not an efficient one, as it takes too much effort.
Therefore is there any better practice to achieve this goal?
Disclaimer: this answer is a little longer than the question, but that is needed to understand the problem; it is also 100% based on my experience.
What you are feeling is normal; I had the same feeling some time ago. It comes from the combination of architecture, programming language and framework used. You should try to choose these tools so that they give you the code that is easiest to change. If you have to change 3 classes for each field added to an entity, that would be a nightmare in a large project (i.e. 50+ entity types).
The problem is that you have multiple DTOs per entity/concept.
The heaviest architecture that I used was the Classic layered architecture; the strict version was the hardest (in the strict version a layer may access only the layer that is just before it; i.e. the User interface may access only the Application). It involved a lot of DTOs and translations as the data moved from the Infrastructure to the UI. The testing was also hard as I had to use a lot of mocking.
Then I inverted the dependency, so that the Domain no longer depended on the Infrastructure. For this I defined interfaces in the Domain layer that were implemented in the Infrastructure. But I still needed to use mocking for them. Also, the Aggregates were not pure and they had side effects (because they called the Infrastructure, even if it was abstracted by interfaces).
Then I moved the Domain to the very bottom. This made my Aggregates pure. I no longer needed to use mocking. But I still needed DTOs (returned by the Application layer to the UI and those used by the ORM).
Then I made the first leap: CQRS. This splits the models in two: the write model and the read model. The important thing is that you don't need to use DTOs for models anymore. The Aggregate (the write model) can be serialized as it is or converted to JSON and stored in almost any database. Vaughn Vernon has a blog post about this.
But the nicest are the Read models. You can create a read model for each use case. Being a model used only for read/query, it can be as simple/dumb as possible. The read entities contain only query-related behavior. With the right persistence they can be persisted as they are. For example, if you use MongoDB (or any document database), with a simple reflection-based serializer you can have a very thin architecture. Thanks to the domain events, you won't need to use JOINs; you can have full data denormalization (the read entities include all the data they need).
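As an illustration of such a denormalized read model, here is a rough Java sketch. The event OrderPlaced, the read entity CustomerOrderSummary and the projection class are all hypothetical names, and the HashMap only stands in for a document store keyed by customer id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain event emitted by the write model.
final class OrderPlaced {
    final String customerId;
    final String customerName;
    final long amountCents;

    OrderPlaced(String customerId, String customerName, long amountCents) {
        this.customerId = customerId;
        this.customerName = customerName;
        this.amountCents = amountCents;
    }
}

// Read entity: plain data shaped for one use case, no domain behavior.
final class CustomerOrderSummary {
    final String customerName;
    int orderCount;
    long totalCents;

    CustomerOrderSummary(String customerName) {
        this.customerName = customerName;
    }
}

// Projection: keeps the read model up to date as events arrive.
final class CustomerOrderSummaryProjection {
    // Stand-in for a document collection keyed by customer id.
    private final Map<String, CustomerOrderSummary> summaries = new HashMap<>();

    void on(OrderPlaced event) {
        CustomerOrderSummary summary = summaries.computeIfAbsent(
            event.customerId, id -> new CustomerOrderSummary(event.customerName));
        summary.orderCount++;
        summary.totalCents += event.amountCents;
    }

    // Query side: a single lookup and no join, because the name was copied in.
    CustomerOrderSummary summaryFor(String customerId) {
        return summaries.get(customerId);
    }
}
```

The customer name is duplicated into the summary on purpose: that duplication is what makes the query a plain lookup.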
The second leap is Event sourcing. With this you don't need a flat persistence for the Aggregates. They are rehydrated from the Event store each time they handle a command.
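A minimal sketch of what rehydration can look like, in Java and under stated assumptions: the events OrderCreated and OrderShipped, the ship command and the uncommittedEvents accessor are hypothetical, and a plain list plays the role of the event store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical events for an Order aggregate.
interface OrderEvent { }

final class OrderCreated implements OrderEvent {
    final String orderId;

    OrderCreated(String orderId) {
        this.orderId = orderId;
    }
}

final class OrderShipped implements OrderEvent { }

final class Order {
    private String id;
    private boolean shipped;
    private final List<OrderEvent> uncommitted = new ArrayList<>();

    private Order() { }

    // Rehydration: replay the stored history; nothing is recorded as new.
    static Order fromHistory(List<OrderEvent> history) {
        Order order = new Order();
        for (OrderEvent event : history) {
            order.apply(event);
        }
        return order;
    }

    // Command handler: check invariants, then apply and record a new event.
    void ship() {
        if (id == null) {
            throw new IllegalStateException("order was never created");
        }
        if (shipped) {
            throw new IllegalStateException("order already shipped");
        }
        OrderShipped event = new OrderShipped();
        apply(event);
        uncommitted.add(event);
    }

    // State changes happen only here, so replay and new commands agree.
    private void apply(OrderEvent event) {
        if (event instanceof OrderCreated) {
            id = ((OrderCreated) event).orderId;
        } else if (event instanceof OrderShipped) {
            shipped = true;
        }
    }

    boolean isShipped() {
        return shipped;
    }

    // Events to append to the event store once the command has succeeded.
    List<OrderEvent> uncommittedEvents() {
        return new ArrayList<>(uncommitted);
    }
}
```

The aggregate has no flat persisted form here: its state exists only as the result of applying events, which is why no separate persistence DTO is needed for it.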
You still have DTOs (commands, events, read models) but there is only one DTO per entity/concept.
Regarding the elimination of DTOs used by the Presentation: you can use something like GraphQL.
All of the above can be made worse by the programming language and framework. Strongly typed programming languages force you to create a type for each custom returned value. Some frameworks force you to define a custom serializable type for anything returned from a REST over HTTP request (this way you could have self-describing REST endpoints using reflection). In PHP you can simply use arrays with string keys as the value returned by a REST controller.
P.S.
By DTO I mean a class with data and no behavior.
I'm not saying that we all should use CQRS, just that you should know that it exists.
Why should I implement my business rules in the entities? Aren't they supposed to be pure data models?
Your persistence entities should be pure data models. Your domain entities describe behaviors. They aren't the same thing; it is a common pattern to have a bit of logic within the repository to change one into the other.
The cleanest way I know of to manage things is to treat the persistent entity as a value object to be managed by the domain entity, and to use something like a data mapper for transitions between domain and persistence.
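Here is a small Java sketch of the data mapper half of that suggestion. CustomerRecord, Customer, CustomerMapper and the email rule are all hypothetical; the point is only that a single class knows both shapes, so the persistence model and the domain model can change independently.

```java
// Persistence entity: pure data, shaped like the table (hypothetical columns).
final class CustomerRecord {
    String id;
    String email;
    boolean active;
}

// Domain entity: behavior and invariants, no persistence concerns.
final class Customer {
    private final String id;
    private String email;
    private final boolean active;

    Customer(String id, String email, boolean active) {
        this.id = id;
        this.email = email;
        this.active = active;
    }

    // Example business rule living on the domain entity.
    void changeEmail(String newEmail) {
        if (!active) {
            throw new IllegalStateException("inactive customers cannot change email");
        }
        if (newEmail == null || !newEmail.contains("@")) {
            throw new IllegalArgumentException("invalid email");
        }
        this.email = newEmail;
    }

    String id() { return id; }
    String email() { return email; }
    boolean isActive() { return active; }
}

// Data mapper: the only place that knows both representations.
final class CustomerMapper {
    static Customer toDomain(CustomerRecord record) {
        return new Customer(record.id, record.email, record.active);
    }

    static CustomerRecord toRecord(Customer customer) {
        CustomerRecord record = new CustomerRecord();
        record.id = customer.id();
        record.email = customer.email();
        record.active = customer.isActive();
        return record;
    }
}
```

A repository implementation would typically call toDomain after loading and toRecord before saving, which is the "bit of logic within the repository" mentioned above.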
On the other hand we have data transfer objects (DTO), so we have 3 layers of data mapping. Database to persistence model, persistence model to domain models and domain models to DTOs. In my opinion, my solution is not an efficient one as too much hard effort must be put into it.
CQRS offers some simplification here, based on the idea that if you are implementing a query, you don't really need the "domain model" because you aren't actually going to change the supporting data. In that case, you can take the "domain model" out of the loop altogether.
DDD and data are very different things. The aggregate's data (an outcome) will be persisted somehow depending on what you're using. Personally I think in domain events so the resulting Domain Event is the DTO (technically it is) that can be stored directly in an Event Store (if you're using Event Sourcing) or act as a data source for your persistence model.
A domain model represents relevant domain behaviour, with the domain state being the 'result'. An entity is a concept that has an id, as opposed to a Value Object, which represents only a business semantic value. An entity usually groups related value objects and consistency rules. Not all business rules live here; some of them make more sense as a service.
Now, there is the case of a CRUD domain or CRUD modelling where basically all you have is some data structures plus some validation rules. No need to complicate your life here if the modeling is correct. Implement things as simple as possible.
Always think of DDD as a methodology to gather requirements and to structure information. Implementation as in code (design) is something different.

DDD - collection of value objects

I have some middleware code which fetches a list of products from an external api. I am modelling the response and returning that response to clients of my code.
Any clients of my code do not care about specifics on individual products returned: they simply want the collection of products.
How would that be modelled using ddd?
Each product property a value object, a product an entity and a repository to contain all of the products?
Why not use CQRS (https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs).
Separate your models into read and write models. In your case read models will do. Make them POCOs. On the read side we do not need to use DDD tactical modeling tools.
For more info visit the link I provided.
I think you are almost there: your middleware (external API) could be a repository, with find methods that return Product models.
It is recommended that a repository be an interface (e.g. ProductRepository) to make the code more testable. You could have a simple implementation for tests (e.g. ProductRepositoryTestImpl) and a main implementation for middleware communication (e.g. ProductRepositoryImpl).
For packaging, I prefer this:
domain
  \model
    \product
      |Product
      |ProductRepository
infrastructure
  \persistence
    \YOUR_EXTERNAL_API_NAME
      |ProductRepositoryImpl
    \eclipselink
      ...
    \test
      \...
        |ProductRepositoryTestImpl
You should see the external API as an external bounded context. Your local bounded context will use an anti-corruption layer that translates terms from the remote to the local bounded context. So your code is in fact an anti-corruption layer.
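A rough Java sketch of such an anti-corruption layer follows. The external DTO fields (sku, title, priceInCents as a string), the ExternalProductApi client and the local Product shape are all assumptions made for illustration; the real external API will look different.

```java
import java.util.ArrayList;
import java.util.List;

// One item as the external API returns it (hypothetical fields).
final class ExternalProductDto {
    final String sku;
    final String title;
    final String priceInCents;

    ExternalProductDto(String sku, String title, String priceInCents) {
        this.sku = sku;
        this.title = title;
        this.priceInCents = priceInCents;
    }
}

// Hypothetical client for the external API.
interface ExternalProductApi {
    List<ExternalProductDto> fetchAll();
}

// Local model, expressed in the terms of the local bounded context.
final class Product {
    final String id;
    final String name;
    final long priceCents;

    Product(String id, String name, long priceCents) {
        this.id = id;
        this.name = name;
        this.priceCents = priceCents;
    }
}

// Domain-facing contract: clients only see a collection of products.
interface ProductRepository {
    List<Product> findAll();
}

// Anti-corruption layer: translates remote terms into local ones.
final class ExternalApiProductRepository implements ProductRepository {
    private final ExternalProductApi api;

    ExternalApiProductRepository(ExternalProductApi api) {
        this.api = api;
    }

    @Override
    public List<Product> findAll() {
        List<Product> products = new ArrayList<>();
        for (ExternalProductDto dto : api.fetchAll()) {
            products.add(new Product(dto.sku, dto.title, Long.parseLong(dto.priceInCents)));
        }
        return products;
    }
}
```

If the external API renames a field or changes a format, only ExternalApiProductRepository has to change; clients of ProductRepository are unaffected.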
Now, should you persist those products as entities or value objects? This depends on your local usage: do you modify those products or not? If you don't modify them, then they are Value objects.
In any case you will probably have to use a repository to persist/retrieve the products.

How to retrieve Aggregate Roots that don't have repositories?

Eric Evans's DDD book, pg. 152:
Provide Repositories only for AGGREGATE roots that actually need direct access.
1. Should Aggregate Roots that don't need direct access be retrieved and saved via repositories of those Aggregate Roots that do need direct access?
For example, if we have Customer and Order Aggregate roots and if for whatever reason we don't need direct access to the Order AR, then I assume the only way orders can be obtained is by traversing the Customer.Orders property?
2. When should ICustomerRepository retrieve orders? When the Customer AR is retrieved (via ICustomerRepository.GetCustomer) or when we traverse the Customer.GetOrders property?
3. Should ICustomerRepository itself retrieve orders or should it delegate this responsibility to an IOrderRepository? If the latter, then one option would be to inject IOrderRepository into ICustomerRepository. But since outside code shouldn't know that IOrderRepository even exists (if outside code was aware of its existence, then it may also use IOrderRepository directly), how then should ICustomerRepository get a reference to IOrderRepository?
UPDATE:
1
With regards to implementation, if done with an ORM like NHibernate,
there is no need for an IOrderRepository.
a) Are you saying that when using ORM, we usually don't need to implement repositories, since ORMs implicitly provide them?
b) I do plan on learning one of ORM technologies ( probably EF ), but from little I did read on ORMs, it seems that if you want to completely decouple Domain or Application layers from Persistence layer, then these two layers shouldn't use ORM expressions, which also implies that ORM expressions and POCOs should exist only within Repository implementations?
c) If there is a scenario where for some reason AR root doesn't have a direct access ( and project doesn't use ORM ), what would your answer to 3. be?
thanks
I'm hard-pressed to think of an example where an aggregate does not require direct access. However, I think at the time of writing (circa 2003), the emphasis on limiting or eliminating traversable object references between aggregates wasn't as prevalent as it is today. Therefore, it could have been the case that a Customer aggregate would reference a collection of Order aggregates. In this scenario, there may be no need to reference an Order directly because traversal from Customer is acceptable.
With regards to implementation, if done with an ORM like NHibernate, there is no need for an IOrderRepository. The Order aggregate would simply have a mapping. Additionally, the mapping for Customer would specify that changes should cascade down to corresponding Order aggregates.
When should ICustomerRepository retrieve orders?
This is the question which raises concern over traversable object references between aggregates. A solution provided by ORM is lazy loading, but lazy loading can be problematic. Ideally, a customer's orders would only be retrieved when needed and this depends on context. My suggestion, therefore, is to avoid traversable references between aggregates and use a repository search instead.
UPDATE
a) You would still need something that implements ICustomerRepository, but the implementation would be largely trivial if the mappings are configured - you'd delegate to the ORM's API to implement each repository method. No need for a IOrderRepository however.
b) For full encapsulation, the repository interface would not contain anything ORM-specific. The repository implementation would adapt the repository contract to ORM specifics.
c) It's hard to make a judgement on a scenario I can't picture, but it would seem there is no need for an Order repository interface. You can still have an Order repository to better separate responsibilities. There is no need for injection either; just have the Customer repository implementation create an instance of the Order repository.

How do you deal with DDD and EF4

I'm facing several problems trying to apply DDD with EF4 (in an ASP.NET MVC 2 context). Your advice would be greatly appreciated.
First of all, I started to use POCOs because the dependency on ObjectContext was not very comfortable in many situations.
Going to POCO solved some problems but the experience is not what I was used to with NHibernate.
I would like to know whether it's possible to use the designer to generate not only entities but also Value Objects (ComplexType?). By Value Object I mean a class with one constructor and no property setters (is a T4 modification needed?).
The only way I found to add behavior to anemic entities is to create partial classes that extend those generated by the edmx. I'm not satisfied with this approach.
I don't know how to create several repositories with one edmx. For now I'm using partial classes to group methods for each aggregate. Each group is in fact a repository.
The last question is about IQueryable. Should it be exposed outside the repository? If I refer to the blue book, the repository should be a unit of execution and shouldn't expose something like IQueryable. What do you think?
Thanks for your help.
Thomas
It's fine to use POCOs, but note that EntityObject doesn't require an ObjectContext.
Yes, Complex Types are value objects and yes, you can generate them in the designer. Select several properties of an entity, right click, and choose refactor into complex type.
I strongly recommend putting business methods in their own types, not on entities. "Anemic" types can be a problem if you must maintain them, but when they're codegened they're hardly a maintenance problem. Making business logic separate from entity types allows your business rules and your data model to evolve independently. Yes, you must use partial classes if you must mix these concerns, but I don't believe that separating your model and your rules is a bad thing.
I think that repositories should expose IQueryable, but you can make a good case that domain services should not. People often try to build their repositories into domain services, but remember that the repository exists only to abstract away persistence. Concerns like security should be in domain services, and you can make the case that having IQueryable there gives too much power to the consumer.
I think it's OK to expose IQueryable outside of the repository, only because not doing so could be unnecessarily restrictive. If you only expose data via methods like GetPeopleByBirthday and GetPeopleByLastName, what happens when somebody goes to search for a person by last name and birthday? Do you pull in all the people with the last name "Smith" and do a linear search for the birthday you want, or do you create a new method GetPeopleByBirthdayAndLastName? What about the poor hapless fellow who has to implement a QBE form?
Back when the only way to make ad hoc queries against the domain was to generate SQL, the only way to keep yourself safe was to offer just specific methods to retrieve and change data. Now that we have LINQ, though, there's no reason to keep the handcuffs on. Anybody can submit a query and you can execute it safely without concern.
Of course, you could be concerned that a user might be able to view another's data, but that's easy to mitigate because you can restrict what data you give out. For example:
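public IQueryable<Content> Content
{
    // 'context' is assumed to be the underlying data context; querying it (rather than this property) avoids infinite recursion.
    get { return context.Content.Where(c => c.UserId == this.UserId); }
}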
This will make sure that the only Content rows that the user can get are those that have his UserId.
If your concern is the load on the database, you could do things like examine query expressions for table scans (accessing tables without Where clauses or with no indexed columns in the Where clause). Granted, that's non-trivial, and I wouldn't recommend it.
It's been some time since I asked that question and had a chance to do it on my own.
I don't think it's good practice to expose IQueryable at all outside the DAL. It brings more problems than it solves. I'm talking about large MVC applications. First of all, refactoring is harder: many developers use IQueryable instances from the views and then struggle with the fact that the connection has already been disposed by the time the IQueryable is resolved. There are also performance problems, because the whole database is often queried for a given set of results, and so on.
I'd rather expose IEnumerable from my repositories and, believe me, it saves me a lot of trouble.

data access in DDD?

After reading Evans's and Nilsson's books I am still not sure how to manage data access in a domain-driven project. Should the CRUD methods be part of the repositories, i.e. OrderRepository.GetOrdersByCustomer(customer), or should they be part of the entities: Customer.GetOrders()? The latter approach seems more OO, but it will distribute data access for a single entity type among multiple objects, i.e. Customer.GetOrders(), Invoice.GetOrders(), ShipmentBatch.GetOrders(), etc. What about inserting and updating?
CRUD-ish methods should be part of the Repository...ish. But I think you should ask why you have a bunch of CRUD methods. What do they really do? What are they really for? If you actually call out the data access patterns your application uses I think it makes the repository a lot more useful and keeps you from having to do shotgun surgery when certain types of changes happen to your domain.
CustomerRepo.GetThoseWhoHaventPaidTheirBill()
// or
GetCustomer(new HaventPaidBillSpecification())
// is better than
foreach (var customer in GetCustomer()) {
/* logic leaking all over the floor */
}
"Save" type methods should also be part of the repository.
If you have aggregate roots, this keeps you from having a Repository explosion, or having logic spread out all over: You don't have 4 x # of entities data access patterns, just the ones you actually use on the aggregate roots.
That's my $.02.
DDD usually prefers the repository pattern over the active record pattern you hint at with Customer.Save.
One downside in the Active Record model is that it pretty much presumes a single persistence model, barring some particularly intrusive code (in most languages).
The repository interface is defined in the domain layer, but doesn't know whether your data is stored in a database or not. With the repository pattern, I can create an InMemoryRepository so that I can test domain logic in isolation, and use dependency injection in the application to have the service layer instantiate a SqlRepository, for example.
To many people, having a special repository just for testing sounds goofy, but if you use the repository model, you may find that you don't really need a database for your particular application; sometimes a simple FileRepository will do the trick. Wedding yourself to a database before you know you need it is potentially limiting. Even if a database is necessary, it's a lot faster to run tests against an InMemoryRepository.
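For what it's worth, here is a minimal Java sketch of that arrangement. Customer, CustomerRepository, InMemoryCustomerRepository and BillingService are hypothetical names; a SqlRepository would implement the same interface and be wired in by dependency injection in the real application.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical aggregate root with one piece of behavior.
final class Customer {
    final String id;
    private boolean billPaid;

    Customer(String id) {
        this.id = id;
    }

    void markBillPaid() {
        billPaid = true;
    }

    boolean hasPaidBill() {
        return billPaid;
    }
}

// Defined in the domain layer; says nothing about how data is stored.
interface CustomerRepository {
    Optional<Customer> get(String id);

    void save(Customer customer);
}

// Test-friendly implementation backed by a map instead of a database.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, Customer> store = new HashMap<>();

    @Override
    public Optional<Customer> get(String id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public void save(Customer customer) {
        store.put(customer.id, customer);
    }
}

// Logic under test depends only on the interface.
final class BillingService {
    private final CustomerRepository customers;

    BillingService(CustomerRepository customers) {
        this.customers = customers;
    }

    void recordPayment(String customerId) {
        Customer customer = customers.get(customerId)
            .orElseThrow(() -> new IllegalArgumentException("unknown customer " + customerId));
        customer.markBillPaid();
        customers.save(customer);
    }
}
```

A test can build a BillingService around an InMemoryCustomerRepository and exercise recordPayment without any database or mocking framework.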
If you don't have much in the way of domain logic, you probably don't need DDD. ActiveRecord is quite suitable for a lot of problems, especially if you have mostly data and just a little bit of logic.
Let's step back for a second. Evans recommends that repositories return aggregate roots and not just entities. So assuming that your Customer is an aggregate root that includes Orders, then when you fetched the customer from its repository, the orders came along with it. You would access the orders by navigating the relationship from Customer to Orders.
customer.Orders;
So to answer your question, CRUD operations are present on aggregate root repositories.
CustomerRepository.Add(customer);
CustomerRepository.Get(customerID);
CustomerRepository.Save(customer);
CustomerRepository.Delete(customer);
I've done it both of the ways you are talking about. My preferred approach now is the persistence-ignorant (or PONO, Plain Ole' .NET Object) method, where your domain classes are only worried about being domain classes. They do not know anything about how they are persisted, or even whether they are persisted. Of course you have to be pragmatic about this at times and allow for things such as an Id (but even then I just use a layer supertype which has the Id, so I have a single point where things like default values live).
The main reason for this is that I strive to follow the principle of Single Responsibility. By following this principle I've found my code much more testable and maintainable. It's also much easier to make changes when they are needed since I only have one thing to think about.
One thing to be watchful of is the method bloat that repositories can suffer from. GetOrderbyCustomer.. GetAllOrders.. GetOrders30DaysOld.. etc etc. One good solution to this problem is to look at the Query Object pattern. And then your repositories can just take in a query object to execute.
I'd also strongly recommend looking into something like NHibernate. It includes a lot of the concepts that make Repositories so useful (Identity Map, Cache, Query objects..)
Even in a DDD, I would keep Data Access classes and routines separate from Entities.
Reasons are,
Testability improves
Separation of concerns and Modular design
More maintainable in the long run, as you add entities, routines
I am no expert, just my opinion.
The annoying thing with Nilsson's Applying DDD&P is that he always starts with "I wouldn't do that in a real-world-application but..." and then his example follows. Back to the topic: I think OrderRepository.GetOrdersByCustomer(customer) is the way to go, but there is also a discussion on the ALT.Net Mailing list (http://tech.groups.yahoo.com/group/altdotnet/) about DDD.
