Let's assume we have separate read models, so that we use repositories only on the write side, not on the read side (and this question is all about the write side).
Also, let's assume one of our goals is to have full Aggregate/Entity encapsulation, in the sense that they expose only behaviour and not state (meaning: no getters/setters).
Next, let's assume we don't want to use ORMs or reflection to get at aggregate internals from within the repository.
Obviously, one of the solutions left is that the aggregate itself exposes its full state as a VO/POCO and hands it over to the repository. The repository would then know how to map it to a DAO and save it.
Now, in many places I read that repositories should accept the aggregates themselves.
The question is: why? What is wrong with the aggregate exposing a VO of its state, handing it to the repository, and letting the repo do the save based on it?
The benefit is full and clear aggregate encapsulation.
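To make the idea concrete, here is a minimal sketch of what I mean, in C#; the Order, OrderState and IOrderRepository names are purely illustrative:

using System;

// The aggregate exposes behaviour only, plus one method that captures its full state as an immutable VO.
public sealed record OrderState(Guid Id, string Status, decimal Total);

public class Order
{
    private readonly Guid _id;
    private string _status = "New";
    private decimal _total;

    public Order(Guid id) => _id = id;

    public void AddItem(decimal price) => _total += price;
    public void Confirm() => _status = "Confirmed";

    // The one concession: hand the full state over as a value object.
    public OrderState CaptureState() => new OrderState(_id, _status, _total);
}

public interface IOrderRepository
{
    // The repository receives the state VO, maps it to a DAO and saves it; it never sees the aggregate.
    void Save(OrderState state);
}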
What is wrong with the aggregate exposing a VO of its state, handing it to the repository, and letting the repo do the save based on it?
That approach violates some of the other assumptions of the domain modeling idea.
One of those assumptions is that the key business logic and the "plumbing" should be separate. The fact that you need to get a serializable representation of the aggregate is an accident of the fact that you are trying to write state into a stable storage (that step would be completely unnecessary if you were keeping the aggregate cached in memory).
A second assumption is that using "objects" to model the domain is a good idea. If you were using a functional approach, then of course you would be saving values, rather than objects.
It helps, I think, to keep in mind that the context of the solutions we write has changed a lot in the last 15-20 years. Patterns that make sense for a localized application with long sessions that include many interactions don't necessarily make sense for distributed stateless applications, and vice versa.
that pattern causes you to sacrifice aggregate encapsulation
No, it doesn't - asking an object for a copy of some information in a previously agreed upon representation doesn't violate encapsulation. See Kevlin Henney, for example.
What it does do, however, is require interfaces suitable for two different collaborators: your application, which cares about the business rules, and your repository, which cares about persistence of representations.
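To illustrate that split, here is a rough sketch (all names are made up for the example) of one aggregate offering two deliberately separate interfaces, one per collaborator:

using System;

public interface IShipment                    // used by the application: behaviour only
{
    void MarkDelivered(DateTime when);
}

public interface IShipmentSnapshotSource      // used by the repository: a previously agreed-upon representation
{
    ShipmentSnapshot ToSnapshot();
}

public sealed record ShipmentSnapshot(Guid Id, string Status, DateTime? DeliveredAt);

public class Shipment : IShipment, IShipmentSnapshotSource
{
    private readonly Guid _id = Guid.NewGuid();
    private string _status = "InTransit";
    private DateTime? _deliveredAt;

    public void MarkDelivered(DateTime when) { _status = "Delivered"; _deliveredAt = when; }

    public ShipmentSnapshot ToSnapshot() => new ShipmentSnapshot(_id, _status, _deliveredAt);
}

The application only ever talks to IShipment; the repository only ever talks to IShipmentSnapshotSource. Neither collaborator sees more than it needs.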
Related
I am trying to learn about the details of domain-driven design and I came across this question.
I found many examples with:
Defining the repository interface within the domain
Defining the repository implementation somewhere else
Injecting the repository interface into the aggregate root
On the other hand, there are examples that strictly go against this and do all repository-related work from a service.
I cannot find an authoritative answer and explanation: is it considered a bad practice, and if so - why?
I cannot find an authoritative answer and explanation: is it considered a bad practice, and if so - why?
Yes, mostly because a number of different concerns get confused.
Aggregates define a consistency boundary; any change of state should be restricted to a collection of related entities that are all part of the same aggregate. So once you have looked up the first one (the "root" entity), you should be able to achieve the change without needing to examine any data other than this graph of domain entities and the arguments that you have been passed.
Put another way, Repository is a plumbing concern, not a domain concern. Your domain entities are responsible for expressing your domain, without infrastructure concerns getting mixed in.
To have, for example, an Aggregate.Save() method that would internally use the IRepository interface to save.
Aggregate.Save() definitely indicates a problem. Well, two problems, to be precise: Save probably isn't part of your ubiquitous language, and for that matter it's likely that Aggregate isn't either.
Save isn't a domain concern, it's a persistence concern - it just copies data from your in memory representation (volatile storage) to a durable representation (stable storage). Your domain model shouldn't need to know anything about that.
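As a sketch of that separation (the Account and WithdrawalService names are hypothetical), the persistence calls live in an application service, so the aggregate never hears about saving:

using System;

public class Account
{
    private decimal _balance;

    public void Deposit(decimal amount) => _balance += amount;

    public void Withdraw(decimal amount)
    {
        // the domain rule lives here...
        if (amount > _balance) throw new InvalidOperationException("Insufficient funds.");
        _balance -= amount;
    }
}

public interface IAccountRepository
{
    Account Get(Guid id);
    void Save(Account account);
}

public class WithdrawalService   // application service: owns the plumbing
{
    private readonly IAccountRepository _accounts;

    public WithdrawalService(IAccountRepository accounts) => _accounts = accounts;

    public void Withdraw(Guid accountId, decimal amount)
    {
        var account = _accounts.Get(accountId);   // plumbing
        account.Withdraw(amount);                 // domain rule
        _accounts.Save(account);                  // plumbing again
    }
}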
One of the reasons you found "many examples with" is that getting these concerns separated correctly is hard; meaning you really need to think deeply about the problem to tease them apart. Most examples don't, because teasing things apart isn't critical when you always deploy everything together.
We've all heard that injecting a repository into an aggregate is a bad idea, but almost no one tells us why.
I will try to list here all the disadvantages of doing this, so we can weigh the rightness of this statement.
The first thing that comes to mind is the Single Responsibility Principle.
It's true that by injecting a repository into an AR we are violating SRP, because retrieving and persisting an aggregate is not the responsibility of the aggregate itself. But that only covers the aggregate itself, not other aggregates. So does it apply to retrieving aggregates referenced by ID from a repository? And what about storing them?
I used to think that an aggregate shouldn't even know that there is some sort of persistence in the system, because it doesn't have to exist. Aggregates can be created just for one procedure call and then discarded.
Now that I think of it, that's not right, because an aggregate root is an entity, and an entity only makes sense if it has some unique identity. So why would we need a unique identity if not for persisting? Even if it's just persistence in memory. Maybe for comparing, but in my opinion that's not the main reason behind the identity.
Ok, let's assume that we retrieve and store OTHER aggregates from inside our aggregate using injected repositories. What are the other consequences besides the SRP violation?
For sure there is a problem with having no control over the persisting of aggregates, and retrieving becomes a kind of lazy loading, which is bad for the same reason (no control).
Because of that lack of control we can end up persisting the same aggregate a few times when it could have been persisted only once, or loading the same aggregate a hundred times when it could have been loaded once, so performance suffers. There might also be a problem with stale data.
These reasons practically disqualify the ability to inject a repository into an aggregate.
Here comes my main question - why can we inject repositories into a domain service, then?
Don't the same reasons apply there? It's just like moving logic out of the aggregate into a separate function and pretending it's something different.
To be honest, when I started to write this SO question, I had no good answer for that. But after hours of investigating this problem and writing this question, I came to a solution. Rubber duck debugging.
I'll post this question anyway for others having the same problems. Of course with my answer below.
Here are the places where I'd recommend fetching aggregates (i.e. calling Repository.Get...()), in order of preference:
Application Service
Domain Service
Aggregate
We don't want Aggregates to fetch other Aggregates most of the time, because this blurs the lines, giving them orchestration powers which normally belong to the Application layer. You also raise the risk of the Aggregate trespassing its jurisdiction by modifying other Aggregates, which can result in contention and performance problems, not to mention that transactions become more difficult to analyze and the code base to reason about.
Domain Services are IMO a good place to fetch Aggregates when determining which aggregates to modify is domain logic per se. In your game example (which might not be the ideal context for DDD by the way), which units are affected by another unit's attack might be considered domain logic, thus you may not want to place it at the Application Service level. This rarely happens in my experience though.
Finally, Application Services are the default place where I call Repository.Get(...) for uniformity's sake and because this is the natural place to get a hold of the actors of the use case (usually only one Aggregate per transaction) and orchestrate calls to them.
That doesn't mean Aggregates should never have Repositories injected into them; there are exceptions, but other alternatives are almost always better.
So, as I wrote in the question, I found my answer in the very process of writing that question.
The best way to show this is by example:
When we have a simple (superficially) behavior like a unit attacking another unit, we can write something like this:
unit.attack_unit(other_unit)
The problem is that, to attack a unit, we have to calculate damage, and to do that we need other aggregates, like weapon and armor, which are referenced by ID inside the unit. Since we cannot inject a repository into the aggregate, we have to move that attack_unit logic into a domain service, because we can inject a repository there. Now, what is the difference between injecting it into a domain service rather than into the unit aggregate?
The answer is - there is no difference. None of the consequences I described in the question will bite us. In both cases we will load both units once, the attacking unit's weapon once, and the armor of the unit being attacked once. There also won't be stale data, even if we mutate the weapon object during the process and store it, because that weapon is retrieved and stored in one place.
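For reference, here is roughly how I picture the domain service variant in C# (Unit, Weapon, Armor and the repositories are all simplified, made-up types):

using System;

public class Weapon { public Guid Id; public int Damage; }
public class Armor  { public Guid Id; public int Protection; }

public class Unit
{
    public Guid WeaponId;
    public Guid ArmorId;
    private int _health = 100;

    public void TakeDamage(int amount) => _health -= Math.Max(0, amount);
}

public interface IWeaponRepository { Weapon Get(Guid id); }
public interface IArmorRepository  { Armor Get(Guid id); }

public class UnitAttackService   // domain service with injected repositories
{
    private readonly IWeaponRepository _weapons;
    private readonly IArmorRepository _armors;

    public UnitAttackService(IWeaponRepository weapons, IArmorRepository armors)
    {
        _weapons = weapons;
        _armors = armors;
    }

    public void AttackUnit(Unit attacker, Unit target)
    {
        var weapon = _weapons.Get(attacker.WeaponId);   // each aggregate loaded exactly once per call
        var armor  = _armors.Get(target.ArmorId);
        target.TakeDamage(weapon.Damage - armor.Protection);
    }
}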
The problem shows up in a different example.
Let's create a use case where a unit can attack all other units in the game in one process.
The problem lies in how we implement it. If we use the already defined unit.attack_unit and call it on every unit in the game (iterating over them), then the weapon used to compute damage will be retrieved from inside the unit aggregate as many times as there are units in the game! But it could be retrieved only once!
It doesn't matter whether unit.attack_unit is a method of the unit aggregate or a domain service unit_attack_unit. It will still be the same: the weapon will be loaded too many times. To fix that we simply have to change the implementation, and with that probably the interface too.
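One possible shape of that changed interface, sticking with the made-up types from the sketch above: load the attacker's weapon once and reuse it for the whole sweep.

using System.Collections.Generic;

public class UnitAttackAllService   // still a domain service, but a different contract
{
    private readonly IWeaponRepository _weapons;
    private readonly IArmorRepository _armors;

    public UnitAttackAllService(IWeaponRepository weapons, IArmorRepository armors)
    {
        _weapons = weapons;
        _armors = armors;
    }

    public void AttackAll(Unit attacker, IEnumerable<Unit> targets)
    {
        var weapon = _weapons.Get(attacker.WeaponId);   // loaded once, not once per target
        foreach (var target in targets)
        {
            var armor = _armors.Get(target.ArmorId);
            target.TakeDamage(weapon.Damage - armor.Protection);
        }
    }
}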
Now at least we have an answer to the question "does moving logic from an aggregate method to a domain service (because we want to access a repository there) fix the problem?". No, it does not change a thing.
Injecting repositories into a domain service can be as dangerous as injecting them into an aggregate if used wrongly.
This answers my SO question, but we still don't have a solution to the real problem.
What can we do if we have two use cases, one where a unit attacks one other unit and a second where a unit attacks all other units, without duplicating domain logic?
One way is to pass all the needed aggregates as parameters to our aggregate method.
unit.attack_unit(other_unit, weapon, armor)
But what if we need five or more aggregates there? It's not a good way. The application logic would also have to know that all these aggregates are needed for an attack, which is a knowledge leak. When the attack_unit implementation changes, we might also have to update the interface of that method. What is the purpose of encapsulation then?
So, if we can't access a repository to get the needed aggregates, how can we smuggle them in?
We can abandon the idea of referencing aggregates by IDs, or pass all the needed aggregates in from the application layer (which means a knowledge leak).
Or maybe the reason for these problems is bad modelling?
Attacking another unit is indeed the unit's responsibility, but is damage calculation its responsibility? Of course not.
Maybe we need another object, like a value object MeleeAttack(weapon, armor), yet when we add more properties that can change the result of an attack, like enchantments on the unit, it gets more complicated.
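Something along these lines, perhaps (just a sketch, reusing the made-up Weapon and Armor types from earlier):

// Hypothetical value object: it owns the damage calculation instead of the unit.
public sealed record MeleeAttack(Weapon Weapon, Armor Armor)
{
    public int Damage() => System.Math.Max(0, Weapon.Damage - Armor.Protection);
}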
Also, I think that we are now creating objects based on performance, not on our domain.
So from domain-driven design we get performance-driven design. Is that what we want? I don't think so.
"So why would we need unique identity if not for persisting?" - think of an account scenario, where several John Smiths exist in your system. Imagine John Smith and John Smith Jr (who didn't enter the Jr in signup) both live at the same address. How do you tell them apart? Imagine I'm trying to write a recommendation engine based upon their past purchases . . . .
Identity is a quality of equality in DDD. If you don't have an identity unique from your fields, then you're a ValueObject.
What are the consequences of using a repository inside an aggregate vs. inside a domain service?
There's a reasonably strong argument that you shouldn't do either.
Riddle: when does an aggregate need to see the state of another aggregate?
The responsibility of an aggregate is to control change. Any command that would change the state of the domain model is dispatched to the aggregate root responsible for the integrity of the state in question. By definition, all of the state required to ensure that the command is currently permitted is contained within the aggregate boundary.
So there is never any need to peek at the data outside of the aggregate when making a change to the model.
In which case, you don't ever need to load another aggregate, which makes the "where" question moot.
Two clarifications:
Queries will often combine the state of multiple aggregates, and will often need to follow a reference from one aggregate to another. The principle above is satisfied because queries treat the domain model as read-only. You need the state to answer the query, but you don't need the invariant enforcement because you aren't changing anything.
Another case is when you need state from another aggregate to process a command properly, but a small amount of latency in that data is an acceptable risk. In that case, you query the "other" aggregate to get its state. If you were to run that query within the domain model itself, the right way to do so would be via a domain service.
In most cases, though, you'll be equally well served to run the query when generating the command (ie, in the client), or when handling the command (in the application, outside the domain). It would be very unusual for a business to consider domain service latency to be acceptable but client latency to be unacceptable.
(Disconnected clients are one case where that can be especially problematic; when the command is generated and then queued for a long period of time before being dispatched to the server).
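A sketch of the usual shape (hypothetical names throughout): the application layer runs the query against whatever read model it has, accepts the staleness risk, and hands the result to the aggregate as plain data:

using System;

public interface ICreditReportQuery            // read-only lookup; may be slightly stale
{
    decimal OutstandingBalance(Guid customerId);
}

public interface IOrderRepository
{
    Order Get(Guid id);
    void Save(Order order);
}

public class Order
{
    public void ApproveCredit(decimal outstandingBalance)
    {
        // everything needed for the decision arrives as an argument; no repository in here
        if (outstandingBalance > 1000m)
            throw new InvalidOperationException("Credit limit exceeded.");
    }
}

public class ApproveOrderHandler               // application layer, outside the domain model
{
    private readonly IOrderRepository _orders;
    private readonly ICreditReportQuery _credit;

    public ApproveOrderHandler(IOrderRepository orders, ICreditReportQuery credit)
    {
        _orders = orders;
        _credit = credit;
    }

    public void Handle(Guid orderId, Guid customerId)
    {
        var balance = _credit.OutstandingBalance(customerId);   // query first...
        var order = _orders.Get(orderId);
        order.ApproveCredit(balance);                           // ...then dispatch the command with the data
        _orders.Save(order);
    }
}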
For example, take an order entity. It's obvious that order lines don't exist without an order, so we have to get them with the help of the OrderRepository (through an order entity). Ok. But what about other things that are loosely coupled with the order? Should the customer info be available only from a CustomerRepo, and the bank requisites of the seller only from a BankRequisitesRepo, etc.? If that is correct, I think we should pass all these repositories to our Create factory method.
Yes. In general, each major entity (aggregate root in domain-driven design terminology) should have its own repository. Child entities (like order lines) will generally not need one.
And yes. Define each repository as a service then inject them where needed.
You can even design things such that there is no direct coupling between Order and Customer in terms of an actual database link. This in turn allows customers and orders to live in completely independent databases. Which may or may not be useful for your applications.
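A rough sketch of what that looks like (all names hypothetical): one repository interface per aggregate root, injected where needed, with Order holding only the customer's identity:

using System;
using System.Collections.Generic;

public class Customer { public Guid Id { get; init; } }

public class OrderLine { }                       // child entity: no repository of its own

public class Order
{
    public Guid Id { get; init; }
    public Guid CustomerId { get; init; }        // reference by identity, not by a database link
    private readonly List<OrderLine> _lines = new();
}

public interface ICustomerRepository { Customer Get(Guid id); void Save(Customer customer); }
public interface IOrderRepository    { Order Get(Guid id);    void Save(Order order); }

public class PlaceOrderService
{
    private readonly ICustomerRepository _customers;
    private readonly IOrderRepository _orders;

    public PlaceOrderService(ICustomerRepository customers, IOrderRepository orders)
    {
        _customers = customers;
        _orders = orders;
    }

    public Guid PlaceOrder(Guid customerId)
    {
        var customer = _customers.Get(customerId);   // could live in a completely separate database
        var order = new Order { Id = Guid.NewGuid(), CustomerId = customer.Id };
        _orders.Save(order);
        return order.Id;
    }
}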
You correctly understood that an aggregate root's (AR) child entities shall not have their own repositories, unless they are themselves ARs. Having repositories for non-ARs would leave your invariants unprotected.
However, you must also understand that entities should usually not be clustered together for convenience, or just because the business states that some entity has one or many of some other entity.
I strongly recommend that you read Effective Aggregate Design by Vaughn Vernon and this other blog post that Vaughn kindly wrote for a question I asked.
One of the aggregate design rules of thumb stated in Effective Aggregate Design is that you should usually reference other aggregates by identity only.
Therefore, you greatly reduce the number of AR instances needed in another AR's creational process, since new Order(customer, ...) could become new Order(customerId, ...).
If you still find the need to query other ARs in one AR's creational process, then there's nothing wrong with injecting repositories as dependencies, but you should not depend on more than you need (e.g. let the client resolve the real dependencies and pass them in directly, rather than passing in a strategy that allows resolving a dependency).
After reading Evans's and Nilsson's books I am still not sure how to manage data access in a domain-driven project. Should the CRUD methods be part of the repositories, i.e. OrderRepository.GetOrdersByCustomer(customer), or should they be part of the entities: Customer.GetOrders()? The latter approach seems more OO, but it will distribute data access for a single entity type among multiple objects, i.e. Customer.GetOrders(), Invoice.GetOrders(), ShipmentBatch.GetOrders(), etc. What about inserting and updating?
CRUD-ish methods should be part of the Repository...ish. But I think you should ask why you have a bunch of CRUD methods. What do they really do? What are they really for? If you actually call out the data access patterns your application uses I think it makes the repository a lot more useful and keeps you from having to do shotgun surgery when certain types of changes happen to your domain.
CustomerRepo.GetThoseWhoHaventPaidTheirBill()
// or
GetCustomer(new HaventPaidBillSpecification())
// is better than
foreach (var customer in GetCustomer()) {
/* logic leaking all over the floor */
}
"Save" type methods should also be part of the repository.
If you have aggregate roots, this keeps you from having a Repository explosion, or having logic spread out all over: You don't have 4 x # of entities data access patterns, just the ones you actually use on the aggregate roots.
That's my $.02.
DDD usually prefers the repository pattern over the active record pattern you hint at with Customer.Save.
One downside in the Active Record model is that it pretty much presumes a single persistence model, barring some particularly intrusive code (in most languages).
The repository interface is defined in the domain layer, but doesn't know whether your data is stored in a database or not. With the repository pattern, I can create an InMemoryRepository so that I can test domain logic in isolation, and use dependency injection in the application to have the service layer instantiate a SqlRepository, for example.
To many people, having a special repository just for testing sounds goofy, but if you use the repository model, you may find that you don't really need a database for your particular application; sometimes a simple FileRepository will do the trick. Wedding yourself to a database before you know you need it is potentially limiting. Even if a database is necessary, it's a lot faster to run tests against an InMemoryRepository.
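Here's a small sketch of that setup (names invented for the example): the interface lives with the domain, and the in-memory implementation is all the tests need:

using System;
using System.Collections.Generic;

public class Customer
{
    public Guid Id { get; init; }
    public string Name { get; init; } = "";
}

// Defined in the domain layer: no hint of SQL, files, or anything else.
public interface ICustomerRepository
{
    Customer Get(Guid id);
    void Save(Customer customer);
}

// Lives with the tests (or the application): a dictionary is a perfectly good "database" here.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly Dictionary<Guid, Customer> _store = new();

    public Customer Get(Guid id) => _store[id];
    public void Save(Customer customer) => _store[customer.Id] = customer;
}

A SqlRepository (or FileRepository) implementing the same interface can then be swapped in by the application's dependency injection without touching the domain logic.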
If you don't have much in the way of domain logic, you probably don't need DDD. ActiveRecord is quite suitable for a lot of problems, especially if you have mostly data and just a little bit of logic.
Let's step back for a second. Evans recommends that repositories return aggregate roots and not just entities. So assuming that your Customer is an aggregate root that includes Orders, then when you fetched the customer from its repository, the orders came along with it. You would access the orders by navigating the relationship from Customer to Orders.
customer.Orders;
So to answer your question, CRUD operations are present on aggregate root repositories.
CustomerRepository.Add(customer);
CustomerRepository.Get(customerID);
CustomerRepository.Save(customer);
CustomerRepository.Delete(customer);
I've done it both ways you are talking about. My preferred approach now is the persistence-ignorant (or PONO -- Plain Ole' .NET Object) method, where your domain classes are only worried about being domain classes. They do not know anything about how they are persisted, or even whether they are persisted. Of course you have to be pragmatic about this at times and allow for things such as an Id (but even then I just use a layer supertype which has the Id, so I have a single place where things like default values live).
The main reason for this is that I strive to follow the principle of Single Responsibility. By following this principle I've found my code much more testable and maintainable. It's also much easier to make changes when they are needed since I only have one thing to think about.
One thing to be watchful of is the method bloat that repositories can suffer from: GetOrdersByCustomer... GetAllOrders... GetOrders30DaysOld... etc. One good solution to this problem is to look at the Query Object pattern. Then your repositories can just take in a query object to execute.
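A minimal sketch of that idea (names invented for the example): the repository keeps a single Find method and the variations become small query objects:

using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public Guid CustomerId { get; init; }
    public DateTime PlacedOn { get; init; }
}

// The query object carries the criteria; the repository just executes it.
public interface IOrderQuery
{
    IEnumerable<Order> Apply(IQueryable<Order> orders);
}

public class OrdersOlderThan : IOrderQuery
{
    private readonly DateTime _cutoff;

    public OrdersOlderThan(DateTime cutoff) => _cutoff = cutoff;

    public IEnumerable<Order> Apply(IQueryable<Order> orders) =>
        orders.Where(o => o.PlacedOn < _cutoff).ToList();
}

public interface IOrderRepository
{
    // One method instead of GetOrders30DaysOld, GetAllOrders, GetOrdersByCustomer, ...
    IEnumerable<Order> Find(IOrderQuery query);
}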
I'd also strongly recommend looking into something like NHibernate. It includes a lot of the concepts that make Repositories so useful (Identity Map, Cache, Query objects..)
Even in DDD, I would keep data access classes and routines separate from entities.
Reasons are:
Testability improves
Separation of concerns and modular design
More maintainable in the long run, as you add entities and routines
I am no expert, just my opinion.
The annoying thing with Nilsson's Applying DDD&P is that he always starts with "I wouldn't do that in a real-world application, but..." and then his example follows. Back to the topic: I think OrderRepository.GetOrdersByCustomer(customer) is the way to go, but there is also a discussion about DDD on the ALT.NET mailing list (http://tech.groups.yahoo.com/group/altdotnet/).
DDD states that you should only ever access entities through their aggregate root. So say for instance that you have an aggregate root X which potentially has a lot of child Y entities. Now, for some scenario, you only really care about a subset of these Y entities at a time (maybe you're displaying them in a paged list or whatever).
Is it OK to implement a repository, then, so that in such scenarios it returns an incomplete aggregate? I.e. an X object whose Ys collection only contains the Y instances we're interested in, and not all of them? This could, for instance, cause methods on X which perform some calculation involving the Ys to not behave as expected.
Is this perhaps an indication that the Y entity in question should be considered promoted to an aggregate root?
My current idea (in C#) is to leverage the delayed execution of LINQ, so that my X object has an IQueryable to represent its relationship with Y. This way, I can have transparent lazy loading with filtering... But getting this to work with an ORM (Linq to Sql in my case) might be a bit tricky.
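Roughly what I have in mind, as a sketch (whether the ORM can actually translate the resulting expressions is a separate question):

using System;
using System.Linq;

public class Y
{
    public Guid Id { get; init; }
    public bool IsActive { get; init; }
}

public class X
{
    private readonly IQueryable<Y> _ys;   // supplied by the repository; nothing is loaded yet

    public X(IQueryable<Y> ys) => _ys = ys;

    // The filter composes onto the underlying query, so only the matching Ys are ever materialized.
    public int CountActive() => _ys.Count(y => y.IsActive);
}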
Any other clever ideas?
I consider an aggregate root with a lot of child entities to be a code smell, or a DDD smell if you will. :-) Generally I look at two options.
Split your aggregate into many smaller aggregates. This means that my original design was not optimal and I need to identify some new entities.
Split your domain into multiple bounded contexts. This means that there are specific sets of scenarios that use a common subset of the entities in the aggregate, while there are other sets of scenarios that use a different subset.
Jimmy Nilsson hints in his book that instead of reading a complete aggregate you can read a snapshot of parts of it. But you are not supposed to be able to save changes in the snapshot classes to the database.
Jimmy Nilsson's book Chapter 6: Preparing for infrastructure - Querying. Page 226.
Snapshot pattern
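My reading of that pattern, as a sketch (names made up): the snapshot is a plain read-only class hydrated by a query, with deliberately no way to push changes back:

using System;

// Read-only projection of part of an Order aggregate; note there is no Save for it anywhere.
public sealed class OrderSnapshot
{
    public OrderSnapshot(Guid orderId, string customerName, decimal total)
    {
        OrderId = orderId;
        CustomerName = customerName;
        Total = total;
    }

    public Guid OrderId { get; }
    public string CustomerName { get; }
    public decimal Total { get; }
}

public interface IOrderSnapshotQuery
{
    OrderSnapshot GetSnapshot(Guid orderId);   // query side only
}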
You're really asking two overlapping questions.
The title and first half of your question are philosophical/theoretical. I think the reason for accessing entities only through their "aggregate root" is to abstract away the kinds of implementation details you're describing. Access through the aggregate root is a way to reduce complexity by having a trusted point of access. You're eliminating friction/ambiguity/uncertainty by adhering to a convention. It doesn't matter how it's implemented within the root; you just know that when you ask for an entity it will be there. I don't think this perspective rules out a "filtered repository" as you describe. But to provide a pit of success for devs to fall into, it should be impossible to instantiate the repository without being explicit about its "filteredness"; likewise, if shared access to a repository instance is possible, the "filteredness" should be explicit when coding in the caller.
The second half of your question is about implementation on a specific platform. Not sure why you mention delayed execution, I think that's really orthogonal to the filtering question. The filtering itself could be a bit tricky to implement with LINQ. Maybe rather than inlining the Where lambdas, you set up a collection of them and select one depending on the filter you need.
You are allowed to, since the code will compile anyway, but if you're going for a pure DDD design you should not have incomplete instances of objects.
You should look into lazy loading if you're afraid of loading a huge object when you will only use a small portion of its child entities.
Lazy loading delays the loading of whatever you decide to lazy-load until the moment it is accessed. It makes use of callbacks that invoke the loading method once the code asks for the data.
Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate?
Not at all. An aggregate is a transactional boundary for changing the state of your system. Never use aggregates for querying data. Split the system into write and read sides (read about CQS and CQRS). When we think "CRUD"-based, we implement our system based on some resource. Let's say you have an "Appointment" aggregate. Thinking "CRUDish" means we should implement the use cases Create, Update, Delete, GetAll appointments, which means Appointment[] should be returned for GetAll. When you think use-case based (hexagonal architecture), your use cases would be ScheduleAppointment, RescheduleAppointment, CancelAppointment. But the query side can be: /myCalendar. We return all appointments for a specific user in a ClientCalendar object. Create separate DTOs for the query side. Never use aggregates for this purpose.
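A sketch of that split, keeping the appointment example (all names hypothetical):

using System;
using System.Collections.Generic;

// Write side: use-case shaped commands that end up on the Appointment aggregate.
public record ScheduleAppointment(Guid PatientId, DateTime When);
public record RescheduleAppointment(Guid AppointmentId, DateTime NewTime);
public record CancelAppointment(Guid AppointmentId);

// Read side: a DTO shaped for the /myCalendar query - never the aggregate itself.
public record CalendarEntry(Guid AppointmentId, DateTime When, string Description);
public record ClientCalendar(Guid UserId, IReadOnlyList<CalendarEntry> Entries);

public interface ICalendarQuery
{
    ClientCalendar GetCalendar(Guid userId);   // backed by a read model, not by the write-side repositories
}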