Looking for some clarification on working with aggregate roots.
If I have a model (a question paper) as follows:
QUESTION PAPER ---> QUESTION ---> ANSWER
and I have identified that the QUESTION PAPER is an aggregate root, then if I want to select an answer for a question, do I have to put a public method on the aggregate root, or can I expose the questions from the root and put a public method on the QUESTION object to select an ANSWER?
In general you always want to be talking to your aggregate root. If you're reading values then sometimes it can be convenient to add public accessors to the entities inside the aggregate, but it gets ugly (Law of Demeter, breaking abstractions, etc.) very quickly and I would suggest that you don't do it.
For anything that changes state, however, it's critical that you always go through the aggregate root. The aggregate root represents a consistency boundary (i.e. it is responsible, either directly or indirectly, for keeping things in a valid state), and if you allow state changes that go around it you bypass this altogether, opening the door to ever-increasing complexity.
So, it depends what you mean by 'select' in your question - if you're querying then you can get away with it, but it's a bad idea. If you are changing state then don't do it, or your aggregate root is no longer an aggregate root.
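To make that concrete, here is a minimal sketch of the question paper with the command on the root. The class and method names (`QuestionPaper`, `select_answer`, `submit`, `selected_answer`) and the "no changes after submission" rule are assumptions made up for illustration; they are not from the question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    text: str
    answers: list[str]
    selected: Optional[int] = None  # index into answers, None if unanswered


class QuestionPaper:
    """Aggregate root: outside code only ever calls methods on this class."""

    def __init__(self, questions: list[Question]) -> None:
        self._questions = list(questions)
        self._submitted = False

    def select_answer(self, question_no: int, answer_no: int) -> None:
        # Command: the root checks the (hypothetical) invariants first,
        # then updates the inner entity.
        if self._submitted:
            raise RuntimeError("paper already submitted")
        question = self._questions[question_no]
        if not 0 <= answer_no < len(question.answers):
            raise ValueError("no such answer")
        question.selected = answer_no

    def submit(self) -> None:
        self._submitted = True

    def selected_answer(self, question_no: int) -> Optional[str]:
        # Query: returns a plain value, not the Question entity itself.
        question = self._questions[question_no]
        if question.selected is None:
            return None
        return question.answers[question.selected]
```

Because callers never get hold of a `Question`, a rule like "no changes after submission" only has to be enforced in one place.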
Related
I have a question regarding the design of aggregates, as presented by Vernon, both in his articles for the DDD community (Effective Aggregate Design, Part 3) and in his book (Implementing Domain-Driven Design).
In there, he explores two possible approaches when designing BacklogItem and Task. In one of them Task is not its own Aggregate Root, because that would run the "risk of leaving the true invariant unprotected", and the aggregate root is the BacklogItem.
However, one of the other guidelines for designing aggregate roots is that access to an entity should only go through the root itself. This means that in order to get access to a Task in this approach, one would have to know the BacklogItem it belongs to and ask the backlog item for it. Normally, though, one would just want to see the Tasks one is assigned to, and not the backlog item.
In this case we will need to access the entity directly and not via the Backlog item. How does this sit with the proposed design? (I understand that this may be just an educational demo but how would someone have to think this if this was real life?)
Thanks in advance for any answers
In this case we will need to access the entity directly and not via the Backlog item. How does this sit with the proposed design?
That depends: "access directly" is underspecified.
Bertrand Meyer's language of command query separation helps here. Queries leave the model unchanged; commands update the model.
Here's the key idea: the unique concern of the aggregate root is commands; any change to the state of the aggregate is achieved by sending a command to the root entity, which may at its discretion delegate the responsibility of changing the state to some other entity.
So if you are accessing the Task to query its current state, that's just fine. But getting the Task so that you can send commands to it directly? That breaks the rules. The aggregate root has the privilege of exclusive access to the commands of all of the entities within the aggregate.
The implication is that you would never invoke Task.estimateRemainingHours() directly; you would instead invoke some analogous method on the aggregate root. BacklogItem.logEstimateFromTeamMember(), perhaps, which would in turn decide which tasks need to be updated.
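A rough sketch of that delegation, assuming a simplified model: the snake_case method names mirror the ones above, while the `status` values, the `task_id` parameter and the "done when no hours remain" rule are illustrative assumptions rather than Vernon's actual code.

```python
class Task:
    def __init__(self, task_id: str, hours_remaining: int) -> None:
        self.task_id = task_id
        self.hours_remaining = hours_remaining

    def estimate_remaining_hours(self, hours: int) -> None:
        # Only the BacklogItem is supposed to call this.
        if hours < 0:
            raise ValueError("hours cannot be negative")
        self.hours_remaining = hours


class BacklogItem:
    """Aggregate root: every command on a Task goes through here."""

    def __init__(self, tasks: list[Task]) -> None:
        self._tasks = {task.task_id: task for task in tasks}
        self.status = "committed"

    def log_estimate_from_team_member(self, task_id: str, hours: int) -> None:
        # Command: delegate to the Task, then re-establish the invariant
        # that spans all tasks (assumed rule: no hours left means done).
        self._tasks[task_id].estimate_remaining_hours(hours)
        total = sum(t.hours_remaining for t in self._tasks.values())
        self.status = "done" if total == 0 else "committed"

    def hours_remaining(self, task_id: str) -> int:
        # Query: reading Task state changes nothing.
        return self._tasks[task_id].hours_remaining
```

If a caller could invoke `estimate_remaining_hours` on a Task directly, the status update in the root would silently be skipped.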
We have an aggregate root in our system and it has child entities in a collection. The problem is that the container needs to be updated very frequently, on a transaction basis, and the child entities don't; in fact they hardly ever change, and are more like configuration in nature.
My first reflex was to separate them into two different aggregate roots because of our application requirements. But I was reminded of the cascade delete rule: if we delete the one then the delete should cascade, so their lifetimes are linked.
We stumbled over this problem when we discovered that we have a caching problem. Changes to the children entities (configuration) were not being reflected in the system at runtime because the parent was unaware of the changes (we had them as one aggregate root but someone had created a repository for its children).
The main driver for aggregate boundaries are the invariants of your domain - or in other terms, aggregate boundaries should be consistency boundaries. Things that must change together atomically must be in the same aggregate.
The cascading delete is (with regard to aggregate boundaries) more of a nice-to-have than a rule. You can always enforce the fact that a Parent still lives by requiring one at the place where you load Child entities. With this design, you can make Parent and Child different aggregates, while still enforcing the rule that no "free floating" Child aggregates can be requested. And deleting Child aggregates in response to a deleted Parent is easy if you have domain events in place.
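As a rough in-memory sketch of both ideas (a live Parent is required when loading children, and a domain event cleans up afterwards): `InMemoryStore`, `ParentDeleted` and the method names are hypothetical, and a real system would use separate repositories and an event bus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentDeleted:
    """Domain event raised when a Parent aggregate is removed."""
    parent_id: str


class InMemoryStore:
    """Stand-in for two repositories; Parent and Child are separate aggregates."""

    def __init__(self) -> None:
        self.parents: set[str] = set()
        self.children: dict[str, str] = {}  # child_id -> parent_id

    def load_children(self, parent_id: str) -> list[str]:
        # Children can only be requested for a Parent that still exists.
        if parent_id not in self.parents:
            raise LookupError(f"parent {parent_id!r} does not exist")
        return sorted(c for c, p in self.children.items() if p == parent_id)

    def delete_parent(self, parent_id: str) -> ParentDeleted:
        self.parents.remove(parent_id)
        return ParentDeleted(parent_id)

    def on_parent_deleted(self, event: ParentDeleted) -> None:
        # Event handler: clean up the children later, in its own transaction.
        self.children = {
            c: p for c, p in self.children.items() if p != event.parent_id
        }
```

Between `delete_parent` and the handler running, the orphaned children still exist in storage, but nothing can load them, which is usually all the cascade rule was protecting.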
Note: All this is under the assumption that your domain invariants allow separating the aggregates in the first place.
This might be better in a discussion format, rather than a Q&A format. I'd recommend trying the audience at DomainDrivenDesign or DDDCQRS
Are you sure that you have a business requirement to delete data in your domain model? That's really unusual -- in most domain models I've seen, an aggregate will reach an "end of life" state, (example: AccountClosed), but doesn't actually get removed from the system.
A common trap in aggregate design is to think about the structure of the entities. "A has a B" does not necessarily mean that they are part of the same aggregate; the key idea is "A needs to keep B and C consistent". You can think about it like a graph; state B and state C are nodes in the graph, the consistency rules are the edges. If you can't traverse the graph from B to C, then they don't need to be part of the same aggregate, and probably shouldn't be.
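If it helps to see the graph picture spelled out, the sketch below groups pieces of state by the consistency rules that connect them. It is only a thinking aid under the assumption that you can list your rules as pairs; `aggregate_candidates` is a made-up helper, not an established design algorithm.

```python
from collections import defaultdict


def aggregate_candidates(states, rules):
    """Group states that are connected, directly or indirectly, by a rule.

    states: iterable of state names (the nodes)
    rules:  iterable of (a, b) pairs meaning "a and b must stay consistent"
    """
    neighbours = defaultdict(set)
    for a, b in rules:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for state in states:
        if state in seen:
            continue
        # Depth-first walk over everything reachable from this state.
        group = set()
        stack = [state]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return groups
```

For example, with states A, B, C, D and a single rule between B and C, only B and C end up grouped together, even if A "has a" B structurally.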
My instinct is that caching should be the right answer here. If you are processing millions of transactions per day, and the collection only changes once per month, then simply using a cached value of the collection should produce the right answer most of the time.
In this, I'm influenced by Udi Dahan's essay Race Conditions Don't Exist; by coupling this configuration collection with the rest of the aggregate, you are essentially asserting that changes to the configuration (which are rare) are understood by the business to be happening precisely between two other changes to the aggregate. 3M transactions per day averages 1 per 30ms; are you really scheduling your configuration changes that precisely?
The usual pattern here would be that the consistency rule is removed from the domain model; instead, you monitor for changes that introduce an inconsistency, and mitigate them. That depends upon there being a reasonable way to detect the errors, an efficient way to mitigate them, and a mechanism for keeping the rate under control.
The last of these would normally be done by having the clients/the application check their local copy of the collection, and making sure the command sent is consistent with that before dispatching the command to the domain model. (Possible questions for your domain experts: how quickly do the configuration changes need to be applied? Do the configuration changes happen when the aggregate is changing frequently or when it is quiet?)
Another possibility might be to change your persistence strategy; if the collection doesn't change often, then there are not a lot of change events related to it. So maybe instead of persisting the aggregate, you look into persisting its history - in other words, using event-sourcing here. Maybe if this aggregate lived in a micro service, you could limit the risk of the change? Hard to say, at a million transactions per day, this aggregate sounds pretty important.
Should the "user/developer" who wants to do something with an aggregate only be faced with the aggregate root? So should every method I want to call on an entity deep inside that aggregate be "routed" through the root? That would give the root a very broad interface with a lot of boring code.
Or is it allowed to traverse and navigate through the aggregate, picking the entity you want to deal with and invoking the method directly on it?
Or do I have to ask the root to give me the entity (not being allowed to traverse and navigate through the aggregate from the outside) and then call the method on this entity directly?
The entity designated as the aggregate "root" is the gatekeeper, so all method calls need to go through it first. If you think about it, this makes sense. If you hand out an internal entity, how can you be sure it is used in the intended way and the invariants are upheld? Also, your internal details are now coupled to the callers, and making internal structural changes will ripple throughout the system.
Remember, we strive to design small aggregates, so if the surface area is getting too large, that might be a sign your aggregate boundaries are wrong.
We have all heard that injecting a repository into an aggregate is a bad idea, but almost no one explains why.
I will try to list all the disadvantages of doing so here, so we can judge how well that statement holds up.
The first thing that comes to mind is the Single Responsibility Principle.
It's true that by injecting a repository into an AR we violate SRP, because retrieving and persisting an aggregate is not the responsibility of the aggregate itself. But that argument only covers the aggregate itself, not other aggregates. So does it apply to retrieving, from a repository, aggregates that are referenced by id? And what about storing them?
I used to think that an aggregate shouldn't even know that there is any sort of persistence in the system, because persistence doesn't have to exist. Aggregates can be created for a single procedure call and then discarded.
Now that I think about it, that's not right, because an aggregate root is an entity, and an entity only makes sense if it has a unique identity. So why would we need a unique identity if not for persistence? Even if it's just persistence in memory. Maybe for comparison, but in my opinion that's not the main reason behind identity.
OK, let's assume that we retrieve and store OTHER aggregates from inside our aggregate using injected repositories. What are the other consequences besides the SRP violation?
For one, we have no control over when aggregates are persisted, and retrieval becomes a kind of lazy loading, which is bad for the same reason (no control).
Because of this lack of control, we can end up persisting the same aggregate several times when once would do, or loading the same aggregate a hundred times when it could be loaded once, so performance suffers. There may also be a problem with stale data.
These reasons practically rule out injecting a repository into an aggregate.
Here comes my main question: why can we inject repositories into a domain service, then?
Don't the same reasons apply there? It's just like moving logic out of the aggregate into a separate function and pretending it's something different.
To be honest, when I started to write this SO question, I had no good answer for that. But after hours of investigating the problem and writing this question, I arrived at a solution. Rubber duck debugging.
I'll post this question anyway for others who have the same problem, with my answer below of course.
Here are the places where I'd recommend fetching aggregates (i.e. calling Repository.Get...()), in order of preference:
1. Application Service
2. Domain Service
3. Aggregate
We don't want Aggregates to fetch other Aggregates most of the time, because this blurs the lines, giving them orchestration powers which normally belong to the Application layer. You also raise the risk of the Aggregate overstepping its jurisdiction by modifying other Aggregates, which can result in contention and performance problems, not to mention that transactions become more difficult to analyze and the code base harder to reason about.
Domain Services are IMO a good place to fetch Aggregates when determining which aggregates to modify is domain logic per se. In your game example (which might not be the ideal context for DDD by the way), which units are affected by another unit's attack might be considered domain logic, thus you may not want to place it at the Application Service level. This rarely happens in my experience though.
Finally, Application Services are the default place where I call Repository.Get(...) for uniformity's sake and because this is the natural place to get a hold of the actors of the use case (usually only one Aggregate per transaction) and orchestrate calls to them.
That doesn't mean Repositories should never be injected into Aggregates; there are exceptions, but other alternatives are almost always better.
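For what it's worth, the default case (an Application Service that loads one Aggregate, calls it, and saves it) might look roughly like this. `InMemoryUnitRepository`, `HealUnitService` and the `heal` behaviour are invented for the sketch; only the "get, call, save" shape is the point.

```python
class Unit:
    def __init__(self, unit_id: str, health: int, max_health: int = 100) -> None:
        self.unit_id = unit_id
        self.health = health
        self.max_health = max_health

    def heal(self, amount: int) -> None:
        # Domain logic stays in the aggregate and never touches a repository.
        self.health = min(self.max_health, self.health + amount)


class InMemoryUnitRepository:
    """Hypothetical repository; a real one would talk to a database."""

    def __init__(self) -> None:
        self._units: dict[str, Unit] = {}

    def get(self, unit_id: str) -> Unit:
        return self._units[unit_id]

    def save(self, unit: Unit) -> None:
        self._units[unit.unit_id] = unit


class HealUnitService:
    """Application service: fetches the actor of the use case and orchestrates."""

    def __init__(self, units: InMemoryUnitRepository) -> None:
        self._units = units

    def __call__(self, unit_id: str, amount: int) -> None:
        unit = self._units.get(unit_id)   # one aggregate per transaction
        unit.heal(amount)
        self._units.save(unit)
```

The aggregate stays ignorant of persistence, and every load and save for the use case is visible in one short method.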
So, as I wrote in the question, I found my answer in the process of writing it.
The best way to show this is by example:
When we have a (superficially) simple behavior like a unit attacking another unit, we can write something like this:
unit.attack_unit(other_unit)
The problem is that, to attack a unit, we have to calculate damage, and to do that we need other aggregates, like weapon and armor, which are referenced by id inside the unit. Since we cannot inject a repository into the aggregate, we have to move the attack_unit logic into a domain service, because we can inject a repository there. So what is the difference between injecting it into a domain service and injecting it into the unit aggregate?
The answer is: there is no difference. None of the consequences I described in the question will bite us. In both cases we load both units once, the attacking unit's weapon once, and the armor of the unit being attacked once. There won't be stale data either, even if we mutate the weapon object during the process and store it, because the weapon is retrieved and stored in one place.
The problem shows up in a different example.
Let's create a use case where a unit can attack all other units in the game in one process.
The problem lies in how we implement it. If we reuse the already defined unit.attack_unit and call it for every unit in the game (iterating over them), then the weapon used to compute damage will be retrieved, inside the unit aggregate, as many times as there are units in the game! But it could be retrieved only once!
It doesn't matter whether unit.attack_unit is a method of the unit aggregate or a domain service unit_attack_unit. The result is the same: the weapon is loaded too many times. To fix that we have to change the implementation, and with it probably the interface too.
Now at least we have an answer to the question "does moving logic from an aggregate method to a domain service (because we want to access a repository there) fix the problem?". No, it does not change a thing.
Injecting repositories into a domain service can be as dangerous as injecting them into an aggregate if used wrongly.
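To see the repeated loading in numbers, here is a small sketch with a repository that counts its reads. Everything in it is an assumption for the demo: `CountingWeaponRepository` is a stand-in, weapons are reduced to a plain damage number, and armor is left out to keep it short.

```python
class CountingWeaponRepository:
    """Hypothetical repository that records how often it is hit."""

    def __init__(self, weapons: dict[str, int]) -> None:
        self._weapons = dict(weapons)  # weapon_id -> damage
        self.loads = 0

    def get(self, weapon_id: str) -> int:
        self.loads += 1
        return self._weapons[weapon_id]


class Unit:
    def __init__(self, name: str, health: int, weapon_id: str) -> None:
        self.name = name
        self.health = health
        self.weapon_id = weapon_id


def attack_unit(attacker: Unit, target: Unit, weapons) -> None:
    # Domain-service version: loads the attacker's weapon on every call.
    damage = weapons.get(attacker.weapon_id)
    target.health -= damage


def attack_all_naive(attacker: Unit, targets: list[Unit], weapons) -> None:
    # Reuses attack_unit, so the weapon is loaded once per target.
    for target in targets:
        attack_unit(attacker, target, weapons)


def attack_all_batched(attacker: Unit, targets: list[Unit], weapons) -> None:
    # Changed implementation: load the weapon once, then apply it to everyone.
    damage = weapons.get(attacker.weapon_id)
    for target in targets:
        target.health -= damage
```

Both functions leave the targets in the same state; only the number of repository reads differs, and moving `attack_unit` between an aggregate method and a domain service would not change that count.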
This answers my SO question, but we still don't have a solution to the real problem.
What can we do if we have two use cases, one where a unit attacks one other unit and a second where a unit attacks all other units, without duplicating domain logic?
One way is to pass all the needed aggregates as parameters to our aggregate method.
unit.attack_unit(unit, weapon, armor)
But what if we need five or more aggregates there? That doesn't scale well. The application logic would also have to know that all these aggregates are needed for an attack, which is a knowledge leak. When the attack_unit implementation changes, we might also have to update the interface of that method. What is the purpose of encapsulation then?
So, if we can't access a repository to get the needed aggregate, how can we smuggle it in?
We can drop the idea of referencing aggregates by id, or pass all the needed aggregates in from the application layer (which means a knowledge leak).
Or maybe the cause of these problems is bad modelling?
Attacking another unit is indeed a unit's responsibility, but is damage calculation its responsibility? Of course not.
Maybe we need another object, like a value object MeleeAttack(weapon, armor), yet when we add more properties that can change the result of an attack, like enchantments on a unit, it gets more complicated.
I also think that we are now creating objects based on performance, not on our domain.
So from domain-driven design we get performance-driven design. Is that what we want? I don't think so.
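For reference, the MeleeAttack value object idea could be sketched as below. The `Weapon` and `Armor` snapshots, the `receive` method and the "damage minus protection" formula are all assumptions made up for the example; the answer does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weapon:
    damage: int


@dataclass(frozen=True)
class Armor:
    protection: int


@dataclass(frozen=True)
class MeleeAttack:
    """Value object: owns the damage calculation, has no identity of its own."""
    weapon: Weapon
    armor: Armor

    def damage(self) -> int:
        # Assumed rule: armor absorbs part of the hit, never below zero.
        return max(0, self.weapon.damage - self.armor.protection)


class Unit:
    def __init__(self, health: int) -> None:
        self.health = health

    def receive(self, attack: MeleeAttack) -> None:
        # The unit applies the result but does not compute it.
        self.health = max(0, self.health - attack.damage())
```

Whoever builds the `MeleeAttack` still has to load the weapon and armor somewhere, so this moves the calculation out of the unit without settling where the loading happens.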
"So why would we need unique identity if not for persisting?" - think of an account scenario, where several John Smiths exist in your system. Imagine John Smith and John Smith Jr (who didn't enter the Jr in signup) both live at the same address. How do you tell them apart? Imagine I'm trying to write a recommendation engine based upon their past purchases . . . .
Identity is a quality of equality in DDD. If you don't have an identity unique from your fields, then you're a ValueObject.
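A small sketch of that distinction, using hypothetical `Customer` and `Address` classes: the value object compares by its fields, the entity compares by an id that is independent of its fields.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Address:
    """Value object: two addresses with the same fields are the same address."""
    street: str
    city: str


@dataclass(eq=False)
class Customer:
    """Entity: equality is decided by identity, not by the other fields."""
    name: str
    address: Address
    customer_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Customer) and self.customer_id == other.customer_id

    def __hash__(self) -> int:
        return hash(self.customer_id)


home = Address("1 Main St", "Springfield")
senior = Customer("John Smith", home)
junior = Customer("John Smith", Address("1 Main St", "Springfield"))
```

Here `senior` and `junior` share a name and an equal address, yet they are different customers, which is what a recommendation engine based on purchase history would rely on.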
What are consequences of using repository inside of aggregate vs inside of domain service?
There's a reasonably strong argument that you shouldn't do either.
Riddle: when does an aggregate need to see the state of another aggregate?
The responsibility of an aggregate is to control change. Any command that would change the state of the domain model is dispatched to the aggregate root responsible for the integrity of the state in question. By definition, all of the state required to ensure that the command is currently permitted is contained within the aggregate boundary.
So there is never any need to peek at the data outside of the aggregate when making a change to the model.
In which case, you don't ever need to load another aggregate, which makes the "where" question moot.
Two clarifications:
Queries will often combine the state of multiple aggregates, and will often need to follow a reference from one aggregate to another. The principle above is satisfied because queries treat the domain model as read-only. You need the state to answer the query, but you don't need the invariant enforcement because you aren't changing anything.
Another case is when you need state from another aggregate to process a command properly, but small latency in the data is an acceptable risk to the data. In that case, you query the "other" aggregate to get state. If you were to run that query within the domain model itself, the right way to do so would be via a domain service.
In most cases, though, you'll be equally well served to run the query when generating the command (ie, in the client), or when handling the command (in the application, outside the domain). It would be very unusual for a business to consider domain service latency to be acceptable but client latency to be unacceptable.
(Disconnected clients are one case where that can be especially problematic; when the command is generated and then queued for a long period of time before being dispatched to the server).
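A sketch of the "query when handling the command" variant, with an invented example: an `Order` aggregate that needs a customer's credit limit, which belongs to some other aggregate. The names (`Order`, `handle_place_order`, `credit_limits`) are hypothetical, and the lookup is assumed to be a possibly slightly stale read model.

```python
class Order:
    def __init__(self, order_id: str, total: int) -> None:
        self.order_id = order_id
        self.total = total
        self.status = "draft"

    def place(self, credit_limit: int) -> None:
        # The aggregate receives the outside state as a plain value;
        # it never loads the other aggregate itself.
        if self.total > credit_limit:
            raise ValueError("order exceeds credit limit")
        self.status = "placed"


def handle_place_order(order: Order, credit_limits: dict[str, int],
                       customer_id: str) -> None:
    # Application layer: run the query here (it may lag a little behind
    # the customer aggregate), then dispatch the command to the Order.
    limit = credit_limits[customer_id]
    order.place(limit)
```

The same `place(credit_limit)` call works whether the value was looked up by the client, the application, or a domain service, which is why the choice of where to query is mostly a latency question.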
I've looked for quite a while at other posts here about aggregate roots. It seems that I don't understand at all how to define aggregate roots correctly. I saw answers saying that aggregate roots might not be aggregate roots and vice versa. I am a bit confused. The problem is that I have the relational model in my head, but I know that DDD doesn't work that way.
Is there a way to derive aggregate roots from a relational model?
For example, say you have a journal that holds journal entries, each of which holds tasks, problems, and notes.
How would you define the aggregate roots? Is the root a journal? But that may cause problems if you want to access notes, problems and tasks. So are those also aggregate roots that hold a reference to journal entries?
It's hard to understand, and I would like some more clarification.
Thanks.
I agree with you that the concept of aggregate roots can be confusing until you get your mind around it. Like most other concepts it gets easier with practice, working it through a few times.
The point of the aggregate is to simplify object traversal for some external object, in the context of one or more use case(s). You have to start somewhere to satisfy a business requirement, and if you find that you are largely needing a Journal, it's likely that it should in fact be an aggregate root. In most domains that aren't trivial, you will find it useful to have more than one aggregate root. There is nothing supernatural about the starting object for a use case. You just need to start somewhere.
But again, the point is to simplify object traversal, which simplifies your system. So if Journal is in fact a useful starting point, make all of your calls to Journal first. If a particular use case will wind up needing Tasks, Money, Time or any other useful things, the calling program gets those things by asking Journal, and only Journal. The other objects are part of the Journal aggregate root, for this use case.
For other use cases, it may be more natural and therefore useful for Task to be the starting point, and so you may have a Task aggregate root too, which will likely overlap with your Journal aggregate root across use cases. But you ask Task and only Task to satisfy the request (it is the only reference that the calling program needs to know about).
Your relational db can and will stay relational of course. But by having your object model evolve to look at requests from an aggregate (starting point object) point of view, your requests from the db will wind up being simpler.
Lay out a use case (or two) and try and work it through. Ask questions within the context of the use case if you like.
HTH,
Berryl