DDD aggregate performance - domain-driven-design

DDD aggregate performance - domain-driven-design

I'm new to DDD. Now I've an aggregate Team and entities TeamMember:
class Team {
members: Map<TeamMemberId, TeamMember>;
add(member) {
assert(!members.has(member.id), "Team Member is already exists");
this.members.set(member.id, member);
}
}
When I execute AddTeamMemberCommand, the repository will load the entire aggregate from MongoDB.
when the team is large, this may seem unacceptable.
I found the following from google and stackoverflow:
Use Id references instead of entities
Lazy load
Redesign aggregation
....
I'm not sure which solution is right for me, or is there a better, more generic solution for this scenario? Is there a GitHub sample project i can look at?
Thank you very much.

is there a better, more generic solution for this scenario?
For reads/queries, where you aren't going to be changing the aggregate at all, lazy loading is fine. In the world of CQRS, we might even avoid loading the aggregate altogether, instead just fetching a read-only copy of the information that we need.
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes.
If we're trying to make a change to an aggregate, and are tempted to leave a bunch of unnecessary information unloaded, that may imply that our aggregate boundaries are in the wrong places, and that we should be re-designing our domain model so that the loaded information better fits with what we need.
For example, if you are trying to update just Bob, not the entire team, then that may be a hint that Bob isn't an entity inside of the Team aggregate, but is instead belongs in a different, smaller aggregate, which has some relationship with team.
Mauro Servienti's talk on aggregate boundaries may be a good starting point.

Related

What is an Aggregate Root?

No, it is not a duplication question.
I have red many sources on the subject, but still I feel like I don't fully understand it.
This is the information I have so far (from multiple sources, be it articles, videos, etc...) about what is an Aggregate and Aggregate Root:
Aggregate is a collection of multiple Value Objects\Entity references and rules.
An Aggregate is always a command model (meant to change business state).
An Aggregate represents a single unit of (database - because essentialy the changes will be persisted) work, meaning it has to be consistent.
The Aggregate Root is the interface to the external world.
An Aggregate Root must have a globally unique identifier within the system
DDD suggests to have a Repository per Aggregate Root
A simple object from an aggregate can't be changed without its AR(Aggregate Root) knowing it
So with all that in mind, lets get to the part where I get confused:
in this site it says
The Aggregate Root is the interface to the external world. All interaction with an Aggregate is via the Aggregate Root. As such, an Aggregate Root MUST have a globally unique identifier within the system. Other Entites that are present in the Aggregate but are not Aggregate Roots require only a locally unique identifier, that is, an Id that is unique within the Aggregate.
But then, in this example I can see that an Aggregate Root is implemented by a static class called Transfer that acts as an Aggregate and a static function inside called TransferedRegistered that acts as an AR.
So the questions are:
How can it be that the function is an AR, if there must be a globaly unique identifier to it, and there isn't, reason being that its a function. what does have a globaly unique identifier is the Domain Event that this function produces.
Following question - How does an Aggregate Root looks like in code? is it the event? is it the entity that is returned? is it the function of the Aggregate class itself?
In the case that the Domain Event that the function returns is the AR (As stated that it has to have that globaly unique identifier), then how can we interact with this Aggregate? the first article clearly stated that all interaction with an Aggregate is by the AR, if the AR is an event, then we can do nothing but react on it.
Is it right to say that the aggregate has two main jobs:
Apply the needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be raised in a Domain Event from the AR
Please correct me on any of the bullet points in the beginning if some/all of them are wrong is some way or another and feel free to add more of them if I have missed any!
Thanks for clarifying things out!

I feel like I don't fully understand it.
That's not your fault. The literature sucks.
As best I can tell, the core ideas of implementing solutions using domain driven design came out of the world of Java circa 2003. So the patterns described by Evans in chapters 5 and six of the blue book were understood to be object oriented (in the Java sense) domain modeling done right.
Chapter 6, which discusses the aggregate pattern, is specifically about life cycle management; how do you create new entities in the domain model, how does the application find the right entity to interact with, and so on.
And so we have Factories, that allow you to create instances of domain entities, and Repositories, that provide an abstraction for retrieving a reference to a domain entity.
But there's a third riddle, which is this: what happens when you have some rule in your domain that requires synchronization between two entities in the domain? If you allow applications to talk to the entities in an uncoordinated fashion, then you may end up with inconsistencies in the data.
So the aggregate pattern is an answer to that; we organize the coordinated entities into graphs. With respect to change (and storage), the graph of entities becomes a single unit that the application is allowed to interact with.
The notion of the aggregate root is that the interface between the application and the graph should be one of the members of the graph. So the application shares information with the root entity, and then the root entity shares that information with the other members of the aggregate.
The aggregate root, being the entry point into the aggregate, plays the role of a coarse grained lock, ensuring that all of the changes to the aggregate members happen together.
It's not entirely wrong to think of this as a form of encapsulation -- to the application, the aggregate looks like a single entity (the root), with the rest of the complexity of the aggregate being hidden from view.
Now, over the past 15 years, there's been some semantic drift; people trying to adapt the pattern in ways that it better fits their problems, or better fits their preferred designs. So you have to exercise some care in designing how to translate the labels that they are using.

In simple terms an aggregate root (AR) is an entity that has a life-cycle of its own. To me this is the most important point. One AR cannot contain another AR but can reference it by Id or some value object (VO) containing at least the Id of the referenced AR. I tend to prefer to have an AR contain only other VOs instead of entities (YMMV). To this end the AR is responsible for consistency and variants w.r.t. the AR. Each VO can have its own invariants such as an EMailAddress requiring a valid e-mail format. Even if one were to call contained classes entities I will call that semantics since one could get the same thing done with a VO. A repository is responsible for AR persistence.
The example implementation you linked to is not something I would do or recommend. I followed some of the comments and I too, as one commenter alluded to, would rather use a domain service to perform something like a Transfer between two accounts. The registration of the transfer is not something that may necessarily be permitted and, as such, the domain service would be required to ensure the validity of the transfer. In fact, the registration of a transfer request would probably be a Journal in an accounting sense as that is my experience. Once the journal is approved it may attempt the actual transfer.
At some point in my DDD journey I thought that there has to be something wrong since it shouldn't be so difficult to understand aggregates. There are many opinions and interpretations w.r.t. to DDD and aggregates which is why it can get confusing. The other aspect is, in IMHO, that there is a fair amount of design involved that requires some creativity and which is based on an understanding of the domain itself. Creativity cannot be taught and design falls into the realm of tacit knowledge. The popular example of tacit knowledge is learning to ride a bike. Now, we can read all we want about how to ride a bike and it may or may not help much. Once we are on the bike and we teach ourselves to balance then we can make progress. Then there are people who end up doing absolutely crazy things on a bike and even if I read how to I don't think that I'll try :)
Keep practicing and modelling until it starts to make sense or until you feel comfortable with the model. If I recall correctly Eric Evans mentions in the Blue Book that it may take a couple of designs to get the model closer to what we need.

Keep in mind that Mike Mogosanu is using a event sourcing approach but in any case (without ES) his approach is very good to avoid unwanted artifacts in mainstream OOP languages.
How can it be that the function is an AR, if there must be a globaly unique identifier to it, and there isn't, reason being that
its a function. what does have a globaly unique identifier is the
Domain Event that this function produces.
TransferNumber acts as natural unique ID; there is also a GUID to avoid the need a full Value Object in some cases.
There is no unique ID state in the computer memory because it is an argument but think about it; why you want a globaly unique ID? It is just to locate the root element and its (non unique ID) childrens for persistence purposes (find, modify or delete it).
Order A has 2 order lines (1 and 2) while Order B has 4 order lines (1,2,3,4); the unique identifier of order lines is a composition of its ID and the Order ID: A1, B3, etc. It is just like relational schemas in relational databases.
So you need that ID just for persistence and the element that goes to persistence is a domain event expressing the changes; all the changes needed to keep consistency, so if you persist the domain event using the global unique ID to find in persistence what you have to modify the system will be in a consistent state.
You could do
var newTransfer = New Transfer(TransferNumber); //newTransfer is now an AG with a global unique ID
var changes = t.RegisterTransfer(Debit debit, Credit credit)
persistence.applyChanges(changes);
but what is the point of instantiate a object to create state in the computer memory if you are not going to do more than one thing with this object? It is pointless and most of OOP detractors use this kind of bad OOP design to criticize OOP and lean to functional programming.
Following question - How does an Aggregate Root looks like in code? is it the event? is it the entity that is returned? is it the function
of the Aggregate class itself?
It is the function itself. You can read in the post:
AR is a role , and the function is the implementation.
An Aggregate represents a single unit of work, meaning it has to be consistent. You can see how the function honors this. It is a single unit of work that keeps the system in a consistent state.
In the case that the Domain Event that the function returns is the AR (As stated that it has to have that globaly unique identifier),
then how can we interact with this Aggregate? the first article
clearly stated that all interaction with an Aggregate is by the AR, if
the AR is an event, then we can do nothing but react on it.
Answered above because the domain event is not the AR.
4 Is it right to say that the aggregate has two main jobs: Apply the
needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be
raised in a Domain Event from the AR
Yes; again, you can see how the static function honors this.
You could try to contat Mike Mogosanu. I am sure he could explain his approach better than me.

Do we need another repo for each entity?

For example take an order entity. It's obvious that order lines don't exist without order. So we have to get them with the help of OrderRepository(throw an order entity). Ok. But what about other things that are loosely coupled with order? Should the customer info be available only from CustomerRepo and bank requisites of the seller available from BankRequisitesRepo, etc.? If it is correct, we should pass all these repositories to our Create Factory method I think.

Yes. In general, each major entity (aggregate root in domain driven design terminology) should have their own repositories. Child entities *like order lines) will generally not need one.
And yes. Define each repository as a service then inject them where needed.
You can even design things such that there is no direct coupling between Order and Customer in terms of an actual database link. This in turn allows customers and orders to live in completely independent databases. Which may or may not be useful for your applications.

You correctly understood that aggregate roots's (AR) child entities shall not have their own repository, unless they are themselves AR's. Having repositories for non-ARs would leave your invariants unprotected.
However you must also understand that entities should usually not be clustered together for convenience or just because the business states that some entity has one or many some other entity.
I strongly recommend that you read Effective Aggregate Design by Vaughn Vernon and this other blog post that Vaughn kindly wrote for a question I asked.
One of the aggregate design rule of thumb stated in Effective Aggregate Design is that you should usually reference other aggregates by identity only.
Therefore, you greatly reduce the number of AR instances needed in other AR's creationnal process since new Order(customer, ...) could become new Order(customerId, ...).
If you still find the need to query other AR's in one AR's creationnal process, then there's nothing wrong in injecting repositories as dependencies, but you should not depend on more than you need (e.g. let the client resolve the real dependencies and pass them directly rather than passing in a strategy allowing to resolve a dependency).

Should lookup values be modeled as aggregate roots?

As part of my domain model, lets say I have a WorkItem object. The WorkItem object has several relationships to lookup values such as:
WorkItemType:
UserStory
Bug
Enhancement
Priority:
High
Medium
Low
And there could possibly be more, such as Status, Severity, etc...
DDD states that if something exists within an aggregate root that you shouldn't attempt to access it outside of the aggregate root. So if I want to be able to add new WorkItemTypes like Task, or new Priorities like Critical, do those lookup values need to be aggregate roots with their own repositories? This seems a little overkill especially if they are only a key value pair. How should I allow a user to modify these values and still comply with the aggregate root encapsulation rule?

While the repository pattern as described in the blue book does emphasize its use being exclusive to aggregates, it does leave room open for exceptions. To quote the book:
Although most queries return an object or a collection of objects, it
also fits within the concept to return some types of summary
calculations, such as an object count, or a sum of a numerical
attribute that was intended by the model to be tallied.
(pg. 152)
This states that a repository can be used to return summary information, which is not an aggregate. This idea extends to using a repository to look up value objects, just as your use case requires.
Another thing to consider is the read-model pattern which essentially allows for a query-only type of repository which effectively decouples the behavior-rich domain model from query concerns.

Landon, I think that the only way is to make those value pairs aggregate roots. I know that is might look overkill, but that's DDD braking things into small components.
The reasons why I think using a repository is the right way are:
A user needs to be able to add those value pairs independently of a Work Item.
The value pairs don't have a local, unique identity
Remember that DDD is just a set of guidelines, not hard truths. If you think that this is overkill, you might want to create a lookup that returns the pairs as value objects. This might work out specially if you don't have a feature to add value pairs in the application, but rather through the database.
As a side note, good question! There are quite a few blog posts about this situations... But not all agree on the best way to do this.

Not everything should be modeled using DDD. The complexity of managing the reference data most likely wouldn't justify creating aggregate roots. A common solution would be to use CRUD to manage reference data, and have a Domain Service to interface with that data from the domain.

Do these lookups have ID's ? If not, you could consider making them Value Objects...

Aggregate roots depend on the use case so does that mean that we might end up with really a lots of repositories?

Ive heard a lots that aggregate roots depend on the use case. But what does that mean in coding context ?
You have a service class which offcourse hold methods (use cases) that gonna accomplish something in a repository. Great, so you use a repository which is equal to an aggregate root to perform your querying.
Now you need to perform some other kind of operation which use totally different use case than the first service class but use the same entities.
Here the representation :
Entities: Customer, Orders, LineOrder
Service 1: Add new customers, Delete some customers, retrieve customer orders
Here the aggregate root seem to be Customer because you need this repository to perform thoses use cases.
Service 2: Retrieve customer from an actual order
Here the aggregate root seem to be Order because you need this repository to perform this use case.
If i am wrong please correct me. Now that mean you have 2 aggregates roots.
Now my question is, since aggregate roots depend on the use case does that mean that we might end up with really a lots of repositories if you end up having lots of use cases ?
The above example was probably not the best example... so lets say we have a Journal which hold JournalEntries which each entries hold Tasks, Problems and Notes. (This is in the context of telling to a system what have been done to a project)
Does that mean that im gonna end up with 2 repository ? (Journal, JournalEntry)
In the use cases where i need to add new tasks, problems and notes from an journal entry ?
(Can be seen as a service)
Or might end up with 4 repository. (Journal, Task, Problems, Notes)
In the use cases where i need to access directment task, problems and notes ?
(Can be seen as another service)
But that would mean if i need both of theses services (that actually hold the use cases) that i actually need 5 repository to be able to perform use cases in both of them ?
Thanks.

Hi I saw your post and thought I may give you my opion. First I must say I've been doing DDD in project for three years now, so I'm not an expert. But I'm currently working in a project as an architect an coaching developers in DDD, and I must say it isn't a walk in the park... I don't know how many times I've refactored the model and Entity relationships.
But my experience is that you endup with some repositories (more than few but not many). My Aggregates usually contains a few classes and the Aggregate object graph isn't that deep (if you know what I mean).
But I try to be concrete:
1) Aggregate roots are defined by your needs. I mean if you feel that you need that Tasks object through Journal to often, then maybe thats a sign for it to be upgraded as a aggregate root.
2) But everything cannot be aggregate roots, so try to capsulate object that are tight related. Notes seems like a candidate for being own by a root object. You'd probably always relate Notes to the root or it loses its context. Notes cannot live by itself.
3) Remember that Aggregates are used for splitting up large complex domains into smaller "islands" that take care of thier inhabbitants. Its important to not make your domain more complex than it is.
4) You don't know how your model look likes before you've reached far into the project implementation phase. If you realize that some repositories aren't used that much, they may be candidates for merging into other root object (if they have that kind of relationship). You can break out objects that are used so much through root object without its context. I mean for example if Journal are aggregate root and contains Notes and Tasks. After a while you model grows and maybe Tasks
have assoications to Action and ActionHistory and User and Rule and Permission. Now I just throw out a bunch om common objects in a rule/action/user permission functionality. Maybe this result in usecases that approach Tasks from another angle, "View all Tasks performed by this User" etc. Tasks get more involved in some kind of State/Workflow engine and therefor candidates for being an aggregate root itself.
Okey. Not the best example but it maybe gives you the idea. A root object can contain children where some of its children can also be root object because we need it in another context (than journal).
But I have myself banged my head against the wall everytime you startup with a fresh model. Just go with the flow and let the model evolve itself through its clients/subsribers. You refine the model through its usage. The Services (application services and not domain services) are of course extended with methods that respond to UI and usecases (often one-to-one).
I hope I helped you in someway...or not :D

Yes, you would most likely end up with 5 repositories (Journal, JournalEntry, Task, Problems, Notes). Your services would then use these repositories to perform CRUD for each type of entity.
Your reaction of "wow so many repositories" is not uncommon for developers new to DDD.
However, your repositories are usually light weight assuming your model and DB schema are fairly evenly matched which is often the case. If you use an ORM such as nHibernate or a tool such as codesmith generator then it gets even easier to create your repositories.

At first you need to define what is aggregate. I don't know about use case aggregates.
I know about aggregates following...
Aggregates are union of several entities. One of the entities is the aggregate root, the rest entities (or value types) have sense only in selected aggregate root context. For example you can define Order and OrderLine as an aggregate if you don't need to do any independent actions with OrderLine entities. It means that OrderLine makes sense in Order context only.
Why to define aggregates at all? It is required to reduce references between objects. That will simplify you domain model.
And of course you don't need to have OrderLineRepository if OrderLine is a part of Order aggregate.
Here is a link with more information. You can read Eric Evans DDD book. He explains aggregates very well.

data access in DDD?

After reading Evan's and Nilsson's books I am still not sure how to manage Data access in a domain driven project. Should the CRUD methods be part of the repositories, i.e. OrderRepository.GetOrdersByCustomer(customer) or should they be part of the entities: Customer.GetOrders(). The latter approach seems more OO, but it will distribute Data Access for a single entity type among multiple objects, i.e. Customer.GetOrders(), Invoice.GetOrders(), ShipmentBatch.GetOrders() ,etc. What about Inserting and updating?

CRUD-ish methods should be part of the Repository...ish. But I think you should ask why you have a bunch of CRUD methods. What do they really do? What are they really for? If you actually call out the data access patterns your application uses I think it makes the repository a lot more useful and keeps you from having to do shotgun surgery when certain types of changes happen to your domain.
CustomerRepo.GetThoseWhoHaventPaidTheirBill()
// or
GetCustomer(new HaventPaidBillSpecification())
// is better than
foreach (var customer in GetCustomer()) {
/* logic leaking all over the floor */
}
"Save" type methods should also be part of the repository.
If you have aggregate roots, this keeps you from having a Repository explosion, or having logic spread out all over: You don't have 4 x # of entities data access patterns, just the ones you actually use on the aggregate roots.
That's my $.02.

DDD usually prefers the repository pattern over the active record pattern you hint at with Customer.Save.
One downside in the Active Record model is that it pretty much presumes a single persistence model, barring some particularly intrusive code (in most languages).
The repository interface is defined in the domain layer, but doesn't know whether your data is stored in a database or not. With the repository pattern, I can create an InMemoryRepository so that I can test domain logic in isolation, and use dependency injection in the application to have the service layer instantiate a SqlRepository, for example.
To many people, having a special repository just for testing sounds goofy, but if you use the repository model, you may find that you don't really need a database for your particular application; sometimes a simple FileRepository will do the trick. Wedding to yourself to a database before you know you need it is potentially limiting. Even if a database is necessary, it's a lot faster to run tests against an InMemoryRepository.
If you don't have much in the way of domain logic, you probably don't need DDD. ActiveRecord is quite suitable for a lot of problems, especially if you have mostly data and just a little bit of logic.

Let's step back for a second. Evans recommends that repositories return aggregate roots and not just entities. So assuming that your Customer is an aggregate root that includes Orders, then when you fetched the customer from its repository, the orders came along with it. You would access the orders by navigating the relationship from Customer to Orders.
customer.Orders;
So to answer your question, CRUD operations are present on aggregate root repositories.
CustomerRepository.Add(customer);
CustomerRepository.Get(customerID);
CustomerRepository.Save(customer);
CustomerRepository.Delete(customer);

I've done it both ways you are talking about, My preferred approach now is the persistent ignorant (or PONO -- Plain Ole' .Net Object) method where your domain classes are only worried about being domain classes. They do not know anything about how they are persisted or even if they are persisted. Of course you have to be pragmatic about this at times and allow for things such as an Id (but even then I just use a layer super type which has the Id so I can have a single point where things like default value live)
The main reason for this is that I strive to follow the principle of Single Responsibility. By following this principle I've found my code much more testable and maintainable. It's also much easier to make changes when they are needed since I only have one thing to think about.
One thing to be watchful of is the method bloat that repositories can suffer from. GetOrderbyCustomer.. GetAllOrders.. GetOrders30DaysOld.. etc etc. One good solution to this problem is to look at the Query Object pattern. And then your repositories can just take in a query object to execute.
I'd also strongly recommend looking into something like NHibernate. It includes a lot of the concepts that make Repositories so useful (Identity Map, Cache, Query objects..)

Even in a DDD, I would keep Data Access classes and routines separate from Entities.
Reasons are,
Testability improves
Separation of concerns and Modular design
More maintainable in the long run, as you add entities, routines
I am no expert, just my opinion.

The annoying thing with Nilsson's Applying DDD&P is that he always starts with "I wouldn't do that in a real-world-application but..." and then his example follows. Back to the topic: I think OrderRepository.GetOrdersByCustomer(customer) is the way to go, but there is also a discussion on the ALT.Net Mailing list (http://tech.groups.yahoo.com/group/altdotnet/) about DDD.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string