I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, etc.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email, etc.).
But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are used only for update operations. For query operations, implement Queries that don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
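To make the "CQRS lite" option a bit more concrete, here is a minimal sketch in Python using the standard sqlite3 module. The schema (students, courses, classes, enrollments), the sample rows and the student_details function are all assumptions invented for this example, not part of the model in the question. The point is only that the read side joins across aggregates and returns plain data, without going through any repository.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one table per aggregate plus a join table.
    conn.executescript(
        """
        CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE classes (id INTEGER PRIMARY KEY, course_id INTEGER, room TEXT);
        CREATE TABLE enrollments (student_id INTEGER, class_id INTEGER);
        """
    )


def student_details(conn, student_id):
    # Read-side query: returns plain dictionaries (a read model), not aggregates.
    rows = conn.execute(
        """
        SELECT s.name, co.name, c.room
        FROM students s
        JOIN enrollments e ON e.student_id = s.id
        JOIN classes c ON c.id = e.class_id
        JOIN courses co ON co.id = c.course_id
        WHERE s.id = ?
        ORDER BY co.name
        """,
        (student_id,),
    ).fetchall()
    return [{"student": r[0], "course": r[1], "room": r[2]} for r in rows]


# Usage with made-up sample data.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO students VALUES (1, 'Ada')")
conn.executemany("INSERT INTO courses VALUES (?, ?)", [(10, "Algebra"), (11, "Biology")])
conn.executemany("INSERT INTO classes VALUES (?, ?, ?)", [(100, 10, "R1"), (101, 11, "R2")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 100), (1, 101)])
details = student_details(conn, 1)
```

One query serves the whole details page. The aggregates and their repositories stay untouched and are still used for every write.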
No, it is not a duplication question.
I have read many sources on the subject, but I still feel like I don't fully understand it.
This is the information I have so far (from multiple sources, be it articles, videos, etc...) about what is an Aggregate and Aggregate Root:
Aggregate is a collection of multiple Value Objects\Entity references and rules.
An Aggregate is always a command model (meant to change business state).
An Aggregate represents a single unit of work (a database one, because essentially the changes will be persisted), meaning it has to be consistent.
The Aggregate Root is the interface to the external world.
An Aggregate Root must have a globally unique identifier within the system
DDD suggests to have a Repository per Aggregate Root
A simple object from an aggregate can't be changed without its AR(Aggregate Root) knowing it
So with all that in mind, lets get to the part where I get confused:
in this site it says
The Aggregate Root is the interface to the external world. All interaction with an Aggregate is via the Aggregate Root. As such, an Aggregate Root MUST have a globally unique identifier within the system. Other Entities that are present in the Aggregate but are not Aggregate Roots require only a locally unique identifier, that is, an Id that is unique within the Aggregate.
But then, in this example I can see that an Aggregate Root is implemented by a static class called Transfer that acts as an Aggregate and a static function inside called TransferedRegistered that acts as an AR.
So the questions are:
How can the function be an AR if an AR must have a globally unique identifier, and a function has none? What does have a globally unique identifier is the Domain Event that this function produces.
Follow-up question: what does an Aggregate Root look like in code? Is it the event? Is it the entity that is returned? Is it the function of the Aggregate class itself?
If the Domain Event that the function returns is the AR (since it is the thing with a globally unique identifier), then how can we interact with this Aggregate? The first article clearly states that all interaction with an Aggregate goes through the AR; if the AR is an event, then we can do nothing but react to it.
Is it right to say that the aggregate has two main jobs:
Apply the needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be raised in a Domain Event from the AR
Please correct me on any of the bullet points in the beginning if some or all of them are wrong in some way or another, and feel free to add more if I have missed any!
Thanks for clarifying things out!
I feel like I don't fully understand it.
That's not your fault. The literature sucks.
As best I can tell, the core ideas of implementing solutions using domain driven design came out of the world of Java circa 2003. So the patterns described by Evans in chapters 5 and 6 of the blue book were understood to be object oriented (in the Java sense) domain modeling done right.
Chapter 6, which discusses the aggregate pattern, is specifically about life cycle management; how do you create new entities in the domain model, how does the application find the right entity to interact with, and so on.
And so we have Factories, that allow you to create instances of domain entities, and Repositories, that provide an abstraction for retrieving a reference to a domain entity.
But there's a third riddle, which is this: what happens when you have some rule in your domain that requires synchronization between two entities in the domain? If you allow applications to talk to the entities in an uncoordinated fashion, then you may end up with inconsistencies in the data.
So the aggregate pattern is an answer to that; we organize the coordinated entities into graphs. With respect to change (and storage), the graph of entities becomes a single unit that the application is allowed to interact with.
The notion of the aggregate root is that the interface between the application and the graph should be one of the members of the graph. So the application shares information with the root entity, and then the root entity shares that information with the other members of the aggregate.
The aggregate root, being the entry point into the aggregate, plays the role of a coarse grained lock, ensuring that all of the changes to the aggregate members happen together.
It's not entirely wrong to think of this as a form of encapsulation -- to the application, the aggregate looks like a single entity (the root), with the rest of the complexity of the aggregate being hidden from view.
Now, over the past 15 years, there's been some semantic drift; people trying to adapt the pattern in ways that it better fits their problems, or better fits their preferred designs. So you have to exercise some care in designing how to translate the labels that they are using.
In simple terms an aggregate root (AR) is an entity that has a life-cycle of its own. To me this is the most important point. One AR cannot contain another AR but can reference it by Id or by some value object (VO) containing at least the Id of the referenced AR. I tend to prefer to have an AR contain only other VOs instead of entities (YMMV). To this end the AR is responsible for consistency and invariants w.r.t. the AR. Each VO can have its own invariants, such as an EMailAddress requiring a valid e-mail format. Even if one were to call contained classes entities, I would call that semantics, since one could get the same thing done with a VO. A repository is responsible for AR persistence.
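A rough sketch of those two points (reference another AR by Id only, and let each VO guard its own invariant) could look like the following Python. The names Order, CustomerId and EmailAddress are purely illustrative assumptions, and the e-mail check is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    """Value object that carries its own invariant."""

    value: str

    def __post_init__(self):
        # Deliberately naive format check, for illustration only.
        local, sep, domain = self.value.partition("@")
        if not (local and sep and "." in domain):
            raise ValueError(f"invalid e-mail address: {self.value!r}")


@dataclass(frozen=True)
class CustomerId:
    """Value object wrapping the Id of another aggregate root."""

    value: str


@dataclass
class Order:
    """Aggregate root that refers to the Customer aggregate by Id only."""

    order_id: str
    customer_id: CustomerId
    contact: EmailAddress


# Usage: a valid order, and an invalid address rejected by the VO itself.
order = Order("o-1", CustomerId("c-7"), EmailAddress("ada@example.com"))
try:
    EmailAddress("not-an-address")
    invalid_accepted = True
except ValueError:
    invalid_accepted = False
```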
The example implementation you linked to is not something I would do or recommend. I followed some of the comments and I too, as one commenter alluded to, would rather use a domain service to perform something like a Transfer between two accounts. The registration of the transfer is not something that may necessarily be permitted and, as such, the domain service would be required to ensure the validity of the transfer. In fact, the registration of a transfer request would probably be a Journal in an accounting sense as that is my experience. Once the journal is approved it may attempt the actual transfer.
At some point in my DDD journey I thought that there had to be something wrong, since it shouldn't be so difficult to understand aggregates. There are many opinions and interpretations w.r.t. DDD and aggregates, which is why it can get confusing. The other aspect is, IMHO, that there is a fair amount of design involved that requires some creativity and is based on an understanding of the domain itself. Creativity cannot be taught, and design falls into the realm of tacit knowledge. The popular example of tacit knowledge is learning to ride a bike. Now, we can read all we want about how to ride a bike and it may or may not help much. Once we are on the bike and we teach ourselves to balance, then we can make progress. Then there are people who end up doing absolutely crazy things on a bike, and even if I read how to, I don't think that I'll try :)
Keep practicing and modelling until it starts to make sense or until you feel comfortable with the model. If I recall correctly Eric Evans mentions in the Blue Book that it may take a couple of designs to get the model closer to what we need.
Keep in mind that Mike Mogosanu is using an event sourcing approach, but in any case (even without ES) his approach is very good for avoiding unwanted artifacts in mainstream OOP languages.
How can it be that the function is an AR, if there must be a globally unique identifier to it, and there isn't, reason being that it's a function? What does have a globally unique identifier is the Domain Event that this function produces.
TransferNumber acts as a natural unique ID; there is also a GUID to avoid the need for a full Value Object in some cases.
There is no unique ID held as state in memory, because it is an argument. But think about it: why do you want a globally unique ID? It is just to locate the root element and its children (which have non-unique IDs) for persistence purposes (find, modify or delete it).
Order A has 2 order lines (1 and 2) while Order B has 4 order lines (1,2,3,4); the unique identifier of order lines is a composition of its ID and the Order ID: A1, B3, etc. It is just like relational schemas in relational databases.
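The local versus global identity idea can be sketched in a few lines of Python. The Order and OrderLine classes and the way the two IDs are concatenated are assumptions made for the example; only the root's ID is globally unique, and a line is addressed through its order.

```python
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    line_no: int  # unique only inside its Order
    sku: str


@dataclass
class Order:
    order_id: str  # globally unique
    lines: list = field(default_factory=list)

    def add_line(self, sku):
        # Line numbers restart at 1 for every order.
        line = OrderLine(line_no=len(self.lines) + 1, sku=sku)
        self.lines.append(line)
        return line

    def global_line_ids(self):
        # A line becomes globally addressable only via its root's ID.
        return [f"{self.order_id}{line.line_no}" for line in self.lines]


# Usage mirroring the text: order A has 2 lines, order B has 4.
order_a = Order("A")
for sku in ("pen", "ink"):
    order_a.add_line(sku)

order_b = Order("B")
for sku in ("cup", "lid", "tea", "jar"):
    order_b.add_line(sku)
```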
So you need that ID just for persistence, and the element that goes to persistence is a domain event expressing the changes: all the changes needed to keep consistency. If you persist the domain event, using the globally unique ID to find what you have to modify in persistence, the system will be in a consistent state.
You could do
var newTransfer = new Transfer(transferNumber); // newTransfer is now an aggregate with a globally unique ID
var changes = newTransfer.RegisterTransfer(debit, credit);
persistence.applyChanges(changes);
but what is the point of instantiating an object to create state in memory if you are not going to do more than one thing with this object? It is pointless, and most OOP detractors use this kind of bad OOP design to criticize OOP and lean towards functional programming.
Following question - How does an Aggregate Root look like in code? Is it the event? Is it the entity that is returned? Is it the function of the Aggregate class itself?
It is the function itself. You can read in the post:
AR is a role, and the function is the implementation.
An Aggregate represents a single unit of work, meaning it has to be consistent. You can see how the function honors this. It is a single unit of work that keeps the system in a consistent state.
In the case that the Domain Event that the function returns is the AR (as stated that it has to have that globally unique identifier), then how can we interact with this Aggregate? The first article clearly stated that all interaction with an Aggregate is by the AR; if the AR is an event, then we can do nothing but react on it.
Answered above because the domain event is not the AR.
Is it right to say that the aggregate has two main jobs: apply the needed changes based on the input it received and the rules it knows, and return the data that needs to be persisted from the AR and/or raised in a Domain Event from the AR?
Yes; again, you can see how the static function honors this.
You could try to contact Mike Mogosanu. I am sure he could explain his approach better than I can.
In DDD, a repository loads an entire aggregate - we either load all of it or none of it. This also means that we should avoid lazy loading.
My concern is performance-wise. What if this results in loading into memory thousands of objects? For example, an aggregate for Customer comes back with ten thousand Orders.
In this sort of case, could it mean that I need to redesign and re-think my aggregates? Does DDD offer suggestions regarding this issue?
Take a look at this Effective Aggregate Design series of three articles from Vernon. I found them quite useful to understand when and how you can design smaller aggregates rather than a large-cluster aggregate.
EDIT
I would like to give a couple of examples to improve my previous answer, feel free to share your thoughts about them.
First, a quick definition of an Aggregate (taken from the book Patterns, Principles, and Practices of Domain-Driven Design by Scott Millett):
Entities and Value Objects collaborate to form complex relationships that meet invariants within the domain model. When dealing with large interconnected associations of objects, it is often difficult to ensure consistency and concurrency when performing actions against domain objects. Domain-Driven Design has the Aggregate pattern to ensure consistency and to define transactional concurrency boundaries for object graphs. Large models are split by invariants and grouped into aggregates of entities and value objects that are treated as conceptual whole.
Let's go with an example to see the definition in practice.
Simple Example
The first example shows how defining an Aggregate Root helps to ensure consistency when performing actions against domain objects.
Given the following business rule:
Winning auction bids must always be placed before the auction ends. If a winning bid is placed after an auction ends, the domain is in an invalid state because an invariant has been broken and the model has failed to correctly apply domain rules.
Here there is an aggregate consisting of Auction and Bids where the Auction is the Aggregate Root.
If we made Bid a separate Aggregate Root as well, you would have a BidsRepository, and you could easily do:
var newBid = new Bid(money);
bidsRepository.save(auctionId, newBid);
You would then be saving a Bid without checking the defined business rule. However, with the Auction as the only Aggregate Root, the design enforces the rule, because you need to do something like:
var newBid = new Bid(money);
auction.placeBid(newBid);
auctionRepository.save(auction);
Therefore, you can check your invariant within the method placeBid and nobody can skip it if they want to place a new Bid.
Here it is pretty clear that the state of a Bid depends on the state of an Auction.
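For what it's worth, here is a minimal runnable sketch of that invariant check in Python. The field names (ends_at, placed_at), the integer amounts and the AuctionEnded exception are assumptions for the example; a real model would carry more rules, such as minimum increments.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


class AuctionEnded(Exception):
    """Raised when a bid arrives after the auction has closed."""


@dataclass(frozen=True)
class Bid:
    amount: int
    placed_at: datetime


@dataclass
class Auction:
    """Aggregate root: the only way to add a Bid is through place_bid."""

    auction_id: str
    ends_at: datetime
    bids: list = field(default_factory=list)

    def place_bid(self, bid):
        # Invariant: bids must be placed before the auction ends.
        if bid.placed_at >= self.ends_at:
            raise AuctionEnded(f"auction {self.auction_id} has already ended")
        self.bids.append(bid)


# Usage: one bid in time, one bid too late.
ends = datetime(2024, 1, 1, 12, 0)
auction = Auction("a-1", ends)
auction.place_bid(Bid(100, ends - timedelta(minutes=5)))
try:
    auction.place_bid(Bid(150, ends + timedelta(minutes=1)))
    late_bid_accepted = True
except AuctionEnded:
    late_bid_accepted = False
```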
Complex Example
Back to your example of Orders associated with a Customer: there seem to be no invariants that force us to define a huge aggregate consisting of a Customer and all her Orders, so we can just keep the relation between both entities through an identifier reference. By doing this, we avoid loading all the Orders when fetching a Customer, and we also mitigate concurrency problems.
But say that the business now defines the following invariant:
We want to provide Customers with a pocket so they can charge it with money to buy products. Therefore, if a Customer now wants to buy a product, it needs to have enough money to do it.
That said, the pocket is a VO inside the Customer Aggregate Root. Having two separate Aggregate Roots, one for Customer and another for Order, no longer seems the best way to satisfy the new invariant, because we could save a new order without checking the rule. It looks like we are forced to make Customer the root of both, which is going to hurt performance, scalability, concurrency, etc.
Solution? Eventual consistency. What if we allow the customer to buy the product? That is, we keep an Aggregate Root for Orders, so we create the order and save it:
var newOrder = new Order(customerId, ...);
orderRepository.save(newOrder);
We publish an event when the order is created, and then we check asynchronously whether the customer has enough funds:
class OrderWasCreatedListener:
    var customer = customerRepository.findOfId(event.customerId);
    var order = orderRepository.findOfId(event.orderId);
    customer.placeOrder(order); // Check business rules
    customerRepository.save(customer);
If everything went well, we have satisfied our invariants while keeping the design we wanted from the beginning, modifying just one Aggregate Root per request. Otherwise, we send an email to the customer telling her about the insufficient funds. We can take advantage of it by adding to the email alternative options she can purchase with her current budget, as well as encouraging her to charge the pocket.
Take into account that the UI can help us avoid having customers pay without enough money, but we cannot blindly trust the UI.
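A rough, self-contained Python sketch of that listener follows. Everything in it is an assumption for illustration: the in-memory repository, the event as a plain dict, the notify callback standing in for the email, and the pocket held as an integer balance. The handler runs synchronously here; in a real system it would be triggered by whatever messaging you use.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    customer_id: str
    total: int


@dataclass
class Customer:
    customer_id: str
    pocket: int  # hypothetical balance, in cents

    def place_order(self, order):
        # Business rule: the pocket must cover the order total.
        if order.total > self.pocket:
            return False
        self.pocket -= order.total
        return True


class InMemoryRepository:
    """Stand-in for a real repository, keyed by aggregate ID."""

    def __init__(self):
        self._items = {}

    def save(self, key, item):
        self._items[key] = item

    def find(self, key):
        return self._items[key]


def on_order_created(event, customers, orders, notify):
    # Only the Customer aggregate is modified in this handler.
    customer = customers.find(event["customer_id"])
    order = orders.find(event["order_id"])
    if customer.place_order(order):
        customers.save(customer.customer_id, customer)
    else:
        notify(customer.customer_id, "insufficient funds")


# Usage: the first order fits in the pocket, the second does not.
customers, orders, sent = InMemoryRepository(), InMemoryRepository(), []
customers.save("c-1", Customer("c-1", pocket=100))
for order in (Order("o-1", "c-1", 60), Order("o-2", "c-1", 60)):
    orders.save(order.order_id, order)
    event = {"customer_id": order.customer_id, "order_id": order.order_id}
    on_order_created(event, customers, orders, lambda cid, msg: sent.append((cid, msg)))
```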
Hope you find both examples useful, and let me know if you find better solutions for the exposed scenarios :-)
In this sort of case, could it mean that I need to redesign and re-think my aggregates?
Almost certainly.
The driver for aggregate design isn't structure, but behavior. We don't care that "a user has thousands of orders". What we care about are what pieces of state need to be checked when you try to process a change - what data do you need to load to know if a change is valid.
Typically, you'll come to realize that changing an order doesn't (or shouldn't) depend on the state of other orders in the system, which is a good indication that two different orders should not be part of the same aggregate.
This is a practical Domain Driven Design question:
Conceptually, I think I get Aggregate roots until I go to define one.
I have an Employee entity, which has surfaced as an Aggregate root. In the Business, some employees can have work-related Violations logged against them:
Employee-----*Violations
Since not all Employees are subject to this, I would think that Violations would not be a part of the Employee Aggregate, correct?
So when I want to work with Employees and their related violations, is this two separate Repository interactions by some Service?
Lastly, when I add a Violation, is that method on the Employee Entity?
Thanks for the help!
After doing even MORE research, I think I have the answer to my question.
Paul Stovell had this slightly edited response to a similar question on the DDD messageboard. Substitute "Customer" for "Employee", and "Order" for "Violation" and you get the idea.
Just because Customer references Order doesn't necessarily mean Order falls within the Customer aggregate root. The customer's addresses might, but the orders can be independent (for example, you might have a service that processes all new orders no matter who the customer is. Having to go Customer->Orders makes no sense in this scenario).
From a domain point of view, you can even question the validity of those references (Customer has reference to a list of Orders). How often will you actually need all orders for a customer? In some systems it makes sense, but in others, one customer might make many orders. Chances are you want orders for a customer between a date range, or orders for a customer that aren't processed yet, or orders which have not been paid, and so on. The scenario in which you'll need all of them might be relatively uncommon.
However, it's much more likely that when dealing with an Order, you will want the customer information. So in code, Order.Customer.Name is useful, but Customer.Orders[0].LineItem.SKU - probably not so useful. Of course, that totally depends on your business domain.
In other words, updating Customer has nothing to do with updating Orders. And orders, or violations in my case, could conceivably be dealt with independently of Customers/Employees.
If Violations had detail lines, then Violation and Violation line would then be a part of the same aggregate because changing a violation line would likely affect a Violation.
EDIT
The wrinkle here in my Domain is that Violations have no behavior. They are basically records of an event that happened. Not sure yet about the implications that has.
Eric Evans states in his book, Domain-Driven Design: Tackling Complexity in the Heart of Software,
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes.
There are 2 important points here:
These objects should be treated as a "unit".
For the purpose of "data change".
I believe in your scenario, Employee and Violation are not necessarily a unit together, whereas in the example of Order and OrderItem, they are part of a single unit.
Another thing that is important when modeling the aggregate boundaries is whether you have any invariants in your aggregate. Invariants are business rules that should be valid within the "whole" aggregate. For example, in the Order and OrderItem example, you might have an invariant that states the total cost of the order should be less than a predefined amount. In this case, anytime you want to add an OrderItem to the Order, this invariant should be enforced to make sure that your Order is valid. However, in your problem, I don't see any invariants between your entities: Employee and Violation.
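As a small illustration of that Order and OrderItem invariant, consider the Python sketch below. The limit of 1000, the class names and the OrderLimitExceeded exception are assumptions chosen for the example. The check lives in the root, so an item cannot be added without it.

```python
from dataclasses import dataclass, field

MAX_ORDER_TOTAL = 1000  # hypothetical predefined amount


class OrderLimitExceeded(Exception):
    """Raised when adding an item would break the total-cost invariant."""


@dataclass(frozen=True)
class OrderItem:
    sku: str
    unit_price: int
    quantity: int

    @property
    def cost(self):
        return self.unit_price * self.quantity


@dataclass
class Order:
    """Aggregate root guarding the invariant for the whole aggregate."""

    order_id: str
    items: list = field(default_factory=list)

    @property
    def total(self):
        return sum(item.cost for item in self.items)

    def add_item(self, item):
        # Invariant: the order total must stay below the predefined amount.
        if self.total + item.cost >= MAX_ORDER_TOTAL:
            raise OrderLimitExceeded(f"order {self.order_id} would reach the limit")
        self.items.append(item)


# Usage: the first item fits, the second would push the total to the limit.
order = Order("o-1")
order.add_item(OrderItem("A", unit_price=400, quantity=2))
try:
    order.add_item(OrderItem("B", unit_price=100, quantity=2))
    second_item_accepted = True
except OrderLimitExceeded:
    second_item_accepted = False
```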
So short answer:
I believe Employee and Violation each belong to 2 separate aggregates. Each of these entities are also their own aggregate roots. So you need 2 repositories: EmployeeRepository and ViolationRepository.
I also believe you should have a unidirectional association from Violation to Employee. This way, each Violation object knows who it belongs to. But if you want to get the list of all Violations for a particular Employee, then you can ask the ViolationRepository:
var list = repository.FindAllViolationsByEmployee(someEmployee);
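A minimal in-memory version of such a repository might look like this in Python. The class InMemoryViolationRepository, its method names and the Violation fields are assumptions for the sketch; a real implementation would query the data store instead of a dictionary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Violation:
    violation_id: int
    employee_id: int  # one-way reference to the Employee aggregate
    occurred_on: date
    description: str


class InMemoryViolationRepository:
    """Stand-in for a ViolationRepository backed by a real store."""

    def __init__(self):
        self._by_id = {}

    def add(self, violation):
        self._by_id[violation.violation_id] = violation

    def find_all_by_employee(self, employee_id):
        # Employee does not hold a list of violations; we query for them.
        matches = (v for v in self._by_id.values() if v.employee_id == employee_id)
        return sorted(matches, key=lambda v: v.occurred_on)


# Usage with made-up data for two employees.
repository = InMemoryViolationRepository()
repository.add(Violation(1, 42, date(2023, 3, 1), "late report"))
repository.add(Violation(2, 7, date(2023, 4, 9), "missed training"))
repository.add(Violation(3, 42, date(2022, 11, 5), "safety gear"))
found = repository.find_all_by_employee(42)
```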
You say that you have employee entity and violations and each violation does not have any behavior itself. From what I can read above, it seems to me that you may have two aggregate roots:
Employee
EmployeeViolations (call it EmployeeViolationCard or EmployeeViolationRecords)
EmployeeViolations is identified by the same employee ID and it holds a collection of violation objects. You get behavior for employee and violations separated this way and you don't get Violation entity without behavior.
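To show what I mean, here is a small Python sketch of such an EmployeeViolations aggregate. The method names log and count_since, and the choice to model Violation as an immutable value object, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Violation:
    """Modeled here as a value object: a record of something that happened."""

    occurred_on: date
    description: str


@dataclass
class EmployeeViolations:
    """Aggregate root identified by the same ID as the employee."""

    employee_id: int
    _records: list = field(default_factory=list)

    def log(self, occurred_on, description):
        # The behavior lives on the root, not on the individual records.
        self._records.append(Violation(occurred_on, description))

    def count_since(self, cutoff):
        return sum(1 for v in self._records if v.occurred_on >= cutoff)


# Usage: two violations logged, one of them after the cutoff date.
card = EmployeeViolations(employee_id=42)
card.log(date(2019, 5, 1), "late report")
card.log(date(2023, 2, 10), "safety gear")
recent = card.count_since(date(2021, 1, 1))
```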
Whether violation is entity or value object you should decide based on its properties.
I generally agree with Mosh on this one. However, keep in mind the notion of transactions in the business point of view. So I actually take "for the purpose of data changes" to mean "for the purpose of transaction(s)".
Repositories are views of the domain model. In a domain environment, these "views" really support or represent a business function or capability - a transaction. Case in point, the Employee may have one or more violations, and if so, are aspects of a transaction(s) in a point in time. Consider your use cases.
Scenario: "An employee commits an act that is a violation of the workplace." This is a type of business event (i.e. transaction, or part of a larger, perhaps distributed transaction) that occurred. The root affected domain object actually can be seen from more than one perspective, which is why it is confusing. But the thing to remember is behavior as it pertains to a business transaction, since you want your business processes to model the real world as accurately as possible. In terms of relationships, just like in a relational database, your conceptual domain model should actually indicate this already (i.e. the associativity), which often can be read in either direction:
Employee <----commits a -------committed by ----> Violation
So for this use case, it would be fair to say that it is a transaction dealing with violations, and that the root - or "primary" entity - is a Violation. That, then, would be the aggregate root you would reference for that particular business activity or business process. But that is not to say that, for a different activity or process, you cannot have an Employee aggregate root, such as the "new employee process". If you take care, there should be no negative impact from cyclic references, or from being able to traverse your domain model multiple ways. I will warn, however, that the governing of this should be thought about and handled by the controller piece of your business domain, or whatever equivalent you have.
Aside: Thinking in terms of patterns (i.e. MVC), the repository is a view, the domain objects are the model, and thus one should also employ some form of controller pattern. Typically, the controller declares the concrete implementation of and access to the repositories (collections of aggregate roots).
In the data access world...
Using LINQ-To-SQL as an example, the DataContext would be the controller exposing a view of Customer and Order entities. The view is a non-declarative, framework-oriented Table type (rough equivalent to Repository). Note that the view keeps a reference to its parent controller, and often goes through the controller to control how/when the view gets materialized. Thus, the controller is your provider, taking care of mapping, translation, object hydration, etc. The model is then your data POCOs. Pretty much a typical MVC pattern.
Using N/Hibernate as an example, the ISession would be the controller exposing a view of Customer and Order entities by way of the session.Enumerable(string query) or session.Get(object id) or session.CreateCriteria(typeof(Customer)).List()
In the business logic world...
Customer { /*...*/ }
Employee { /*...*/ }

Repository<T> : IRepository<T>
    , IEnumerable<T>
    //, IQueryable<T>, IQueryProvider // optional
{ /**/ }

BusinessController {
    Repository<Customer> Customers { get { /*...*/ } } // aggregate root
    Repository<Order> Orders { get { /*...*/ } } // aggregate root
}
In a nutshell, let your business processes and transactions be the guide, and let your business infrastructure naturally evolve as processes/activities are implemented or refactored. Moreover, prefer composability over traditional black box design. When you get to service-oriented or cloud computing, you will be glad you did. :)
I was wondering what the conclusion would be?
'Violations' become a root entity. And 'violations' would be referenced by 'employee' root entity. ie violations repository <-> employee repository
But you are confused about making violations a root entity because it has no behavior.
But is 'behaviour' a criterion to qualify as a root entity? I don't think so.
A slightly orthogonal question to test understanding: going back to the Order/OrderItem example, there might be an analytics module in the system that wants to look at OrderItems directly, i.e. get all OrderItems for a particular product, or all OrderItems greater than some given value, etc. With a lot of use cases like that, and taking the "aggregate root" idea to the extreme, could we argue that OrderItem is a separate aggregate root in itself?
It depends. Does any change/add/delete of a violation change any part of the employee - e.g. are you storing a violation count, or a violation count within the past 3 years, against the employee?
I'm still wrapping my head around DDD, and one of the stumbling blocks I've encountered is in how to handle associations between separate aggregates. Say I've got one aggregate encapsulating Customers and another encapsulating Shipments.
For business reasons Shipments are their own aggregates, and yet they need to be explicitly tied to Customers. Should my Customer domain entity have a list of Shipments? If so, how do I populate this list at the repository level - given I'll have a CustomerRepository and a ShipmentRepository (one repo per aggregate)?
I'm saying 'association' rather than 'relationship' because I want to stress that this is a domain decision, not an infrastructure one - I'm designing the system from the model first.
Edit: I know I don't need to model tables directly to objects - that's the reason I'm designing the model first. At this point I don't care about the database at all - just the associations between these two aggregates.
There's no reason your ShipmentRepository can't aggregate customer data into your shipment models. Repositories do not have to have a 1-to-1 mapping with tables.
I have several repositories which combine multiple tables into a single domain model.
I think there are two levels of answering this question. At one level, the question is how do I populate the relationship between customer and shipment. I really like the "fill" semantics where your shipment repository can have a fillOrders( List customers, ....).
The other level is "how do I handle the denormalized domain models that are a part of DDD". And "Customer" is probably the best example of them all, because it simply shows up in such a lot of different contexts; almost all your processes have customer in them and the context of the customer is usually extremely varied. At max half the time you are interested in the "orders". If my understanding of the domain was perfect when starting, I'd never make a customer domain concept. But it's not, so I always end up making the Customer object. I still remember the project where I after 3 years felt that I was able to make the proper "Customer" domain model. I would be looking for the alternate and more detailed concepts that also represent the customer; PotentialCustomer, OrderingCustomer, CustomerWithOrders and probably a few others; sorry the names aren't better. I'll need some more time for that ;)
Shipment has a many-to-one relationship with Customer.
If you are looking for the shipments of a client, add a query to your shipment repository that takes a client parameter.
In general, I don't create one-to-many associations between entities when the many side is not limited.