Using Lazy Loading when related entities reside within the same aggregate

From DDD: Tackling Complexity in the Heart of Software ( pg. 177 ):
The need to update Delivery History when adding a Handling Event gets
the Cargo AGGREGATE involved in the transaction.
a) Further down the page the author does propose an alternative solution, but still: isn't the author, in the above excerpt, essentially proposing that we implement the association by having the DeliveryHistory.Events property query the database (via a repository) each time it is accessed?
b) Since the implementation "proposed" by the author is almost identical to how lazy loading is implemented (except that lazy loading queries for the data only the first time it is needed and then caches it), I'll also ask the following:
Many are against lazy loading in general. Regardless, I assume that we should never use lazy loading when the related entities reside within the same aggregate, since such an association is expressed with an object reference, which is what we use when we require transactional integrity?
The reason being that this integrity may be compromised if the related data is never accessed (and as such is never retrieved), since invariants can't be enforced properly when the aggregate is modified?
UPDATE:
a)
The DeliveryHistory.Events collection can be loaded when the
DeliveryHistory entity is loaded by the repository. It can also be
loaded via lazy loading in which case an ORM injects a collection
proxy which when iterated calls the database.
But isn't the author proposing a third option, which is to query for the events each time DeliveryHistory.Events is accessed (or perhaps each time DeliveryHistory.GetEvents() is called)?
b)
It is similar to lazy loading however the important difference is that
resorting to a repository query allows the omission of the Events
property in the object model. This reduces the "footprint" of the
DeliveryHistory entity.
I - I'm assuming that by "being similar to lazy loading" you're referring to a design where the events are retrieved from the DB each time they are requested?
II - Anyway, if we omit the DeliveryHistory.Events property (and presumably don't define a DeliveryHistory.GetEvents() as an alternative), how do we then implement the design proposed by the author? (As noted in my original post, I'm aware that further down the page the author did propose a better alternative.)
Thank you

a) The DeliveryHistory.Events collection can be loaded when the DeliveryHistory entity is loaded by the repository. It can also be loaded via lazy loading in which case an ORM injects a collection proxy which when iterated calls the database.
b) It is similar to lazy loading however the important difference is that resorting to a repository query allows the omission of the Events property in the object model. This reduces the "footprint" of the DeliveryHistory entity.
The problem with lazy loading is not that data may never be accessed, it is that accessing a lazy loaded property for the first time will result in a database call and you have to make sure that the connection is still alive. In a sense, this can compromise the integrity of the aggregate which should be considered a whole.
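To make the connection concern concrete, here is a minimal sketch in Python of a hand-rolled collection proxy. The `LazyEvents` class, the `events` table and its columns are invented for illustration, and an in-memory SQLite database stands in for the real store; an actual ORM proxy does considerably more than this.

```python
import sqlite3


class LazyEvents:
    """Hypothetical stand-in for the collection proxy an ORM would inject."""

    def __init__(self, conn, history_id):
        self._conn = conn
        self._history_id = history_id
        self._items = None  # nothing has been loaded yet

    def __iter__(self):
        if self._items is None:
            # First iteration: this is the point where the database is called.
            rows = self._conn.execute(
                "SELECT kind FROM events WHERE history_id = ? ORDER BY id",
                (self._history_id,),
            ).fetchall()
            self._items = [row[0] for row in rows]
        return iter(self._items)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, history_id INTEGER, kind TEXT)"
)
conn.executemany(
    "INSERT INTO events (history_id, kind) VALUES (?, ?)",
    [(1, "LOAD"), (1, "UNLOAD")],
)

touched_in_time = LazyEvents(conn, 1)
never_touched = LazyEvents(conn, 1)

first_read = list(touched_in_time)   # queries the database
conn.close()
second_read = list(touched_in_time)  # served from the cached list

try:
    list(never_touched)              # first access happens after close
    failed_after_close = False
except sqlite3.ProgrammingError:
    failed_after_close = True
```

The proxy that was iterated while the connection was open keeps working from its cache, while the one first touched after `conn.close()` fails, so two parts of the same aggregate end up behaving differently depending on when they were accessed.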
UPDATE
a) Either way the net result is the same. I'm not sure if creating a proxy collection was a technique utilized when the book was written (2003).
b1) Yes, they are similar in that the events aren't loaded together with the DeliveryHistory entity, but only on demand.
b2) Instead of an events property on the DeliveryHistory entity, the events would be accessed by calling a repository. The repository itself would be called by the surrounding application service. It would retrieve the events and pass them to places that needed them. Or if the use case is adding events, the application service would call the repository to persist the event.
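A rough sketch of what b2 could look like, written in Python for brevity. All class and method names here (`InMemoryHandlingEventRepository`, `CargoTrackingService`, `latest_event`, and so on) are assumptions made for illustration and do not come from the book; the in-memory repository stands in for one that would query a database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class HandlingEvent:
    cargo_id: str
    kind: str
    completed_at: datetime


class InMemoryHandlingEventRepository:
    """Hypothetical repository; a real one would query the database."""

    def __init__(self) -> None:
        self._events: List[HandlingEvent] = []

    def add(self, event: HandlingEvent) -> None:
        self._events.append(event)

    def find_by_cargo_id(self, cargo_id: str) -> List[HandlingEvent]:
        return [e for e in self._events if e.cargo_id == cargo_id]


class DeliveryHistory:
    """Has no Events property; it only knows which cargo it belongs to."""

    def __init__(self, cargo_id: str) -> None:
        self.cargo_id = cargo_id

    def latest_event(self, events: List[HandlingEvent]) -> Optional[HandlingEvent]:
        # The events are handed in by the caller instead of being loaded here.
        return max(events, key=lambda e: e.completed_at, default=None)


class CargoTrackingService:
    """Application service: the only place that talks to the repository."""

    def __init__(self, events: InMemoryHandlingEventRepository) -> None:
        self._events = events

    def register_event(self, cargo_id: str, kind: str, completed_at: datetime) -> None:
        # Adding an event is just a repository call; no aggregate is loaded.
        self._events.add(HandlingEvent(cargo_id, kind, completed_at))

    def latest_event(self, history: DeliveryHistory) -> Optional[HandlingEvent]:
        events = self._events.find_by_cargo_id(history.cargo_id)
        return history.latest_event(events)
```

The entity never sees the repository: the service fetches the events and passes them to whatever needs them.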

Related

DDD: Loading whole aggregate would result in performance problems

Most DDD books (e.g. Patterns and Principles of DDD) strongly recommend loading the whole aggregate when getting the data from the database. The reason is that aggregates are consistency boundaries.
But there are common cases where this would result in overwhelming performance problems.
Here is a real example I am facing:
I have an aggregate root which is a workobject entity with its properties. There are other entities in this aggregate:
List of attached documents of the workobject. Each document is an entity. (The class Document contains the metadata of the real document.)
List of comments. Each comment is an entity.
List of activities. Each activity is an entity which represents an activity that was done on this workobject.
List of ArchivedFiles. Each ArchivedFile is an entity which represents a document that is already archived in an external system. (The class ArchivedFile contains the metadata of the real archived file.)
These entities belong to the aggregate, because changes on the workobject would mostly affect the state of these entities, too.
Now I have the following problem:
In the UI, there is a place where a user gets all the workobjects that are in his/her inbox. This could be 100 workobjects or more. But it does not make sense to load the whole aggregate (comments, activities, documents) for each workobject at that point. This would slow down the application, resulting in a terrible user experience.
The idea is to show just the properties of the workobject to the user in a datagrid. When the user triggers a specific event, like clicking on a workobject, a form is loaded that shows the detailed information of that workobject. That would be an appropriate point to load the whole aggregate (i.e., comments, activities, documents). But most of the DDD books (e.g. Patterns and Principles of DDD) warn against using lazy loading inside an aggregate and recommend loading the whole aggregate when loading the aggregate root.
How should we solve this problem by still respecting the DDD-rules?

How should we solve this problem by still respecting the DDD-rules?
Usual answer: don't use the aggregate pattern when the thing you want is a report.
Lazy loading is a "code smell" when performing domain dynamics; if you are leaving a lot of data behind when making a change, that strongly suggests that the information left behind belongs in a different aggregate.
But for an operation that is effectively read only, like a report? We're not going to be changing anything, so we don't need the constraints that ensure that our changes are correct, so we don't need the information we use to describe those constraints.
For more ideas about separating reads from writes, review the patterns described under the umbrella cqrs ("command query responsibility segregation").
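As a minimal sketch of that separation, applied to the inbox case from the question: the table layout, the `InboxRow` and `WorkObject` classes, and both functions below are assumptions made for illustration (Python with an in-memory SQLite database), and only comments are modelled to keep it short.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InboxRow:
    """Read model: just what the datagrid shows."""
    workobject_id: int
    title: str


@dataclass
class WorkObject:
    """Write model: the aggregate, always loaded whole."""
    id: int
    title: str
    comments: List[str] = field(default_factory=list)


def inbox_rows(conn, owner):
    # Query side: one cheap query, no aggregates are built.
    rows = conn.execute(
        "SELECT id, title FROM workobjects WHERE owner = ? ORDER BY id", (owner,)
    )
    return [InboxRow(*row) for row in rows]


def load_workobject(conn, workobject_id):
    # Command side: load the root together with its child entities.
    wid, title = conn.execute(
        "SELECT id, title FROM workobjects WHERE id = ?", (workobject_id,)
    ).fetchone()
    comments = [
        row[0]
        for row in conn.execute(
            "SELECT body FROM comments WHERE workobject_id = ? ORDER BY id", (wid,)
        )
    ]
    return WorkObject(wid, title, comments)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workobjects (id INTEGER PRIMARY KEY, owner TEXT, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, workobject_id INTEGER, body TEXT);
    INSERT INTO workobjects VALUES (1, 'ann', 'Invoice 17'), (2, 'ann', 'Claim 4'), (3, 'bob', 'Memo');
    INSERT INTO comments VALUES (1, 1, 'please review'), (2, 1, 'approved');
    """
)
```

The inbox screen would call `inbox_rows`, and only the detail form, where changes can actually be made, would call `load_workobject`.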

How to implement references between aggregates in DDD?

In DDD an entity can reference entities of the same aggregate, or another aggregate root (but not entities inside another aggregate).
1. How would such a reference be implemented?
2. How would a method of the entity get access to the referenced aggregate root?
3. What is the method of the entity allowed to do with the other aggregate root?
For 1. and 2. my problem is that an entity should not have access to repositories. Also, magic lazy load mechanisms are not always available, and I think they should be avoided for the same reasons. So when the aggregate is loaded by the repository, should all references of every entity in it be resolved (and all other referenced aggregates be loaded) by the repository? Or is the "reference" just an id, and someone outside the entity (the command handler, or whoever loads the aggregate from the repository and invokes a method) uses this id to load the other aggregate too and then passes it into the method as a parameter, as in the following example?
agg1 = repo1.Load(id);
agg2 = repo2.Load(agg1.refId);
agg1.mymethod(agg2);
For 3. I think the only methods that should be called on the other aggregate would be query methods (in the cqs sense) that do not alter the other aggregate because only one aggregate per transaction should be changed. Right?

As for questions 1. and 2., what you said is fine, and most of the time it's done that way. You reference other aggregates by id and retrieve them outside of the domain logic, in an application service. The reason why you should not load other aggregates inside aggregates (apart from the SRP violation) is that you have no control over what is going on, same as with lazy loading. You can easily write code that loads the same aggregate from the DB dozens of times when it could be loaded once. You can use a cache, but then there will also be problems with stale data etc.
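A small sketch of that arrangement, in Python. The `Order` and `Customer` aggregates, the credit-limit rule, and `InMemoryRepository` are invented for illustration; the point is only the shape: the reference is an id, the application service resolves it, and the method receives the other aggregate as a parameter.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    credit_limit: int


@dataclass
class Order:
    id: str
    customer_id: str  # reference to the other aggregate, by id only
    total: int
    approved: bool = False

    def approve(self, customer: Customer) -> None:
        # Only reads from the other aggregate; only this Order is changed.
        if customer.id != self.customer_id:
            raise ValueError("customer does not match the order")
        if self.total > customer.credit_limit:
            raise ValueError("credit limit exceeded")
        self.approved = True


class InMemoryRepository:
    """Hypothetical repository keyed by aggregate id."""

    def __init__(self, items):
        self._items = {item.id: item for item in items}

    def load(self, item_id):
        return self._items[item_id]


def approve_order(orders, customers, order_id):
    """Application service: loads both aggregates, each exactly once."""
    order = orders.load(order_id)
    customer = customers.load(order.customer_id)
    order.approve(customer)
    return order
```

Because the loading happens in one visible place, it is easy to see that each aggregate is fetched a single time per use case.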
However, sometimes you need to do "Performance Driven Design" over "DDD", and then you load aggregates inside other aggregates, but that is rare.
As for 3.: on the query side of CQRS you don't even use repositories or aggregates, because you just want to get data; no domain logic is involved there.

How to retrieve Aggregate Roots that don't have repositories?

Eric Evans's DDD book, pg. 152:
Provide Repositories only for AGGREGATE roots that actually need
direct access.
1.
Should Aggregate Roots that don't need direct access be retrieved and saved via repositories of those Aggregate Roots that do need direct access?
For example, if we have Customer and Order Aggregate roots and if for whatever reason we don't need direct access to Order AR, then I assume only way orders can be obtained is by traversing Customer.Orders property?
2.
When should ICustomerRepository retrieve orders? When Customer AR is retrieved ( via ICustomerRepository.GetCustomer ) or when we traverse Customer.GetOrders property?
3.
Should ICustomerRepository itself retrieve orders, or should it delegate this responsibility to an IOrderRepository? If the latter, then one option would be to inject IOrderRepository into ICustomerRepository. But since outside code shouldn't know that IOrderRepository even exists (if outside code were aware of its existence, it might also use IOrderRepository directly), how then should ICustomerRepository get a reference to IOrderRepository?
UPDATE:
1
With regards to implementation, if done with an ORM like NHibernate,
there is no need for an IOrderRepository.
a) Are you saying that when using an ORM, we usually don't need to implement repositories, since ORMs implicitly provide them?
b) I do plan on learning one of the ORM technologies (probably EF), but from the little I have read on ORMs, it seems that if you want to completely decouple the Domain or Application layers from the Persistence layer, then these two layers shouldn't use ORM expressions, which also implies that ORM expressions and POCOs should exist only within Repository implementations?
c) If there is a scenario where for some reason an AR doesn't have direct access (and the project doesn't use an ORM), what would your answer to 3. be?
thanks

I'm hard-pressed to think of an example where an aggregate does not require direct access. However, I think at the time of writing (circa 2003), the emphasis on limiting or eliminating traversable object references between aggregates wasn't as prevalent as it is today. Therefore, it could have been the case that a Customer aggregate would reference a collection of Order aggregates. In this scenario, there may be no need to reference an Order directly because traversal from Customer is acceptable.
With regards to implementation, if done with an ORM like NHibernate, there is no need for an IOrderRepository. The Order aggregate would simply have a mapping. Additionally, the mapping for Customer would specify that changes should cascade down to corresponding Order aggregates.
When should ICustomerRepository retrieve orders?
This is the question which raises concern over traversable object references between aggregates. A solution provided by ORM is lazy loading, but lazy loading can be problematic. Ideally, a customer's orders would only be retrieved when needed and this depends on context. My suggestion, therefore, is to avoid traversable references between aggregates and use a repository search instead.
UPDATE
a) You would still need something that implements ICustomerRepository, but the implementation would be largely trivial if the mappings are configured - you'd delegate to the ORM's API to implement each repository method. No need for a IOrderRepository however.
b) For full encapsulation, the repository interface would not contain anything ORM-specific. The repository implementation would adapt the repository contract to ORM specifics.
c) It's hard to make a judgement on a scenario I can't picture, but it would seem there is no need for an Order repository interface. You can still have an Order repository class to better separate responsibilities. There is no need for injection either; just have the Customer repository implementation create an instance of the Order repository.
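One possible reading of b) and c), sketched in Python with SQLite in place of an ORM. The schema, the `SqlCustomerRepository` class, and the private `_SqlOrderRepository` helper are all assumptions made for illustration; the contract exposes nothing persistence-specific, and the order helper is created inside the customer repository implementation rather than injected.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    id: int
    total: int


@dataclass
class Customer:
    id: int
    name: str
    orders: List[Order] = field(default_factory=list)


class CustomerRepository(ABC):
    """Contract seen by the domain and application layers: no SQL, no ORM types."""

    @abstractmethod
    def get_customer(self, customer_id: int) -> Customer:
        ...


class _SqlOrderRepository:
    """Private helper with no public interface; outside code never sees it."""

    def __init__(self, conn):
        self._conn = conn

    def for_customer(self, customer_id):
        rows = self._conn.execute(
            "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
            (customer_id,),
        )
        return [Order(*row) for row in rows]


class SqlCustomerRepository(CustomerRepository):
    def __init__(self, conn):
        self._conn = conn
        self._orders = _SqlOrderRepository(conn)  # created here, not injected

    def get_customer(self, customer_id):
        cid, name = self._conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(cid, name, self._orders.for_customer(cid))


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (10, 1, 30), (11, 1, 45);
    """
)
repo: CustomerRepository = SqlCustomerRepository(conn)
```

Callers depend only on `CustomerRepository`; whether orders come from a helper class, an ORM mapping, or a single joined query stays an implementation detail.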

Few confusing things about globally accessible Value Objects

Quotes are from DDD: Tackling Complexity in the Heart of Software ( pg. 150 )
a)
global search access to a VALUE is often meaningless, because finding a
VALUE by its properties would be equivalent to creating a new instance
with those properties. There are exceptions. For example, when I am
planning travel online, I sometimes save a few prospective itineraries
and return later to select one to book. Those itineraries are VALUES
(if there were two made up of the same flights, I would not care which
was which), but they have been associated with my user name and
retrieved for me intact.
I don't understand the author's reasoning as to why it would be more appropriate to make the Itinerary Value Object globally accessible instead of having clients globally search for the Customer root entity and then traverse from it to the Itinerary object?
b)
A subset of persistent objects must be globally accessible through a
search based on object attributes ... They are usually ENTITIES,
sometimes VALUE OBJECTS with complex internal structure ...
Why is it more common for Value Objects with complex internal structure to be globally accessible than for simpler Value Objects?
c) Anyways, are there some general guidelines on how to determine whether a particular Value Object should be made globally accessible?
UPDATE:
a)
There is no domain reason to make an itinerary traverse-able through
the customer entity. Why load the customer entity if it isn't needed
for any behavior? Queries are usually best handled without
complicating the behavioral domain.
I'm probably wrong about this, but isn't it common that when a user (i.e. the Customer root entity) logs in, the domain model retrieves the user's Customer Aggregate?
And if users have the option to book flights, then it would also be common for them to check from time to time the Itineraries they have selected or booked (though English isn't my first language, so the term Itinerary may actually mean something a bit different from what I think it means).
And since the Customer Aggregate has already been retrieved from the DB, why issue another global search for the Itinerary (which will probably look for it in the DB) when it was already retrieved together with the Customer Aggregate?
c)
The rule is quite simple IMO - if there is a need for it. It doesn't
depend on the structure of the VO itself but on whether an instance of
a particular VO is needed for a use case.
But this VO instance has to be related to some entity ( ie Itinerary is related to particular Customer ), else as the author pointed out, instead of searching for VO by its properties, we could simply create a new VO instance with those properties?
SECOND UPDATE:
a) From your link:
Another method for expressing relationships is with a repository.
When relationship is expressed via repository, do you implement a SalesOrder.LineItems property ( which I doubt, since you advise against entities calling repositories directly ), which in turns calls a repository, or do you implement something like SalesOrder.MyLineItems(IOrderRepository repo)? If the latter, then I assume there is no need for SalesOrder.LineItems property?
b)
The important thing to remember is that aggregates aren't meant to be
used for displaying data.
True that domain model doesn't care what upper layers will do with the data, but if not using DTO's between Application and UI layers, then I'd assume UI will extract the data to display from an aggregate ( assuming we sent to UI whole aggregate and not just some entity residing within it )?
Thank you

a) There is no domain reason to make an itinerary traverse-able through the customer entity. Why load the customer entity if it isn't needed for any behavior? Queries are usually best handled without complicating the behavioral domain.
b) I assume that his reasoning is that complex value objects are those that you want to query since you can't easily recreate them. This issue and all query related issues can be addressed with the read-model pattern.
c) The rule is quite simple IMO - if there is a need for it. It doesn't depend on the structure of the VO itself but on whether an instance of a particular VO is needed for a use case.
UPDATE
a) It is unlikely that a customer aggregate would have references to the customer's itineraries. The reason is that I don't see how an itinerary would be related to behaviors that would exist in the customer aggregate. It is also unnecessary to load the customer aggregate at all if all that is needed is some data to display. However, if you do load the aggregate and it does contain reference data that you need you may as well display it. The important thing to remember is that aggregates aren't meant to be used for displaying data.
c) The relationship between customer and itinerary could be expressed by a shared ID - each itinerary would have a customerId. This would allow lookup as required. However, just because these two things are related it does not mean that you need to traverse customer to get to the related entities or value objects for viewing purposes. More generally, associations can be implemented either as direct references or via repository search. There are trade-offs either way.
UPDATE 2
a) If implemented with a repository, there is no LineItems property - no direct references. Instead, to obtain a list of line items a repository is called.
b) Or you can create a DTO-like object, a read-model, which would be returned directly from the repository. The repository can in turn execute a simple SQL query to get all required data. This allows you to get to data that isn't part of the aggregate but is related. If an aggregate does have all the data needed for a view, then use that aggregate. But as soon as you have a need for more data that doesn't concern the aggregate, switch to a read-model.
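A minimal sketch of such a read-model, in Python with SQLite. The `ItineraryView` class, the `ItineraryReadRepository`, and the table layout are hypothetical; the itineraries are linked to the customer only through a `customer_id` column, and the view pulls in the customer's name, which is related data that an itinerary object itself would not carry.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ItineraryView:
    """DTO-like read-model shaped for one screen, not a domain object."""
    customer_name: str
    origin: str
    destination: str


class ItineraryReadRepository:
    def __init__(self, conn):
        self._conn = conn

    def for_customer(self, customer_id):
        # One plain SQL query; no Customer aggregate is loaded or traversed.
        rows = self._conn.execute(
            """
            SELECT c.name, i.origin, i.destination
            FROM itineraries i
            JOIN customers c ON c.id = i.customer_id
            WHERE i.customer_id = ?
            ORDER BY i.id
            """,
            (customer_id,),
        )
        return [ItineraryView(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE itineraries (
        id INTEGER PRIMARY KEY, customer_id INTEGER, origin TEXT, destination TEXT
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO itineraries VALUES (1, 1, 'OSL', 'LHR'), (2, 1, 'LHR', 'JFK'), (3, 2, 'CDG', 'NRT');
    """
)
views = ItineraryReadRepository(conn).for_customer(1)
```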

Retrieval of child objects of aggregates in DDD

In DDD, the root of an aggregate is the only reference through which its child objects can be retrieved. The repository of an aggregate root is responsible for returning the root object reference only. If I need child objects, I have to call a getter method of the aggregate to retrieve them, which results in a DB query.
Consider a case where I am retrieving multiple aggregates from the DB. In my case this results in multiple DB queries, which leads to a very slow request. How can I avoid this in terms of DDD? For persisting I came across a pattern called Unit of Work. Is there any pattern for searching that resolves my problem, or any other way to do this?

First of all, 95% of problems are solved by your ORM (if you happen to use a relational database).
An aggregate root repository should (in most cases) return a fully loaded object with all child objects (entities). Lazy loading children should be an exception, not a rule.
Another thing is, you should avoid loading and persisting multiple aggregates at a time. Try repartitioning your domain so that each user interaction deals with only one aggregate.
And consider a document database solution. It really makes sense to store whole aggregates as documents in a document database.

Okay, it seems you have a scenario where, in a single use case, you want to read from several ARs and also save their state to the DB. Is the read operation taking too long, or is it both the read and the write that take time?
Your domain model and aggregate roots should be partly defined through iteration over use cases. What I'm saying is that the model should be designed so that it suits your clients' needs. This scenario does not seem to be one that fits your model well.
Reports and other operations that use a large data view should bypass the domain model. Don't use DDD for reports etc. Just do fast data access.
Second, Unit of Work is one way to go if you want all aggregates to participate in a transaction.
Third, I would say use lazy loading, but for the use cases that need a performance boost you can apply a loading strategy, which means you let the root load some child collections without firing SQL sub-selects...
Look at this article: http://weblogs.asp.net/fredriknormen/archive/2010/07/25/loading-strategy-for-entity-framework-4-0.aspx (even though it's written for EF, the pattern works well for the NH ORM)
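The effect of such a loading strategy can be illustrated without any ORM. The sketch below (Python with SQLite) is not taken from the linked article; the `roots` and `children` tables, the `CountingConnection` wrapper, and both loader functions are invented to show the difference between one query per root and a single joined query.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self._conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self._conn.execute(sql, params)


def load_per_root(db):
    # One query for the roots, then one more per root for its children.
    result = {}
    for (root_id,) in db.execute("SELECT id FROM roots ORDER BY id").fetchall():
        rows = db.execute(
            "SELECT name FROM children WHERE root_id = ? ORDER BY id", (root_id,)
        )
        result[root_id] = [row[0] for row in rows]
    return result


def load_with_strategy(db):
    # A single joined query; the rows are grouped per root in memory.
    result = {}
    rows = db.execute(
        "SELECT r.id, c.name FROM roots r "
        "LEFT JOIN children c ON c.root_id = r.id ORDER BY r.id, c.id"
    )
    for root_id, child_name in rows:
        children = result.setdefault(root_id, [])
        if child_name is not None:  # roots without children yield a NULL name
            children.append(child_name)
    return result


raw = sqlite3.connect(":memory:")
raw.executescript(
    """
    CREATE TABLE roots (id INTEGER PRIMARY KEY);
    CREATE TABLE children (id INTEGER PRIMARY KEY, root_id INTEGER, name TEXT);
    INSERT INTO roots VALUES (1), (2), (3);
    INSERT INTO children VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
    """
)

lazy_db = CountingConnection(raw)
eager_db = CountingConnection(raw)
per_root = load_per_root(lazy_db)
joined = load_with_strategy(eager_db)
```

Both functions return the same data, but the first issues one statement plus one per root, while the second issues a single statement regardless of how many roots are loaded.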
Finally, you can always add DB indexes, caching etc. to boost performance, but given the scenario info, you have made some kind of wrong design decision. I don't have all the facts, but maybe some use cases aren't suitable for

I find DDD excellent when it comes to any kind of write operation. For querying data, however, it only imposes unnecessary restrictions.
I would strongly recommend using CQRS as general architecture pattern. This would allow you to create specific Query Models for your Views and leave DDD for input validation and Command execution.
