How should I enforce relationships and constraints between aggregate roots?

I have a couple of questions regarding references between two aggregate roots in a DDD model. Refer to the typical Customer/Order model diagrammed below.
First, should references between the actual object implementations of aggregates always be done through ID values and not object references? For example, if I want details on the customer of an Order, I would need to take the CustomerId and pass it to an ICustomerRepository to get a Customer, rather than setting up the Order object to return a Customer directly, correct? I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and it is not much harder to set up if I am using an ORM like NHibernate. Yet I'm fairly certain this would violate the boundaries between aggregate roots/repositories.
Second, where and how should a cascade-on-delete relationship be enforced for two aggregate roots? For example, say I want all the associated orders to be deleted when a customer is deleted. The ICustomerRepository.DeleteCustomer() method should not be referencing the IOrderRepository, should it? That seems like it would break the boundaries between the aggregates/repositories. Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders, and which would reference both an IOrderRepository and an ICustomerRepository? In that case, how can I be sure that people know to use the service and not the repository to delete Customers? Is that just down to educating them on how to use the model correctly?

First, should references between aggregates always be done through ID values and not actual object references?
Not really - though some would make that change for performance reasons.
For example, if I want details on the customer of an Order, I would need to take the CustomerId and pass it to an ICustomerRepository to get a Customer, rather than setting up the Order object to return a Customer directly, correct?
Generally, you'd model one side of the relationship (e.g., Customer.Orders or Order.Customer) for traversal. The other can be fetched from the appropriate Repository (e.g., CustomerRepository.GetCustomerFor(Order) or OrderRepository.GetOrdersFor(Customer)).
Wouldn't that mean that the OrderRepository would have to know something about how to create a Customer? Wouldn't that be beyond what OrderRepository should be responsible for...
The OrderRepository would know how to use an ICustomerRepository.FindById(int). You can inject the ICustomerRepository. Some may be uncomfortable with that, and choose to put it into a service layer - but I think that's overkill. There's no particular reason repositories can't know about and use each other.
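To make the "repositories can use each other" idea concrete, here is a minimal sketch in Python (chosen only for brevity, the thread itself is about .NET). The in-memory classes and the customer_for method are assumptions made up for illustration; only find_by_id mirrors the FindById mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    id: int
    name: str


@dataclass(frozen=True)
class Order:
    id: int
    customer_id: int  # the other aggregate root is referenced by ID


class InMemoryCustomerRepository:
    """Hypothetical stand-in for an ICustomerRepository implementation."""

    def __init__(self, customers):
        self._by_id = {c.id: c for c in customers}

    def find_by_id(self, customer_id):
        return self._by_id.get(customer_id)


class OrderRepository:
    """Gets the customer repository injected and delegates to it."""

    def __init__(self, orders, customer_repository):
        self._by_id = {o.id: o for o in orders}
        self._customers = customer_repository

    def find_by_id(self, order_id):
        return self._by_id.get(order_id)

    def customer_for(self, order):
        # The order repository never builds a Customer itself;
        # it only asks the customer repository for one.
        return self._customers.find_by_id(order.customer_id)


customers = InMemoryCustomerRepository([Customer(1, "Ada")])
orders = OrderRepository([Order(10, customer_id=1)], customers)
order = orders.find_by_id(10)
customer = orders.customer_for(order)
```

Note that OrderRepository still knows nothing about how a Customer is created or stored; it only depends on the other repository's interface.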
I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and it is not much harder to set up if I am using an ORM like NHibernate. Yet I'm fairly certain this would violate the boundaries between aggregate roots/repositories.
Aggregate roots are allowed to hold references to other aggregate roots. In fact, anything is allowed to hold a reference to an aggregate root. An aggregate root cannot hold a reference to a non-aggregate root entity that doesn't belong to it, though.
E.g., Customer cannot hold a reference to OrderLines, since OrderLines properly belong as entities on the Order aggregate root.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots?
If (and I stress if, because it's a peculiar requirement) that's actually a use case, it's an indication that Customer should be your sole aggregate root. In most real-world systems, however, we wouldn't actually delete a Customer that has associated Orders - we may deactivate them, move their Orders to a merged Customer, etc. - but not out and out delete the Orders.
That being said, while I don't think it's pure-DDD, most folks will allow some leniency in following a unit of work pattern where you delete the Orders and then the Customer (which would fail if Orders still existed). You could even have the CustomerRepository do the work, if you like (though I'd prefer to make it more explicit myself). It's also acceptable to allow the orphaned Orders to be cleaned up later (or not). The use case makes all the difference here.
Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders, and which would reference both an IOrderRepository and an ICustomerRepository? In that case, how can I be sure that people know to use the service and not the repository to delete Customers? Is that just down to educating them on how to use the model correctly?
I probably wouldn't go the service route for something so intimately tied to the repository. As for how to make sure the service is used: just don't put a public Delete on the CustomerRepository. Or throw an error if deleting a Customer would leave orphaned Orders.
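The "throw an error if deleting would leave orphaned Orders" option could look roughly like this. It is a Python sketch under assumptions: the in-memory repositories, OrphanedOrdersError and count_for_customer are hypothetical names, not part of NHibernate or any other library.

```python
class OrphanedOrdersError(Exception):
    """Raised when deleting a customer would leave orders behind."""


class InMemoryOrderRepository:
    def __init__(self):
        self._customer_by_order = {}  # order_id -> customer_id

    def add(self, order_id, customer_id):
        self._customer_by_order[order_id] = customer_id

    def count_for_customer(self, customer_id):
        return sum(1 for c in self._customer_by_order.values() if c == customer_id)


class InMemoryCustomerRepository:
    def __init__(self, order_repository):
        self._customers = {}
        self._orders = order_repository

    def add(self, customer_id, name):
        self._customers[customer_id] = name

    def contains(self, customer_id):
        return customer_id in self._customers

    def delete(self, customer_id):
        # Refuse the delete instead of silently cascading or orphaning.
        remaining = self._orders.count_for_customer(customer_id)
        if remaining:
            raise OrphanedOrdersError(
                f"customer {customer_id} still has {remaining} order(s)"
            )
        self._customers.pop(customer_id, None)


orders = InMemoryOrderRepository()
customers = InMemoryCustomerRepository(orders)
customers.add(1, "Ada")
customers.add(2, "Bob")
orders.add(10, customer_id=1)
customers.delete(2)  # fine: Bob has no orders
```

Calling customers.delete(1) here raises OrphanedOrdersError, which forces the caller to decide explicitly what should happen to the orders first.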

Another option would be to have a ValueObject describing the association between the Order and Customer ARs: a VO that contains the CustomerId plus any additional information you might need, such as name and address (something like ClientInfo or CustomerData).
This has several advantages:
Your ARs are decoupled - and now can be partitioned, stored as event streams etc.
In the Order ARs you usually need to keep the information you had about the customer at the time of the order creation and not reflect on it any future changes made to the customer.
In almost all the cases the information in the value object will be enough to perform the read operations ( display customer info with the order ).
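As a rough illustration of the snapshot idea, here is a small Python sketch. CustomerData is one of the names suggested above; the snapshot method and the specific fields are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerData:
    """Value object: customer details as they were when the order was placed."""
    customer_id: int
    name: str
    address: str


@dataclass
class Customer:
    id: int
    name: str
    address: str

    def snapshot(self):
        # Hypothetical helper that copies the current details into a VO.
        return CustomerData(self.id, self.name, self.address)


@dataclass
class Order:
    id: int
    customer: CustomerData  # no object reference to the Customer aggregate


customer = Customer(1, "Ada", "1 Old Street")
order = Order(100, customer.snapshot())
customer.address = "2 New Road"  # a later change on the Customer aggregate
print(order.customer.address)    # still the address at order time
```

Because CustomerData is frozen and copied, later changes to the Customer do not leak into existing Orders, and the Order can be displayed without loading the Customer at all.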
To handle the deletion/deactivation of a Customer, you have the freedom to choose any behavior you like. You can use DomainEvents and publish a CustomerDeleted event, for which you can have a handler that moves the Orders to an archive, deletes them, or does whatever you need. You can also perform more than one operation on that event.
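A minimal sketch of the CustomerDeleted event with two handlers, in Python. The EventBus class, the dictionaries standing in for storage, and both handler names are hypothetical; a real system would more likely use its framework's event dispatcher, possibly asynchronously.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerDeleted:
    customer_id: int


class EventBus:
    """Tiny synchronous publisher, just enough for the example."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


# Hypothetical stores: order_id -> customer_id
active_orders = {10: 1, 11: 1, 12: 2}
archived_orders = {}
audit_log = []


def archive_orders(event):
    # Move the deleted customer's orders to the archive instead of deleting them.
    for order_id, customer_id in list(active_orders.items()):
        if customer_id == event.customer_id:
            archived_orders[order_id] = active_orders.pop(order_id)


def record_audit_entry(event):
    audit_log.append(f"customer {event.customer_id} deleted")


bus = EventBus()
bus.subscribe(CustomerDeleted, archive_orders)
bus.subscribe(CustomerDeleted, record_audit_entry)
bus.publish(CustomerDeleted(customer_id=1))
```

The Customer side only publishes the event; what happens to the Orders is decided entirely by whichever handlers are subscribed.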
If for whatever reason DomainEvents are not your choice you can have the Delete operation implemented as a service operation and not as a repository operation and use a UOW to perform the operations on both ARs.
I have seen a lot of problems like this when trying to do DDD, and I think the source of the problems is that developers/modelers have a tendency to think in DB terms. You (we :)) have a natural tendency to remove redundancy and normalize the domain model. Once you get over it, allow your model to evolve, and involve the domain expert(s) in its evolution, you will see that it's not that complicated and it's quite natural.
UPDATE: A similar VO, OrderInfo, can be placed inside the Customer AR if needed, holding only the required information (order total, order item count, etc.).


Repository within domain objects

I have seen a lot of discussions regarding this topic, but I couldn't get a convincing answer. The general advice is not to have a repository inside a domain object. What about an aggregate root? Isn't it right to give the root the responsibility of manipulating the composed objects?
For example, I have a microservice which takes care of invoices. Invoice is an aggregate root which has the different products. There is no requirement for this service to give details about individual products. I have two tables, one to store invoice details and the other to store the products of those invoices. I have two repositories corresponding to the tables. I have injected the product repository into the invoice domain object. Is it wrong to do so?
I see some mistakes according to DDD principles in your question. Let me try to clarify some concepts to give you a hand.
First, you mentioned you have an Aggregate Root which is Invoice, and then two different repositories. Having an Aggregate Root means that any change on the Entities that the Aggregate consists of should be performed via the Aggregate Root. Why? That's because you need to satisfy some business rule (invariant) that applies on the relation of those Entities. For instance, given the next business rule:
Winning auction bids must always be placed before the auction ends. If a winning bid is placed after an auction ends, the domain is in an invalid state because an invariant has been broken and the model has failed to correctly apply domain rules.
Here there is an aggregate consisting of Auction and Bids where the Auction is the Aggregate Root.
If you have a BidsRepository, you could easily do:
var newBid = new Bid(money);
bidsRepository.save(newBid);
and you would be saving a Bid without checking the business rule defined above. However, by having a repository only for the Aggregate Root you enforce your design, because you need to do something like:
var newBid = new Bid(money);
auction.placeBid(newBid);
Therefore, you can check your invariant within the method placeBid and nobody can skip it if they want to place a new Bid. Afterwards you can save the info into as many tables as you want, that is an implementation detail.
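Here is a small Python sketch of what checking the invariant inside placeBid could look like. The class shapes, the AuctionEnded exception and the "highest amount wins" rule are assumptions for illustration; the quoted business rule only says that bids must be placed before the auction ends.

```python
from datetime import datetime, timedelta


class AuctionEnded(Exception):
    pass


class Bid:
    def __init__(self, amount, placed_at):
        self.amount = amount
        self.placed_at = placed_at


class Auction:
    """Aggregate root: bids can only be added through place_bid."""

    def __init__(self, ends_at):
        self._ends_at = ends_at
        self._bids = []

    def place_bid(self, bid):
        # Invariant: no bid may be placed once the auction has ended.
        if bid.placed_at >= self._ends_at:
            raise AuctionEnded("cannot place a bid after the auction has ended")
        self._bids.append(bid)

    @property
    def winning_bid(self):
        # Assumption for the sketch: the highest amount wins.
        return max(self._bids, key=lambda b: b.amount, default=None)


end = datetime(2024, 1, 1, 12, 0)
auction = Auction(ends_at=end)
auction.place_bid(Bid(100, placed_at=end - timedelta(minutes=5)))
```

Since there is no repository for Bid, the only way to get a new Bid persisted is to go through the Auction, and so through this check.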
Second, you asked whether it is wrong to inject the repository into a Domain class. Here is a quick explanation:
The repository should depend on the object it returns, not the other way around. The reason for this is that your "domain object" (more on that later) can exist (and should be testable) without being loaded or saved (that is, having a dependency on a repository).
Basically your design says that in order to have an invoice, you need to provide a MySQL/Mongo/XXX instance connection which is an infrastructure detail. Your domain should not know anything about how it is persisted. Your domain knows about the behavior like in the scenario of the Auction and Bids.
These concepts just help you to create code easier to maintain as well as help you to apply best practices such as SRP (Single Responsibility Principle).
Yes, I think it is wrong.
The domain should match the real business model and should not care how data is persisted. Even if the data is internally stored in multiple tables, this should not affect the domain objects in any way.
When you load an aggregate root, you should load its related entities as well in one go. For example, this can easily be achieved with the Include method in Entity Framework if you are on .NET. By loading all the data, you ensure that you have a full representation of the business entity at any given time and you don't have to query the database again.
Any changes in related entities should be persisted together with the aggregate root in one atomic operation (usually using transactions).
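To show one repository handling both tables, here is a sketch using Python's built-in sqlite3 module instead of Entity Framework. The table layout, column names and the Invoice shape are assumptions, since the question doesn't show them.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Invoice:
    id: int
    customer_name: str
    products: list = field(default_factory=list)  # (name, price) tuples


class InvoiceRepository:
    """One repository for the whole aggregate; two tables are a storage detail."""

    def __init__(self, connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS invoices "
            "(id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL)"
        )
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS invoice_products "
            "(invoice_id INTEGER NOT NULL, name TEXT NOT NULL, price REAL NOT NULL)"
        )

    def save(self, invoice):
        # `with connection` commits on success and rolls back on an exception,
        # so the invoice row and its product rows are written atomically.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO invoices VALUES (?, ?)",
                (invoice.id, invoice.customer_name),
            )
            self._conn.execute(
                "DELETE FROM invoice_products WHERE invoice_id = ?", (invoice.id,)
            )
            self._conn.executemany(
                "INSERT INTO invoice_products VALUES (?, ?, ?)",
                [(invoice.id, name, price) for name, price in invoice.products],
            )

    def get(self, invoice_id):
        # Load the root and its products together, so callers get the full aggregate.
        row = self._conn.execute(
            "SELECT id, customer_name FROM invoices WHERE id = ?", (invoice_id,)
        ).fetchone()
        if row is None:
            return None
        products = self._conn.execute(
            "SELECT name, price FROM invoice_products WHERE invoice_id = ? ORDER BY rowid",
            (invoice_id,),
        ).fetchall()
        return Invoice(row[0], row[1], [tuple(p) for p in products])


repo = InvoiceRepository(sqlite3.connect(":memory:"))
repo.save(Invoice(1, "Ada", [("widget", 2.5), ("gadget", 4.0)]))
```

The Invoice class itself has no idea that a database exists, so it can be created and tested without any connection.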

DDD: How to handle large collections

I'm currently designing a backend for a social networking-related application in REST. I'm very intrigued by the DDD principle. Now let's assume I have a User object that has a Collection of Friends. There could be thousands of these if the app and the user become very successful. Every Friend would have some properties as well; it is basically a User.
Looking at the DDD Cargo application example, the fully expanded Cargo object is stored and retrieved from the CargoRepository from time to time. WOW, if there is a list in the aggregate root, over time this would eventually trigger an OOM. This is why there is pagination and lazy loading if you approach the problem from a data-centric point of view. But how could you cope with these large collections in a persistence-unaware DDD?
As #JefClaes mentioned in the comments: You need to determine whether your User AR indeed requires a collection of Friends.
Ownership does not necessarily imply that a collection is necessary.
Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) in respect of active orders. Keeping a collection of historical orders would be unnecessary.
I suspect the large collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs but the order may much more likely be simply split into smaller orders.
In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.
Something to keep in mind is that as soon as you start using your domain model for querying, you start running into weird sorts of problems. So always try to think in terms of some read/query model with a simple query interface that can access your data directly without using your domain model. This may simplify things.
So perhaps a Relationship AR may assist in this regard.
If paging or optimization techniques are part of your domain, there is nothing wrong with designing domain classes with this ability.
Some solutions I've thought about:
If User is the aggregate root, you can give your UserRepository a method GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) that encapsulates the construction of that specific user object. In the same way, you can also populate the user model with some counters, etc.
On the other side, it is possible to implement lazy loading for User instance's _friends field. Thus, User instance can itself decide which "part" of friends list to load.
Finally, you can use UserRepository to get all friends of certain user with respect to paging or other filtering conditions. It doesn't violate any DDD principles.
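The repository-with-paging option might look like this in Python. It is only a sketch: FriendPage, get_friends and the dictionary used as storage are assumptions, and a real implementation would push offset/limit down into the database query rather than slicing a list in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendPage:
    user_id: int
    friend_ids: tuple
    total: int  # lets callers show a counter without loading everyone


class UserRepository:
    def __init__(self, friendships):
        # Hypothetical storage: user_id -> ordered list of friend IDs.
        self._friends = friendships

    def get_friends(self, user_id, offset, limit):
        all_friends = self._friends.get(user_id, [])
        page = all_friends[offset:offset + limit]
        return FriendPage(user_id, tuple(page), len(all_friends))


repo = UserRepository({1: list(range(100, 1100))})
page = repo.get_friends(1, offset=20, limit=5)
```

The User aggregate never has to hold the thousand friends; callers ask for exactly the slice they need.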
DDD is too broad a topic to claim that it is simply "not for CRUD". When programming in a DDD way, you should always take technical limitations into account and adapt your domain to satisfy them.
Do not prematurely optimize. If you are afraid of large stress, then you have to benchmark your application and perform stress tests.
You need to have a table like so:
id, user_id1, user_id2
to handle the n-m relation. Index your fields there.
Also, you need to be aware of whether friendship is symmetrical. If so, then you need a single row for two people if they are friends. If not, then you might have one row showing that a user is friends with the other user. If the other person considers the first a friend as well, you need another row.
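For the symmetrical case, a sketch with Python's built-in sqlite3 module could look like this. The table name, the "smaller ID first" convention and the helper functions are assumptions; the columns follow the id, user_id1, user_id2 layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE friendships (
        id INTEGER PRIMARY KEY,
        user_id1 INTEGER NOT NULL,
        user_id2 INTEGER NOT NULL,
        CHECK (user_id1 < user_id2),
        UNIQUE (user_id1, user_id2)
    )
    """
)
# The UNIQUE constraint already indexes lookups by user_id1;
# this extra index covers lookups by user_id2.
conn.execute("CREATE INDEX idx_friendships_user2 ON friendships (user_id2)")


def add_friendship(a, b):
    # Symmetric friendship: store the pair once, smaller ID first.
    low, high = sorted((a, b))
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO friendships (user_id1, user_id2) VALUES (?, ?)",
            (low, high),
        )


def friends_of(user_id):
    rows = conn.execute(
        "SELECT user_id2 FROM friendships WHERE user_id1 = ? "
        "UNION SELECT user_id1 FROM friendships WHERE user_id2 = ?",
        (user_id, user_id),
    ).fetchall()
    return sorted(row[0] for row in rows)


add_friendship(1, 2)
add_friendship(2, 1)  # same pair again, ignored
add_friendship(3, 2)
```

For the asymmetrical case you would drop the ordering convention and the CHECK, and store one row per direction.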
Lazy-loading can be achieved by hidden (AJAX) requests, so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure, which is unknown now due to the infinite possible evolutions of your project.
Your aggregate root can have a collection of different objects that will only contain a small subset of the information, as reference to the actual business objects. Then when needed, items can be used to fetch the entire information from the underlying repository.

How to model sort order for many-to-one across two aggregate roots

Take the domain proposed in Effective Aggregate Design of a Product which has multiple Releases. In this article, Vaughn arrives at the conclusion that both the Product and Release should each be their own aggregate roots.
Now suppose that we add a feature
As a release manager I would like to be able to sort releases so that I can create timelines for rolling out larger epics to our users
I'm not a PM with a specific need but it seems reasonable that they would want the ability to sort releases in the UI.
I'm not exactly sure how this should work. It's natural for each Release to have an order property, but re-ordering would involve changing multiple aggregates in the same transaction. On the other hand, if that information is stored in the Product aggregate, you have to have a method like product.setRelaseOrder(ReleaseId[]), which seems like a weird bit of data to store in a completely different place than Releases. Worse, adding a release would again involve modifying two different aggregates! What else can we do? ProductReleaseSortOrder can be its own aggregate, but that sounds downright absurd!
So what to do? At the moment I'm still leaning toward the let-product-manage-it option but what's correct here?
I have found that in fact it is best to create a new aggregate root (e.g., ProductReleaseSorting as suggested) for each individual sorting and/or ordering purposes.
This is because releaseOrder clearly is not actually a property of the Product, i.e., something that has a meaning on a product on its own. Rather, it is actually a property of a "view" on a collection of products, and this view should be modeled on its own.
The reason why I tend to introduce a new aggregate root for each individual view on a collection of items becomes clear if you think of what happens if you were to introduce additional orderings in the future, say a "marketing order", or multiple product managers want to keep their own ordering etc. Here, one easily sees that "marketing order" and "release order" are two different concepts that should be treated independently, and if multiple persons want to order the products with the same key, but using different orderings, you'll need individual "per person views". Furthermore, it could be that there are multiple order criteria that one would like to take into account when sorting (an example for the latter would be (in a different context) fastest route vs. shortest route), all of which depends on the view you have on the collection, and not on individual properties of its items.
If you now handle the Product Manager's sorting in a ProductReleaseSorting aggregate:
you have a single source of truth for the ordering (the AR),
the ProductReleaseSorting AR can enforce constraints, such as that no two products have the same order number, and
you don't face the issue of having to update multiple ARs in a single transaction when changing the order.
Note that your ProductReleaseSorting aggregate most probably has a unique identity ("Singleton") in your domain, i.e., all product managers share the same sorting. If however all team members would like to have their own ProductReleaseSorting, it's trivial to support this by giving the ProductReleaseSorting a corresponding ID. Similarly, a more generic ProductSorting can be fetched by a per-team ID (marketing vs. product management) from the repository. All of this is easy with a new, separate aggregate root for ordering purposes, but hard if you add properties to the underlying items/entities.
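A bare-bones version of such an ordering aggregate, sketched in Python. The method names and the choice to keep the order as a list of IDs are assumptions; the point is only that the uniqueness constraint and every re-ordering live in one aggregate.

```python
class ProductReleaseSorting:
    """Aggregate root owning the order of release IDs for one product."""

    def __init__(self, sorting_id, product_id, release_ids=()):
        self.id = sorting_id  # e.g. one per team or per person, if needed
        self.product_id = product_id
        self._order = []
        for release_id in release_ids:
            self.append(release_id)

    def append(self, release_id):
        # Constraint: an item can occupy only one position in the ordering.
        if release_id in self._order:
            raise ValueError(f"release {release_id} is already in the ordering")
        self._order.append(release_id)

    def move(self, release_id, new_position):
        if not 0 <= new_position < len(self._order):
            raise IndexError("position out of range")
        self._order.remove(release_id)  # raises ValueError if unknown
        self._order.insert(new_position, release_id)

    def ordered_release_ids(self):
        return tuple(self._order)


sorting = ProductReleaseSorting("pm-default", product_id=7, release_ids=["r1", "r2", "r3"])
sorting.move("r3", 0)
```

Re-ordering changes only this one aggregate, so no Release or Product has to be touched in the same transaction.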
So, Product and Release are both ARs. Release has an association to Product via AggregateId. You want to get a list of all releases for a given product, ordered by something?
Since ordering is an attribute of aggregate, then it should be set on Product, but Releases are ARs too and you shouldn't access repository of Release in Product AR (every AR should have its own repository).
I would simply make a ReleaseQueryService that takes productId and order parameter and call ReleaseRepository.loadOrderedReleasesForProduct(productId, order).
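Roughly what I mean, as a Python sketch. ReleaseSummary, its fields and the whitelist of orderings are assumptions for the example; only the service/repository split and the method name follow the description above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseSummary:
    release_id: str
    product_id: int
    name: str
    planned_for: date


class ReleaseRepository:
    def __init__(self, releases):
        self._releases = list(releases)

    def load_ordered_releases_for_product(self, product_id, order_by):
        matching = [r for r in self._releases if r.product_id == product_id]
        return sorted(matching, key=lambda r: getattr(r, order_by))


class ReleaseQueryService:
    ALLOWED_ORDERINGS = {"name", "planned_for"}

    def __init__(self, repository):
        self._repository = repository

    def releases_for(self, product_id, order_by="planned_for"):
        # Only expose orderings the UI is supposed to offer.
        if order_by not in self.ALLOWED_ORDERINGS:
            raise ValueError(f"unsupported ordering: {order_by}")
        return self._repository.load_ordered_releases_for_product(product_id, order_by)


service = ReleaseQueryService(
    ReleaseRepository(
        [
            ReleaseSummary("r1", 7, "Zebra", date(2024, 1, 10)),
            ReleaseSummary("r2", 7, "Alpha", date(2024, 3, 1)),
            ReleaseSummary("r3", 8, "Other", date(2024, 2, 1)),
        ]
    )
)
by_date = service.releases_for(7)
by_name = service.releases_for(7, order_by="name")
```

This only covers orderings derived from release data; a manually arranged order would still need to be stored somewhere.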
I would also think about separating contexts; maybe the model for release presentation should be in another context? For example, an additional AR, ProductReleases, that would be used only for querying.

Few confusing things about globally accessible Value Objects

Quotes are from DDD: Tackling Complexity in the Heart of Software ( pg. 150 )
global search access to a VALUE is often meaningless, because finding a
VALUE by its properties would be equivalent to creating a new instance
with those properties. There are exceptions. For example, when I am
planning travel online, I sometimes save a few prospective itineraries
and return later to select one to book. Those itineraries are VALUES
(if there were two made up of the same flights, I would not care which
was which), but they have been associated with my user name and
retrieved for me intact.
I don't understand the author's reasoning as to why it would be more appropriate to make the Itinerary Value Object globally accessible, instead of clients having to globally search for the Customer root entity and then traverse from it to this Itinerary object?
A subset of persistent objects must be globally accessible through a
search based on object attributes ... They are usually ENTITIES,
sometimes VALUE OBJECTS with complex internal structure ...
Why is it more common for Value Objects with complex internal structure to be globally accessible than for simpler Value Objects?
c) Anyways, are there some general guidelines on how to determine whether a particular Value Object should be made globally accessible?
There is no domain reason to make an itinerary traverse-able through
the customer entity. Why load the customer entity if it isn't needed
for any behavior? Queries are usually best handled without
complicating the behavioral domain.
I'm probably wrong about this, but isn't it common that when user ( Ie Customer root entity ) logs in, domain model retrieves user's Customer Aggregate?
And if users have an option to book flights, then it would also be common for them to check from time to time the Itineraries ( though English isn't my first language so the term Itinerary may actually mean something a bit different than I think it means ) they have selected or booked.
And since Customer Aggregate is already retrieved from the DB, why issue another global search for Itinerary ( which will probably search for it in DB ) when it was already retrieved together with Customer Aggregate?
The rule is quite simple IMO - if there is a need for it. It doesn't
depend on the structure of the VO itself but on whether an instance of
a particular VO is needed for a use case.
But this VO instance has to be related to some entity ( ie Itinerary is related to particular Customer ), else as the author pointed out, instead of searching for VO by its properties, we could simply create a new VO instance with those properties?
a) From your link:
Another method for expressing relationships is with a repository.
When relationship is expressed via repository, do you implement a SalesOrder.LineItems property ( which I doubt, since you advise against entities calling repositories directly ), which in turns calls a repository, or do you implement something like SalesOrder.MyLineItems(IOrderRepository repo)? If the latter, then I assume there is no need for SalesOrder.LineItems property?
The important thing to remember is that aggregates aren't meant to be
used for displaying data.
True that domain model doesn't care what upper layers will do with the data, but if not using DTO's between Application and UI layers, then I'd assume UI will extract the data to display from an aggregate ( assuming we sent to UI whole aggregate and not just some entity residing within it )?
Thank you
a) There is no domain reason to make an itinerary traverse-able through the customer entity. Why load the customer entity if it isn't needed for any behavior? Queries are usually best handled without complicating the behavioral domain.
b) I assume that his reasoning is that complex value objects are those that you want to query since you can't easily recreate them. This issue and all query related issues can be addressed with the read-model pattern.
c) The rule is quite simple IMO - if there is a need for it. It doesn't depend on the structure of the VO itself but on whether an instance of a particular VO is needed for a use case.
a) It is unlikely that a customer aggregate would have references to the customer's itineraries. The reason is that I don't see how an itinerary would be related to behaviors that would exist in the customer aggregate. It is also unnecessary to load the customer aggregate at all if all that is needed is some data to display. However, if you do load the aggregate and it does contain reference data that you need you may as well display it. The important thing to remember is that aggregates aren't meant to be used for displaying data.
c) The relationship between customer and itinerary could be expressed by a shared ID - each itinerary would have a customerId. This would allow lookup as required. However, just because these two things are related it does not mean that you need to traverse customer to get to the related entities or value objects for viewing purposes. More generally, associations can be implemented either as direct references or via repository search. There are trade-offs either way.
a) If implemented with a repository, there is no LineItems property - no direct references. Instead, to obtain a list of line items a repository is called.
b) Or you can create a DTO-like object, a read-model, which would be returned directly from the repository. The repository can in turn execute a simple SQL query to get all required data. This allows you to get to data that isn't part of the aggregate but is related. If an aggregate does have all the data needed for a view, then use that aggregate. But as soon as you have a need for more data that doesn't concern the aggregate, switch to a read-model.
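To illustrate the read-model idea, here is a Python sketch using the built-in sqlite3 module. The schema, the ItineraryListItem class and the sample rows are all made up for the example; the point is that the query returns a flat, display-oriented object and never loads the customer aggregate.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ItineraryListItem:
    """Read model: shaped for a screen, carries no behavior."""
    itinerary_id: int
    customer_name: str
    flights: str


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE itineraries (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        flights TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO itineraries VALUES
        (10, 1, 'LHR-JFK'), (11, 1, 'JFK-SFO'), (12, 2, 'CDG-NRT');
    """
)


def saved_itineraries_for(customer_id):
    # A plain SQL query keyed by the shared customer ID; no aggregate is loaded.
    rows = conn.execute(
        """
        SELECT i.id, c.name, i.flights
        FROM itineraries AS i
        JOIN customers AS c ON c.id = i.customer_id
        WHERE i.customer_id = ?
        ORDER BY i.id
        """,
        (customer_id,),
    )
    return [ItineraryListItem(*row) for row in rows]


items = saved_itineraries_for(1)
```

The behavioral model stays untouched; if the view later needs more columns, only this query and the read-model class change.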

Should the implementation of repositories be isolated like their corresponding aggregates?

The benefit of having repositories when using DDD is that they allow one to design a domain model without worrying about how objects will be persisted. It also allows the final product to be more flexible, as different implementations of repositories can be swapped in and out easily. So it's possible for the implementation of repositories to be based on SQL databases, REST web services, XML files, or any other method of storing and retrieving data. From the model's perspective, the expectation is that there are just these magic collections that can be used to store and retrieve aggregate root objects.
Now if I have two normal in-memory collections, say an IList<Order> and an IList<Customer>, I would never expect that modifying one collection would affect the other. So should the same logic apply to repositories? Should the actual implementation of repositories be totally isolated from one another, even if they in reality access the same database?
For example, a cascade-on-delete relationship may be set up in a SQL database between a Customers table and an Orders table, so that corresponding orders are deleted when a customer is deleted. Yet this functionality would break if the SQLCustomerRepository is later replaced by a RESTCustomerRepository.
So am I correct in thinking that the model should always be under the assumption that repositories are totally isolated from one another, and correspondingly the actual implementation of repositories should be isolated as well?
So if Orders should be deleted when a Customer is deleted, should this be defined explicitly in the domain model, rather than relying on the database? Say, through a CustomerService.DeleteCustomer() method which accesses the current ICustomerRepository and IOrderRepository.
I think I am just having a hard time getting my head out of the relational world and into the DDD world. I keep wanting to think of things in terms of tables and PK/FK relationships, where I should just ignore that a database is involved at all.
I believe the point you are missing is that aggregate roots draw context boundaries.
In simple words: the stuff underneath makes sense only together with the aggregate root itself.
As I see it - Order is not an aggregate root but an entity which lives in Customer aggregate root context. That means - there is no need for Order repository because repositories are supposed to be per aggregate root. So there should be only CustomerRepository which is supposed to know how to persist Customer.Orders too.
I myself don't worry that much and omit repository pattern altogether and just rely on NHibernate ORM. Rich domain model that correctly tracks and monitors state changes is much more important than way how you actually send update/select sql statements.
Also - think twice before deleting stuff.
Never delete a customer. A customer is not deleted; it is made inactive or something similar. Also, please don't cascade delete orders; it will get you into strange places. Orders should always be preserved once they are processed. Think of the reports for your application: 1.1 million in revenue just went away because you decided to cascade delete.
You have a repository per aggregate root, not per entity, so cascading deletion of the children of an aggregate root is fine within that aggregate root's repository, as it is still isolated.
Don't cascade deletions or cause any other side effects on other aggregate roots; coordinate this logic in the application layer.
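If deleting really is a requirement, coordinating it in the application layer could be sketched like this in Python. InMemoryRepository and the order_id -> customer_id layout are assumptions; CustomerService follows the CustomerService.DeleteCustomer() idea from the question.

```python
class InMemoryRepository:
    """Hypothetical repository; it knows nothing about other repositories."""

    def __init__(self):
        self.items = {}

    def add(self, key, value):
        self.items[key] = value

    def remove(self, key):
        self.items.pop(key, None)


class CustomerService:
    """Application-layer coordinator for an operation spanning two aggregates."""

    def __init__(self, customers, orders):
        self._customers = customers
        self._orders = orders

    def delete_customer(self, customer_id):
        # Orders are stored as order_id -> customer_id in this sketch.
        for order_id, owner in list(self._orders.items.items()):
            if owner == customer_id:
                self._orders.remove(order_id)
        self._customers.remove(customer_id)


customers = InMemoryRepository()
orders = InMemoryRepository()
customers.add(1, "Ada")
customers.add(2, "Bob")
orders.add(10, 1)
orders.add(11, 2)
CustomerService(customers, orders).delete_customer(1)
```

Each repository stays isolated, so swapping one implementation for another does not silently change what a delete does.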
Your domain model should model the transactional operations of your domain. By putting Orders on Customer, in your Customer entity, you are saying that when a Customer is deleted, so should his Orders.
If you have OrderIds on your Customer, that's different. Then you have an association between Customer and Orders. In this case, you are saying that by adding to or removing from the list of OrderIds on Customer, you are adding or removing associations, not adding or deleting Orders.
Should the actual implementation of repositories be totally isolated from one another, even if they in reality access the same database?
Yes, for the most part. If you decide to make both Order and Customer Aggregate Roots, you are saying they are independent of one another, and should be allowed to change independently and simultaneously. That is, you don't need the changes to be transactional between the two. If you only make Customer an Aggregate Root, and have it hold a list of Orders, you are saying that the Customer entity dictates what happens to the Orders, and changing a Customer will cascade changes to its Orders.
Now in your example, it seems you'd have Customers as aggregate roots. And Orders as aggregate roots. Each with their own repo. Customers would have a list of OrderIds to model the one to many association. If you deleted a Customer, you could publish a customer deleted event, and have everything related to this customer clean itself up.
