I've been getting acquainted with DDD and trying to understand the way Entities and Aggregate Roots interact.
Below is the example of the situation:
Let's say there is a user and he/she has multiple email addresses (can have up to 200 for the sake of example). Each email address has its own identity and so does the user. And there is a one-to-many relationship between users and their emails.
From the above example I consider Users and Emails as two entities, while Users is the aggregate root.
DDD Rules that I came across:
Rule: Only aggregate root has access to the repository.
Question 1: Does it mean that I cannot have a separate database table/collection to store the emails separately? Meaning that the emails have to be embedded inside the user document.
Rule: Entities outside the aggregate can only access other entities in the aggregate via the aggregate root.
Question 2: Now considering I do split them up into two different tables/collection and link the emails by having a field in email called associatedUserId that holds the reference to the user that email belongs to. I can't directly have an API endpoint like /users/{userId}/emails and handle it directly in the EmailService.getEmailsByUserId(String userId)? If not how do I model this?
I am sorry if the question seems a bit too naive but I can't seem to figure it out.

Only aggregate root has access to the repository
Does it mean that I cannot have a separate database table/collection to store the emails separately? Meaning that the emails have to be embedded inside the user document.
It means that there should be a single lock to acquire if you are going to make any changes to any of the member entities of the aggregate. That certainly means that the data representation of the aggregate is stored in a single database; but you could of course distribute the information across multiple tables in that database.
Back in 2003, using relational databases as the book of record was common; one to many relationships would normally involve multiple tables all within the same database.
Entities outside the aggregate can only access other entities in the aggregate via the aggregate root.
I can't directly have an API endpoint like /users/{userId}/emails and handle it directly in the EmailService.getEmailsByUserId(String userId)?
Of course you can; you'll do that by first loading the root entity of the User aggregate, then invoking methods on that entity to get at the information that you need.
A perspective: Evans was taking a position against the idea that the application should be able to manipulate arbitrary entities in the domain model directly. Instead, the application should only be allowed access to the "root" entities in the domain model. The restriction, in effect, means that the application doesn't really need to understand the constraints that are shared by multiple entities.
Four or five years later CQRS appeared, further refining this idea -- it turns out that in read-only use cases, the domain model doesn't necessarily contribute very much; you don't need to worry about the invariants if they have already been satisfied and you aren't changing anything.
In effect, this suggests that GET /users/{userId}/emails can just pull the data out of a read-only view, without necessarily involving the domain model at all. But POST /users/{userId}/emails needs to demonstrate the original care (meaning, we need to modify the data via the domain model).
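A minimal sketch of that split, in Java. Everything here is an assumption made for illustration: the class names (User, UserRepository, EmailReadView), the in-memory map standing in for an emails table keyed by associatedUserId, and the simplification of an email to a plain string. The POST path loads the root, changes it, and saves it; the GET path reads stored rows without building the aggregate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Aggregate root: every change to a user's emails goes through this class.
class User {
    static final int MAX_EMAILS = 200; // the limit used in the question's example

    private final String id;
    private final List<String> emails = new ArrayList<>();

    User(String id) {
        this.id = id;
    }

    String id() {
        return id;
    }

    // Invariants are checked here, so no caller can bypass them.
    void addEmail(String address) {
        if (emails.size() >= MAX_EMAILS) {
            throw new IllegalStateException("email limit reached");
        }
        if (emails.contains(address)) {
            throw new IllegalArgumentException("duplicate email: " + address);
        }
        emails.add(address);
    }

    List<String> emails() {
        return List.copyOf(emails);
    }
}

// Write side: one repository for the whole aggregate. The map is a
// hypothetical stand-in for an emails table keyed by associatedUserId.
class UserRepository {
    private final Map<String, List<String>> emailRowsByUserId;

    UserRepository(Map<String, List<String>> emailRowsByUserId) {
        this.emailRowsByUserId = emailRowsByUserId;
    }

    User load(String userId) {
        User user = new User(userId);
        for (String address : emailRowsByUserId.getOrDefault(userId, List.of())) {
            user.addEmail(address);
        }
        return user;
    }

    void save(User user) {
        emailRowsByUserId.put(user.id(), user.emails());
    }
}

// Read side: answers GET /users/{userId}/emails straight from storage,
// without building the aggregate.
class EmailReadView {
    private final Map<String, List<String>> emailRowsByUserId;

    EmailReadView(Map<String, List<String>> emailRowsByUserId) {
        this.emailRowsByUserId = emailRowsByUserId;
    }

    List<String> emailsOf(String userId) {
        return emailRowsByUserId.getOrDefault(userId, List.of());
    }
}
```

A handler for POST would call load, addEmail, and save in that order, while a handler for GET would only call emailsOf. Whether the emails live in their own table is invisible to both callers.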
Does this mean that I need to first go to the UserRepo and pull out the user and then pull out the emails? Can't I just make an EmailService talk to an Email Repo directly?
In the original text by Evans, repositories give access to root entities, rather than arbitrary entities. So if "email" is an entity within the "user aggregate", then it normally wouldn't have a repository of its own.
Furthermore, if you find yourself fighting against that idea, it may be a "code smell" trying to bring you to recognize that your aggregate boundaries are in the wrong place. If email and user are in different aggregates, then of course you would use different repositories to get at them.
The trick is to recognize that aggregate design is a reflection of how we lock our data for modification, not how we link our data for reporting.


Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, and so on.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email, etc.).
But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in-memory information (i.e., how often you write information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
1. As you say, do multiple round trips using your existing repositories. Load a student and, from the list of ClassEnrollments, load the classes that you need.
2. Use CQRS "lite". Aggregates and repositories are used only for update operations. For query operations, implement Queries that don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
3. Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
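As a rough illustration of the CQRS "lite" option, here is a sketch in Java of a query object that skips the repositories and joins stored data directly. The names (EnrolledClassRow, StudentClassesQuery) and the three maps standing in for tables are assumptions made for this example; against a relational database the loop would typically be a single SQL JOIN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Flat row for the student details page: data only, no domain behaviour.
class EnrolledClassRow {
    final String className;
    final String instructorName;

    EnrolledClassRow(String className, String instructorName) {
        this.className = className;
        this.instructorName = instructorName;
    }
}

// Query object used only for reads. It bypasses StudentRepository and
// ClassRepository and combines data owned by different aggregates.
class StudentClassesQuery {
    // Hypothetical stand-ins for an enrollments table, a classes table,
    // and an instructors lookup.
    private final Map<String, List<String>> classIdsByStudentId;
    private final Map<String, String> classNameById;
    private final Map<String, String> instructorNameByClassId;

    StudentClassesQuery(Map<String, List<String>> classIdsByStudentId,
                        Map<String, String> classNameById,
                        Map<String, String> instructorNameByClassId) {
        this.classIdsByStudentId = classIdsByStudentId;
        this.classNameById = classNameById;
        this.instructorNameByClassId = instructorNameByClassId;
    }

    // One call returns everything the page needs for a student's classes.
    List<EnrolledClassRow> forStudent(String studentId) {
        List<EnrolledClassRow> rows = new ArrayList<>();
        for (String classId : classIdsByStudentId.getOrDefault(studentId, List.of())) {
            rows.add(new EnrolledClassRow(
                    classNameById.get(classId),
                    instructorNameByClassId.get(classId)));
        }
        return rows;
    }
}
```

The rows are never passed back into the domain model, so no aggregate invariants are at stake. Updates would still go through the aggregates and their repositories.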

Repository within domain objects

I have seen a lot of discussions regarding this topic but I couldn't get a convincing answer. The general advice is not to have a repository inside a domain object. What about an aggregate root? Isn't it right to give the root the responsibility to manipulate the composed objects?
For example, I have a microservice which takes care of invoices. Invoice is an aggregate root which has the different products. There is no requirement for this service to give details about individual products. I have 2 tables, one to store invoice details and the other to store the products of those invoices. I have two repositories corresponding to the tables. I have injected the product repository inside the invoice domain object. Is it wrong to do so?
I see some mistakes according to DDD principles in your question. Let me try to clarify some concepts to give you a hand.
First, you mentioned you have an Aggregate Root which is Invoice, and then two different repositories. Having an Aggregate Root means that any change on the Entities that the Aggregate consists of should be performed via the Aggregate Root. Why? That's because you need to satisfy some business rule (invariant) that applies to the relation between those Entities. For instance, given the following business rule:
Winning auction bids must always be placed before the auction ends. If a winning bid is placed after an auction ends, the domain is in an invalid state because an invariant has been broken and the model has failed to correctly apply domain rules.
Here there is an aggregate consisting of Auction and Bids where the Auction is the Aggregate Root.
If you have a BidsRepository, you could easily do:
var newBid = new Bid(money);
bidsRepository.save(newBid);
And you would be saving a Bid without checking the defined business rule. However, by having a repository only for the Aggregate Root, you enforce your design, because you need to do something like:
var newBid = new Bid(money);
auction.placeBid(newBid);
Therefore, you can check your invariant within the method placeBid and nobody can skip it if they want to place a new Bid. Afterwards you can save the info into as many tables as you want, that is an implementation detail.
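To make the idea concrete, here is a small Java sketch of an Auction root that enforces the rule inside placeBid. Only Auction, Bid, and placeBid come from the discussion above; the fields (amountInCents, placedAt, endsAt), the bidCount helper, and the shape of AuctionRepository are assumptions for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Child entity of the Auction aggregate. The fields are hypothetical.
class Bid {
    final long amountInCents;
    final Instant placedAt;

    Bid(long amountInCents, Instant placedAt) {
        this.amountInCents = amountInCents;
        this.placedAt = placedAt;
    }
}

// Aggregate root: the only way to add a Bid is through placeBid.
class Auction {
    private final Instant endsAt;
    private final List<Bid> bids = new ArrayList<>();

    Auction(Instant endsAt) {
        this.endsAt = endsAt;
    }

    // The invariant lives here: a bid must be placed before the auction ends.
    void placeBid(Bid bid) {
        if (!bid.placedAt.isBefore(endsAt)) {
            throw new IllegalStateException("auction has already ended");
        }
        bids.add(bid);
    }

    int bidCount() {
        return bids.size();
    }
}

// Only the root gets a repository. There is deliberately no BidsRepository,
// so a Bid cannot be persisted without passing through Auction.placeBid.
interface AuctionRepository {
    void save(Auction auction);
}
```

An implementation of AuctionRepository is free to write the auction and its bids to separate tables; callers never see that detail.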
Second, you asked whether it's wrong to inject the repository into a Domain class. Here is a quick explanation:
The repository should depend on the object it returns, not the other way around. The reason for this is that your "domain object" (more on that later) can exist (and should be testable) without being loaded or saved (that is, having a dependency on a repository).
Basically your design says that in order to have an invoice, you need to provide a MySQL/Mongo/XXX instance connection which is an infrastructure detail. Your domain should not know anything about how it is persisted. Your domain knows about the behavior like in the scenario of the Auction and Bids.
These concepts help you write code that is easier to maintain, and help you apply best practices such as SRP (Single Responsibility Principle).
Yes, I think it is wrong.
The domain should match the real business model and should not care how data is persisted. Even if the data is stored internally in multiple tables, this should not affect domain objects in any way.
When you are loading an aggregate root, you should load related entities as well in one go. For example, this can easily be achieved with the Include method in Entity Framework if you are on .NET. By loading all the data you ensure that you have a full representation of the business entity at any given time and you don't have to query the database again.
Any changes in related entities should be persisted together with the aggregate root in one atomic operation (usually using transactions).
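A possible shape for this, sketched in Java. The names InvoiceLine, InvoiceRepository, and InMemoryInvoiceRepository are assumptions, as are the two in-memory collections standing in for the invoice and product tables. The point is the direction of the dependency: the repository knows Invoice, and Invoice knows nothing about persistence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Child of the Invoice aggregate. Hypothetical fields.
class InvoiceLine {
    final String productName;
    final int quantity;

    InvoiceLine(String productName, int quantity) {
        this.productName = productName;
        this.quantity = quantity;
    }
}

// Aggregate root with no reference to any repository or database,
// so it can be created and tested without infrastructure.
class Invoice {
    private final String id;
    private final List<InvoiceLine> lines = new ArrayList<>();

    Invoice(String id) {
        this.id = id;
    }

    String id() {
        return id;
    }

    void addLine(String productName, int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        lines.add(new InvoiceLine(productName, quantity));
    }

    List<InvoiceLine> lines() {
        return List.copyOf(lines);
    }
}

// The repository depends on Invoice, never the other way around.
interface InvoiceRepository {
    Invoice load(String invoiceId);

    void save(Invoice invoice);
}

// Stand-in for a two-table store. A real implementation would wrap the
// writes in save() in one database transaction.
class InMemoryInvoiceRepository implements InvoiceRepository {
    private final Set<String> invoiceTable = new HashSet<>();
    private final Map<String, List<InvoiceLine>> productTable = new HashMap<>();

    // Loads the root together with all of its lines in one go.
    public Invoice load(String invoiceId) {
        if (!invoiceTable.contains(invoiceId)) {
            throw new IllegalArgumentException("unknown invoice: " + invoiceId);
        }
        Invoice invoice = new Invoice(invoiceId);
        for (InvoiceLine line : productTable.get(invoiceId)) {
            invoice.addLine(line.productName, line.quantity);
        }
        return invoice;
    }

    // Persists the root and its lines together.
    public void save(Invoice invoice) {
        invoiceTable.add(invoice.id());
        productTable.put(invoice.id(), invoice.lines());
    }
}
```

Swapping InMemoryInvoiceRepository for a database-backed implementation would not require touching Invoice at all.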

Implementing Udi's Fetching Strategy - How do I search?

Udi Dahan suggests a fetching strategy as a useful pattern to use for data access. I agree.
The concept is to make roles explicit. For example I have an Aggregate Root - Customer. I want customer in several parts of my application - a list of customers to select from, a view of the customer's details, and I want a button to deactivate a customer.
It seems Udi would suggest an interface for each of these roles. So I have ICustomerInList with very basic details, ICustomerDetail which includes the latest 10 products purchased, and IDeactivateCustomer which has a method to deactivate the customer. Each interface exposes just enough of my Customer Aggregate Root to get the job done in each situation. My Customer Aggregate Root implements all these interfaces.
Now I want to implement a fetching strategy for each of these roles. Each strategy can load a different amount of data into my Aggregate Root because it will be behind an interface exposing only the bits of information needed.
The general method to implement this part is to ask a Service Locator or some other style of dependency injection. This code will take the interface you are wanting, for example ICustomerInList, and find a fetching strategy to load it (IStrategyForFetching<ICustomerInList>). This strategy is implemented by a class that knows to only load a Customer with the bits of information needed for the ICustomerInList interface.
So far so good.
The problem is what you pass to the Service Locator, or to the IStrategyForFetching<ICustomerInList>. All of the examples I see are only selecting one object by a known id. This case is easy: the calling code passes this id through and gets back the specific interface.
What if I want to search? Or I want page 2 of the list of customers? Now I want to pass in more terms that the Fetching Strategy needs.
Possible solutions
Some of the examples I've seen use a predicate - an expression that returns true or false if a particular Aggregate Root should be part of the result set. This works fine for conditions but what about getting back the first n customers and no more? Or getting page 2 of the search results? Or how the results are sorted?
My first reaction is to start adding generic parameters to my IStrategyForFetching<ICustomerInList>. It now becomes IStrategyForFetching<TAggregateRoot, TStrategyForSelecting, TStrategyForOrdering>. This quickly becomes complex and ugly. It's further complicated by different repositories. Some repositories only supply data when using a particular strategy for selecting, some only certain types of ordering. I would like to have the flexibility to implement general repositories that can take sorting functions along with specialised repositories that only return Aggregate Roots sorted in a particular fashion.
It sounds like I should apply the same pattern used at the start - How do I make roles explicit? Should I implement a strategy for fetching X (Aggregate Root) using the payload Y (search / ordering parameters)?
Edit (2012-03-05)
This is all still valid if I'm not returning the Aggregate Root each time. If each interface is implemented by a different DTO I can still use IStrategyForFetching. This is why this pattern is powerful - what does the fetching and what is returned doesn't have to map in any way to the aggregate root.
I've ended up using IStrategyForFetching<TEntity, TSpecification>. TEntity is the thing I want to get, TSpecification is how I want to get it.
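For readers who want to picture the two-parameter shape, here is a rough Java rendering. It is not the code from the original post: FetchingStrategy is my Java spelling of IStrategyForFetching<TEntity, TSpecification>, and CustomerListItem, CustomerPageSpec, CustomerListStrategy, and the zero-based page convention are all assumptions made for this sketch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// E is the thing to get, S describes how to get it.
interface FetchingStrategy<E, S> {
    List<E> fetch(S specification);
}

// What the list screen needs (a DTO, not the aggregate root).
class CustomerListItem {
    final String id;
    final String name;

    CustomerListItem(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Search and paging terms travel together in one specification object.
class CustomerPageSpec {
    final String nameContains;
    final int page;      // zero-based page index (an assumption of this sketch)
    final int pageSize;

    CustomerPageSpec(String nameContains, int page, int pageSize) {
        this.nameContains = nameContains;
        this.page = page;
        this.pageSize = pageSize;
    }
}

// A specialised strategy: always sorts by name, pages as requested.
// The in-memory list stands in for whatever store backs the strategy.
class CustomerListStrategy implements FetchingStrategy<CustomerListItem, CustomerPageSpec> {
    private final List<CustomerListItem> source;

    CustomerListStrategy(List<CustomerListItem> source) {
        this.source = source;
    }

    public List<CustomerListItem> fetch(CustomerPageSpec spec) {
        return source.stream()
                .filter(c -> c.name.contains(spec.nameContains))
                .sorted(Comparator.comparing((CustomerListItem c) -> c.name))
                .skip((long) spec.page * spec.pageSize)
                .limit(spec.pageSize)
                .collect(Collectors.toList());
    }
}
```

A strategy that lets the caller choose the ordering would simply use a different specification type, which keeps the interface itself at two type parameters.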
Have you come across CQRS? Udi is a big proponent of it, and its purpose is to solve this exact issue.
The concept in its most basic form is to separate the domain model from querying. This means that the domain model only comes into play when you want to execute a command / commit a transaction. You don't use data from your aggregates & entities to display information on the screen. Instead, you create a separate data access service (or bunch of them) that contain methods that provide the exact data required for each screen. These methods can accept criteria objects as parameters and therefore do searching with whatever criteria you desire.
A quick sequence of how this works:
1. A screen shows a list of customers that have made orders in the last week.
2. The UI calls the CustomerQueryService passing a date as criteria.
3. The CustomerQueryService executes a query that returns only the fields required for this screen, including the aggregate id of each customer.
4. The user chooses a customer in the list, and chooses to perform the 'Make Important Customer' action/command.
5. The UI sends a MakeImportantCommand to the Command Service (or Application Service in DDD terms) containing the ID of the customer.
6. The command service fetches the Customer aggregate from the repository using the ID passed in the command, calls the necessary methods and updates the database.
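The sequence above can be sketched roughly as follows, in Java. CustomerQueryService and MakeImportantCommand are named in the answer; everything else (CustomerSummary, CustomerCommandService, the in-memory CustomerRepository, the method names) is an assumption made for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write-side aggregate: only used when state changes.
class Customer {
    private final String id;
    private boolean important;

    Customer(String id) { this.id = id; }

    String id() { return id; }

    boolean isImportant() { return important; }

    void makeImportant() { important = true; }
}

// In-memory stand-in for the aggregate's repository.
class CustomerRepository {
    private final Map<String, Customer> store = new HashMap<>();

    Customer load(String id) { return store.get(id); }

    void save(Customer customer) { store.put(customer.id(), customer); }
}

// Read-side DTO: only the fields the screen shows, plus the aggregate id.
class CustomerSummary {
    final String customerId;
    final String name;
    final LocalDate lastOrderDate;

    CustomerSummary(String customerId, String name, LocalDate lastOrderDate) {
        this.customerId = customerId;
        this.name = name;
        this.lastOrderDate = lastOrderDate;
    }
}

// Query service: filters stored rows by the criteria, no aggregates involved.
class CustomerQueryService {
    private final List<CustomerSummary> rows;

    CustomerQueryService(List<CustomerSummary> rows) { this.rows = rows; }

    List<CustomerSummary> withOrdersSince(LocalDate since) {
        List<CustomerSummary> result = new ArrayList<>();
        for (CustomerSummary row : rows) {
            if (!row.lastOrderDate.isBefore(since)) {
                result.add(row);
            }
        }
        return result;
    }
}

// The command carries only the id chosen on the screen.
class MakeImportantCommand {
    final String customerId;

    MakeImportantCommand(String customerId) { this.customerId = customerId; }
}

// Command service: load the aggregate, call its method, save it.
class CustomerCommandService {
    private final CustomerRepository repository;

    CustomerCommandService(CustomerRepository repository) { this.repository = repository; }

    void handle(MakeImportantCommand command) {
        Customer customer = repository.load(command.customerId);
        if (customer == null) {
            throw new IllegalArgumentException("unknown customer: " + command.customerId);
        }
        customer.makeImportant();
        repository.save(customer);
    }
}
```

The only thing the two sides share is the customer id: the query hands it to the UI, and the UI hands it back inside the command.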
Building your app using the CQRS architecture opens you up to a lot of possibilities regarding performance and scalability. You can take this simple example further by creating separate query databases that contain denormalised tables for every view, eventual consistency & event sourcing. There are a lot of videos/examples/blogs about CQRS that I think would really interest you.
I know your question was regarding 'fetching strategy' but I notice that he wrote this article in 2007, and it's likely that he considers CQRS its successor.
To summarise my answer:
Don't try to project cut-down DTOs from your domain aggregates. Instead, just create separate query services that give you a tailored query for your needs.
Read up on CQRS (if you haven't already).
To add to the response by David Masters, I think all the fetching strategy interfaces are adding needless complexity. Having the Customer AR implement the various interfaces which are modeled after a UI is a needless constraint on the AR class and you will spend far too much effort trying to enforce it. Moreover, it is a brittle solution. What if a view requires data that, while related to Customer, does not belong on the customer class? Does one then coerce the customer class and the corresponding ORM mappings to contain that data? Why not just have a separate set of classes for query purposes and be done with it? This allows you to deal with fetching strategies at the place where they belong - in the repository. Furthermore, what value does the fetching strategy interface abstraction really add? It may be an appropriate model of what is happening in the application, but it doesn't help in implementing it.

Should the implementation of repositories be isolated like their corresponding aggregates?

The benefit of having repositories when using DDD is that they allow one to design a domain model without worrying about how objects will be persisted. They also allow the final product to be more flexible, as different implementations of repositories can be swapped in and out easily. So it's possible for the implementation of repositories to be based on SQL databases, REST web services, XML files, or any other method of storing and retrieving data. From the model's perspective the expectation is that there are just these magic collections that can be used to store and retrieve aggregate root objects.
Now if I have two normal in-memory collections, say an IList<Order> and an IList<Customer>, I would never expect that modifying one collection would affect the other. So should the same logic apply to repositories? Should the actual implementation of repositories be totally isolated from one another, even if they in reality access the same database?
For example, a cascade-on-delete relationship may be set up in a SQL database between a Customers table and an Orders table so that corresponding orders are deleted when a customer is deleted. Yet this functionality would break if later the SQLCustomerRepository is replaced by a RESTCustomerRepository.
So am I correct in thinking that the model should always be under the assumption that repositories are totally isolated from one another, and correspondingly the actual implementation of repositories should be isolated as well?
So if Orders should be deleted when a Customer is deleted, should this be defined explicitly in the domain model, rather than relying on the database? Say through a CustomerService.DeleteCustomer() method which accesses the current ICustomerRepository and IOrderRepository.
I think I am just having a hard time getting my head out of the relational world and into the DDD world. I keep wanting to think of things in terms of tables and PK/FK relationships, where I should just ignore that a database is involved at all.
I believe the point you're missing is that aggregate roots draw context boundaries.
In simple words - the stuff underneath makes sense only together with the aggregate root itself.
As I see it - Order is not an aggregate root but an entity which lives in the Customer aggregate root context. That means there is no need for an Order repository, because repositories are supposed to be per aggregate root. So there should be only a CustomerRepository, which is supposed to know how to persist Customer.Orders too.
I myself don't worry that much and omit the repository pattern altogether and just rely on the NHibernate ORM. A rich domain model that correctly tracks and monitors state changes is much more important than how you actually send update/select SQL statements.
Also - think twice before deleting stuff.
Never delete a customer; a customer is not deleted, it is made inactive or something similar. Also, please don't cascade-delete orders. It will get you into strange places; orders should always be preserved once they are processed. Think of the reports for your application: 1.1 million in revenue just went away because you decided to cascade delete.
You have a repository per aggregate root, not per entity, so cascading deletion of the children of an aggregate root is fine within the aggregate root's repository, as it is still isolated.
Don't cascade deletion or have any side effects on other aggregate roots; co-ordinate this logic in the application layer.
Your domain model should model the transactional operations of your domain. By putting Orders on Customer, in your Customer entity, you are saying that when a Customer is deleted, so should his Orders.
If you have OrderIds on your Customer, that's different. Then you have an association between Customer and Orders. In this case, you are saying that by adding or removing from the list of OrderIds on Customers, you are adding or removing associations, not adding or deleting Orders.
Should the actual implementation of repositories be totally isolated from one another, even if they in reality access the same database?
Yes, for the most part. If you decide to make both Order and Customer Aggregate Roots, you are saying they are independent of one another, and should be allowed to change independently and simultaneously. That is, you don't need the changes to be transactional between the two. If you only make Customer an Aggregate Root, and have it hold a list of Orders, now you are saying that the Customer entity dictates what happens to the Orders, and changing a Customer will cascade changes to its Orders.
Now in your example, it seems you'd have Customers as aggregate roots. And Orders as aggregate roots. Each with their own repo. Customers would have a list of OrderIds to model the one to many association. If you deleted a Customer, you could publish a customer deleted event, and have everything related to this customer clean itself up.
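One way this could look, sketched in Java with both Customer and Order as separate aggregate roots. All names here (CustomerDeleted, CustomerEvents, OrderStore, CustomerService) are hypothetical, the publisher is a simple synchronous in-process one, and "clean up" is arbitrarily interpreted as archiving the orders rather than removing them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Event published after a customer has been deleted.
class CustomerDeleted {
    final String customerId;

    CustomerDeleted(String customerId) { this.customerId = customerId; }
}

// Minimal synchronous publisher; a real system might use a message broker.
class CustomerEvents {
    private final List<Consumer<CustomerDeleted>> handlers = new ArrayList<>();

    void subscribe(Consumer<CustomerDeleted> handler) { handlers.add(handler); }

    void publish(CustomerDeleted event) {
        for (Consumer<CustomerDeleted> handler : handlers) {
            handler.accept(event);
        }
    }
}

// Order side: its own store, referencing the customer only by id.
class OrderStore {
    private final Map<String, String> customerIdByOrderId = new HashMap<>();
    private final Set<String> archivedOrderIds = new HashSet<>();

    void add(String orderId, String customerId) {
        customerIdByOrderId.put(orderId, customerId);
    }

    // The order side decides for itself what cleaning up means.
    void archiveOrdersOf(String customerId) {
        for (Map.Entry<String, String> entry : customerIdByOrderId.entrySet()) {
            if (entry.getValue().equals(customerId)) {
                archivedOrderIds.add(entry.getKey());
            }
        }
    }

    boolean isArchived(String orderId) { return archivedOrderIds.contains(orderId); }
}

// Customer side: removes the customer, then announces it. It never touches orders.
class CustomerService {
    private final Set<String> customerIds = new HashSet<>();
    private final CustomerEvents events;

    CustomerService(CustomerEvents events) { this.events = events; }

    void register(String customerId) { customerIds.add(customerId); }

    boolean exists(String customerId) { return customerIds.contains(customerId); }

    void deleteCustomer(String customerId) {
        if (customerIds.remove(customerId)) {
            events.publish(new CustomerDeleted(customerId));
        }
    }
}
```

Wiring is a single line at startup, for example events.subscribe(e -> orders.archiveOrdersOf(e.customerId)). The customer repository and the order repository stay isolated from each other, and no database cascade is needed.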

In domain driven design, can entities have their own repositories?

I'm working on a pretty standard e-commerce web site where there are Products and Categories. Each product has an associated category, which is a simple name-value pair object used to categorise a product (e.g. item 1234 may have a category "balloon").
I modelled the product as a root aggregate, which owns and knows how to modify its category, which is an entity.
However, I ran into a problem where a user needs to be able to search a category. How am I supposed to implement this in DDD? I'm new to DDD but I believe that only root aggregates should be given their own repository. So that leaves me with 2 options:
1. Add a "SearchCategory" method to the ProductRepository
2. Implement the search logic as a service (i.e. CategoryFinderService)
I personally think option 2 is more logical but it feels weird to have a service that touches the database. Somehow I feel that only a repository should be allowed to interact with the database.
Can someone please tell me what's the best way to implement this?
IMHO, in your Domain Model, Category should not be a child of the Product aggregate. The Product has a Category, but it does not know how to create or edit a Category.
Take another example. Imagine the ShoppingCart class: it's an aggregate root and contains a list of Items. The ShoppingCart is responsible for adding/editing/removing the Items, so in this case you won't need a Repository for the Item class.
Not sure by the way, I'm new to this just like you.
Placing something you don't know where to put into artificial services usually leads to an anemic domain model.
I would go with the first option. But the need to access entities outside the context of their root is a sign that you might be missing another root.
Don't try to implement everything with your domain model. The domain model is powerful for changing the state of the system, but unnecessarily complex for querying. So separate the two. It's called Command Query Responsibility Segregation, or CQRS. And no, it has nothing to do with Event Sourcing, even though they do work nicely together.
I implement scenarios such as this so that I have a domain logic side with the domain objects and repositories (if needed), which do the state changing when something happens, i.e. new order is placed or order is shipped. But when I just need to show something in the UI, for instance the list of the products filtered by the category, it is a simple query and does not involve the domain objects at all. It simply returns Data Transfer Objects (DTO) that do not contain any domain logic at all.
