Turning a CosmosDb Document into a DDD Aggregate - domain-driven-design

I have a CosmosDB document that models something in my problem space -- a Car for our purposes. It currently has a bunch of properties relating to the model, color, year manufactured, etc. I would like to treat the Car as a DDD Aggregate, including public methods for mutating the state of the object and for delegating method calls to other objects referenced directly by the Aggregate (within the same document). I'm aware that in a better DDD implementation I would have data model(s) distinct from the domain model(s) with mapping functions between them, but it's been a hard enough sell to treat the document as a full-fledged Aggregate. The preferred direction by the team is to treat the document in an anemic fashion, with Aggregate methods appearing in the Application Service, which makes testing of Aggregate logic more difficult. Is there any downside to including the Aggregate logic directly in the document?

Storing aggregates or utilizing Materialized Views in documents on domain data in Cosmos DB is quite common as it reduces or eliminates the need for frequently run, but often expensive queries.
For instance, in a simple ecommerce scenario, it is more efficient to store the order total in the order header and fetch it with a point read, ReadItemAsync(), than to run a query that fetches and sums all the order items. Another scenario is keeping a running total of the day's sales in every category. Here you have a single document that holds each category with its sales for the day. As each order occurs, the insert operation for each item in the cart triggers the Change Feed, whose handler does a point read on the aggregate document and upserts the category total, incrementing it by the newly sold item. Then, instead of querying all sales for the day, which gets progressively more expensive as orders grow, you simply issue a point read to get the totals, which costs about 1 RU for a small (roughly 1 KB) document.
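To make the running-total idea concrete, here is a minimal in-memory sketch. It does not call the Cosmos DB SDK; `DailyCategoryTotals` and `on_items_inserted` are hypothetical names standing in for the aggregate document and the change feed handler, and the item fields (`category`, `price`, `quantity`) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DailyCategoryTotals:
    """Hypothetical aggregate document: one per day, one total per category."""
    id: str
    totals: dict[str, float] = field(default_factory=dict)

    def apply_sale(self, category: str, amount: float) -> None:
        # Increment the running total instead of re-summing every order item.
        self.totals[category] = self.totals.get(category, 0.0) + amount


def on_items_inserted(doc: DailyCategoryTotals, changed_items: list[dict]) -> DailyCategoryTotals:
    """Stand-in for a change feed handler that receives newly inserted cart items."""
    for item in changed_items:
        doc.apply_sale(item["category"], item["price"] * item["quantity"])
    # A real handler would upsert the document back to the container here.
    return doc
```

Reading the totals is then a single lookup of `doc.totals` rather than a scan over the day's orders, which is the point of keeping the aggregate document.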

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, etc.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email, etc.).
But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in-memory information (i.e., how often you write information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and, from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are used only for update operations. For query operations, implement Queries that bypass the repositories and access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
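As a rough illustration of the CQRS "lite" option, the sketch below runs one joined query against an in-memory SQLite database instead of going through repositories. The table and column names, the demo data, and the `student_details` function are all assumptions made up for this example, not part of the question's schema.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Build a throwaway database with a hypothetical relational layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE classes (id INTEGER PRIMARY KEY, course_name TEXT, room TEXT);
        CREATE TABLE enrollments (student_id INTEGER, class_id INTEGER);
        INSERT INTO students VALUES (1, 'Ada');
        INSERT INTO classes VALUES (10, 'Algebra', 'B-101'), (11, 'Physics', 'C-202');
        INSERT INTO enrollments VALUES (1, 10), (1, 11);
    """)
    return conn


def student_details(conn: sqlite3.Connection, student_id: int) -> dict:
    """Query-side function: one joined read, no aggregates or repositories."""
    rows = conn.execute(
        """
        SELECT s.name, c.course_name, c.room
        FROM students s
        JOIN enrollments e ON e.student_id = s.id
        JOIN classes c ON c.id = e.class_id
        WHERE s.id = ?
        ORDER BY c.course_name
        """,
        (student_id,),
    ).fetchall()
    if not rows:
        # Unknown student, or a student with no enrollments (inner joins).
        return {}
    return {
        "name": rows[0][0],
        "classes": [{"course": course, "room": room} for _, course, room in rows],
    }
```

The result is a plain dictionary shaped for the details page; nothing in it can be used to modify a Student or a Class, so the write side keeps going through the aggregates.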

How do I read all entities of a kind in a transaction with google cloud datastore nodejs

When I try run a query to read all entities of a kind in a transaction with google datastore it gives me this error
{ Error: Only ancestor queries are allowed inside transactions.
at /root/src/node_modules/grpc/src/client.js:554:15
code: 3,
metadata: Metadata { _internal_repr: {} },
So I need to use an ancestor query. How do I create an ancestor query? It appears to depend on how you structured the hierarchy in datastore. So my next question is, given every entity I have created in datastore has been saved like so (the identifier is unique to the entityData saved)
const entityKey = datastore.key({ namespace: ns, path: [kind, identifier] });
{ key: entityKey, method: 'upsert', data: entityData };
How do I read from the db within a transaction? I think I could do it if I knew the identifiers, but the identifiers are constructed from the entityData that I saved in the kind and I need to read the entities of the kind to figure out what I have in the db (chicken egg problem). I am hoping I am missing something.
More context
The domain of my problem involves sponsoring people. I have stored a kind people in datastore where each entity is a person consisting of a unique identifier, name and grade. I have another kind called relationships where each entity is a relationship containing two of the people's identifiers, the sponsor & sponsee (linking two people together). So I have structured it like an RDB. If I want to get a person's sponsor, I get all the relationships from the db, loop over them returning the relationships where the person is the sponsee, then query the db for the sponsor of that relationship.
How do I structure it the 'datastore' way, with entity groups/ancestors, given I have to model people and their links/relationships.
Let's assume a RDB is out of the question.
Example scenario
Two people have to be deleted from the app/db (let's say they left the company on the same day). When I delete someone, I also want to remove their relationships. The two people I delete share a relationship (one is sponsoring the other). Assume the first transaction is successful, i.e. I delete one person and their relationship. In the next transaction, I delete the second person, then search the relationships for relevant ones, and I find one that has already been deleted, because the query is eventually consistent. I try to find the person for that relationship and they don't exist. Blows up.
Note: each transaction wraps delete person & their relationship. Multiple people equals multiple transactions.
Scalability is not a concern for my application
Your understanding is correct:
you can't use an ancestor query since your entities are not in an ancestry relationship (i.e. not in the same entity group).
you can't perform non-ancestor queries inside transactions. Note that you also can't read more than 25 of your entities inside a single transaction (each entity is in a separate entity group). From Restrictions on queries:
Queries inside transactions must be ancestor queries
Cloud Datastore transactions operate on entities belonging to up to 25 entity groups, but queries inside transactions must be ancestor queries. All queries performed within a transaction must specify an ancestor. For more information, refer to Datastore Transactions.
The typical approach in a context similar to yours is to perform queries outside transactions, often just keys only queries - to obtain the entity keys, then read the corresponding entities (up to 25 at a time) by key lookup inside transactions. And use transactions only when it's absolutely needed, see, for example, this related discussion: Ancestor relation in datastore.
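The "up to 25 at a time" part of that approach can be sketched as plain batching. This is only an illustration: `run_in_transaction` is a hypothetical callable standing in for whatever opens a Datastore transaction and looks up the keys, and the keys would come from a keys-only query run outside any transaction. No real Datastore API is used here.

```python
from typing import Callable, Iterator, Sequence, TypeVar

K = TypeVar("K")

# Entity-group limit per transaction, as quoted in the answer above.
MAX_GROUPS_PER_TXN = 25


def batches(keys: Sequence[K], size: int = MAX_GROUPS_PER_TXN) -> Iterator[Sequence[K]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def process_in_transactions(
    keys: Sequence[K],
    run_in_transaction: Callable[[Sequence[K]], None],
) -> int:
    """Call `run_in_transaction` once per batch and return how many batches ran."""
    count = 0
    for batch in batches(keys):
        run_in_transaction(batch)
        count += 1
    return count
```

Each batch is an independent transaction, so a failure in one batch does not roll back the others; whether that is acceptable depends on the use case.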
Your question apparently suggests you're approaching the datastore with a relational DB mindset. If your app fundamentally needs relational data (you didn't describe what you're trying to do) the datastore might not be the best product for it. See Choosing a storage option. I'm not saying that you can't use the datastore with relational data; it can still be done in many cases, but with a bit more careful design. Those restrictions are driving towards scalable datastore-based apps (IMHO potentially much more scalable than you can achieve with relational DBs).
There is a difference between structuring the data RDB style (which is OK with the datastore) and using it in RDB style (which is not that good).
In the particular usage scenario you mentioned you do not need to query for the sponsor of a relationship: you already have the sponsor's key in the relationship entity, all you need to do is look it up by key, which can be done in a transaction.
Getting all relationship entities for a person needs a query, filtered by the person being the sponsor or the sponsee. But does it really have to be done in a transaction? Or is it acceptable if maybe you miss in the result list a relationship created just seconds ago? Or having one which was recently deleted? It will eventually (dis)appear in the list if you repeat the query a bit later (see Eventual Consistency on Reading an Index). If that's acceptable (IMHO it is, relationships don't change that often, chances of querying exactly right after a change are rather slim) then you don't need to make the query inside a transaction thus you don't need an ancestry relationship between the people and relationship entities. Great for scalability.
Another consideration: looping through the list of relationship entities: also doesn't necessarily have to be done in a transaction. And, if the number of relationships is large, the loop can hit the request deadline. A more scalable approach is to use query cursors and split the work across multiple tasks/requests, each handling a subset of the list. See a Python example of such approach: How to delete all the entries from google datastore?
For each person deletion case:
add something like a being_deleted property (in a transaction) to that person to flag the deletion and prevent any use during deletion, like creating new relationship while the deletion task is progressing. Add checks for this flag wherever needed in the app's logic (also in transactions).
get the list of all relationship keys for that person and delete them, using the looping technique mentioned above
in the last loop iteration, when there are no relationships left, enqueue another task, generously delayed, to re-check for any recent relationships that might have been missed in the previous loop execution due to the eventual consistency. If any shows up re-run the loop, otherwise just delete the person
If scalability is not a concern, you can also re-design your data structures to use ancestry between all your entities (placing them in the same entity group) and then you could do what you want. See, for example, What would be the purpose of putting all datastore entities in a single group?. But there are many potential risks to be aware of, for example:
max rate of 1 write/sec across the entire entity group (up to 500 entities each), see Datastore: Multiple writes against an entity group inside a transaction exceeds write limit?
large transactions taking too long and hitting the request deadlines, see Dealing with DeadlineExceededErrors
higher risk of contention, see Contention problems in Google App Engine

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, let's say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc.): This is a compact representation that might have some aggregates, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity itself, saved in the db.
DetailsView: here you wouldn't have aggregates but real values for each Author, as Book has a property AuthorsList.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB, i.e. you don't want to fetch all the actual Authors and Reviews, just 2 ints with the count of each?
Full normalized solution would be 2, but 1 seems to require either some denormalization or create 2 different entities: BookDetails and BookCompact within Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.
For me it sounds like multiple Query Models (QM).
I used DDD with CQRS/ES style, so aggregate roots produce events based on the commands passed in. Multiple QMs subscribe to those events, so I create multiple "views" based on requirements.
ES (event sourcing) has huge power: I can introduce further QMs later by replaying the stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of the "shared data model". I see a huge evil in the "one RDBMS model for everything" approach. You will always get lost in the complexity of managing a shared model, as you are now.
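A small sketch of what "introducing a QM later by replaying stored events" could look like. The event names and fields (`BookAdded`, `AuthorAssigned`, `ReviewPosted`) and the `BookListView` class are invented for illustration; a real event store and subscription mechanism are not shown.

```python
from collections import Counter

# Hypothetical stored events, in the order the aggregates produced them.
EVENTS = [
    {"type": "BookAdded", "book_id": "b1", "title": "DDD"},
    {"type": "AuthorAssigned", "book_id": "b1", "author": "Evans"},
    {"type": "ReviewPosted", "book_id": "b1", "stars": 5},
    {"type": "ReviewPosted", "book_id": "b1", "stars": 4},
]


class BookListView:
    """Compact query model: only the counts needed for list pages."""

    def __init__(self) -> None:
        self.author_counts: Counter = Counter()
        self.review_counts: Counter = Counter()

    def apply(self, event: dict) -> None:
        # Events this view does not care about are simply ignored.
        if event["type"] == "AuthorAssigned":
            self.author_counts[event["book_id"]] += 1
        elif event["type"] == "ReviewPosted":
            self.review_counts[event["book_id"]] += 1


def replay(events: list[dict], view: BookListView) -> BookListView:
    """Build (or rebuild) a query model by feeding it every stored event."""
    for event in events:
        view.apply(event)
    return view
```

A details view would be a second class with its own `apply`, fed from the same events, so neither view constrains the shape of the other.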
I had a very good result with the following design:
domain package contains @Entity classes which contain all necessary data which are stored in database
dto package which contains view/views of entity which will be returned from service
The DTO should have a constructor which takes the entity as a parameter. To copy the data more easily you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you are sharing only minimal amount of information and it is returned in object which does not have any functionality.
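The entity-plus-DTO layout can be sketched like this, using the book example from the question. The class and field names are assumptions, and a `from_entity` class method plays the role of the constructor that takes the entity.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Hypothetical domain entity holding everything that is stored."""
    id: int
    title: str
    authors: list[str] = field(default_factory=list)
    reviews: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class BookCompactDto:
    """View returned from the service for list pages: data only, no behaviour."""
    id: int
    title: str
    number_of_authors: int
    number_of_reviews: int

    @classmethod
    def from_entity(cls, book: Book) -> "BookCompactDto":
        # Copy only what the list view needs from the entity.
        return cls(book.id, book.title, len(book.authors), len(book.reviews))
```

Note that this keeps the service response small but still loads the full entity first, so on its own it does not reduce the payload between the application and the database.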

DDD: do I really need to load all objects in an aggregate? (Performance concerns)

In DDD, a repository loads an entire aggregate - we either load all of it or none of it. This also means that we should avoid lazy loading.
My concern is performance-wise. What if this results in loading into memory thousands of objects? For example, an aggregate for Customer comes back with ten thousand Orders.
In this sort of cases, could it mean that I need to redesign and re-think my aggregates? Does DDD offer suggestions regarding this issue?
Take a look at this Effective Aggregate Design series of three articles from Vernon. I found them quite useful to understand when and how you can design smaller aggregates rather than a large-cluster aggregate.
EDIT
I would like to give a couple of examples to improve my previous answer, feel free to share your thoughts about them.
First, a quick definition of an Aggregate (taken from the book Patterns, Principles, and Practices of Domain-Driven Design by Scott Millett)
Entities and Value Objects collaborate to form complex relationships that meet invariants within the domain model. When dealing with large interconnected associations of objects, it is often difficult to ensure consistency and concurrency when performing actions against domain objects. Domain-Driven Design has the Aggregate pattern to ensure consistency and to define transactional concurrency boundaries for object graphs. Large models are split by invariants and grouped into aggregates of entities and value objects that are treated as conceptual whole.
Let's go with an example to see the definition in practice.
Simple Example
The first example shows how defining an Aggregate Root helps to ensure consistency when performing actions against domain objects.
Given the next business rule:
Winning auction bids must always be placed before the auction ends. If a winning bid is placed after an auction ends, the domain is in an invalid state because an invariant has been broken and the model has failed to correctly apply domain rules.
Here there is an aggregate consisting of Auction and Bids where the Auction is the Aggregate Root.
If we say that Bid is also a separate Aggregate Root, you would have a BidsRepository, and you could easily do:
var newBid = new Bid(money);
bidsRepository.save(auctionId, newBid);
That would save a Bid without ever checking the business rule above. However, with the Auction as the only Aggregate Root, the design enforces the rule, because you need to do something like:
var newBid = new Bid(money);
auction.placeBid(newBid);
auctionRepository.save(auction);
Therefore, you can check your invariant within the method placeBid and nobody can skip it if they want to place a new Bid.
Here it is pretty clear that the state of a Bid depends on the state of an Auction.
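For what it's worth, the invariant check inside placeBid could look roughly like the sketch below. The field names (`ends_at`, `placed_at`, `amount`) and the `AuctionEnded` exception are assumptions for illustration; the answer itself does not specify them.

```python
from dataclasses import dataclass, field
from datetime import datetime


class AuctionEnded(Exception):
    """Raised when a bid arrives at or after the auction's end time."""


@dataclass(frozen=True)
class Bid:
    amount: int
    placed_at: datetime


@dataclass
class Auction:
    """Aggregate root: the only way to add a Bid is through place_bid."""
    ends_at: datetime
    bids: list[Bid] = field(default_factory=list)

    def place_bid(self, bid: Bid) -> None:
        # Invariant: bids must be placed before the auction ends.
        if bid.placed_at >= self.ends_at:
            raise AuctionEnded("bid placed after the auction ended")
        self.bids.append(bid)
```

Because there is no separate repository for Bid, the only path to persisting one goes through `place_bid`, so the check cannot be skipped.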
Complex Example
Back to your example of Orders being associated with a Customer: it looks like there are no invariants that force us to define a huge aggregate consisting of a Customer and all her Orders, so we can just keep the relation between both entities through an identifier reference. By doing this, we avoid loading all the Orders when fetching a Customer, and we also mitigate concurrency problems.
But, say that now business defines the next invariant:
We want to provide Customers with a pocket so they can charge it with money to buy products. Therefore, if a Customer now wants to buy a product, it needs to have enough money to do it.
That said, the pocket is a VO inside the Customer Aggregate Root. It now seems that having two separate Aggregate Roots, one for Customer and another for Order, is not the best way to satisfy the new invariant, because we could save a new order without checking the rule. It looks like we are forced to consider Customer as the root, which is going to hurt performance, scalability, concurrency, etc.
Solution? Eventual Consistency. What if we allow the customer to buy the product? That is, we keep an Aggregate Root for Orders, so we create the order and save it:
var newOrder = new Order(customerId, ...);
orderRepository.save(newOrder);
Then we publish an event when the order is created and check asynchronously whether the customer has enough funds:
class OrderWasCreatedListener {
    handle(event) {
        var customer = customerRepository.findOfId(event.customerId);
        var order = orderRepository.findOfId(event.orderId);
        customer.placeOrder(order); // check business rules
        customerRepository.save(customer);
    }
}
If everything was good, we have satisfied our invariants while keeping our design as we wanted at the beginning, modifying just one Aggregate Root per request. Otherwise, we will send an email to the customer telling her about the insufficient funds issue. We can take advantage of this by adding to the email alternative options she can purchase with her current budget, as well as encouraging her to charge the pocket.
Take into account that the UI can help us avoid having customers pay without enough money, but we cannot blindly trust the UI.
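Here is a runnable sketch of that eventual-consistency flow, with in-memory dictionaries standing in for the repositories. Everything here is an assumption for illustration: the `pocket` balance as an integer, the event shape, and the `notify` callback that stands in for sending the email.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Order:
    id: str
    customer_id: str
    total: int  # amount in cents


@dataclass
class Customer:
    id: str
    pocket: int  # balance in cents

    def place_order(self, order: Order) -> bool:
        """Check the funds invariant; charge the pocket only if it holds."""
        if order.total > self.pocket:
            return False
        self.pocket -= order.total
        return True


class OrderWasCreatedListener:
    def __init__(
        self,
        customers: dict[str, Customer],
        orders: dict[str, Order],
        notify: Callable[[str, str], None],
    ) -> None:
        self.customers = customers  # stand-in for customerRepository
        self.orders = orders        # stand-in for orderRepository
        self.notify = notify        # stand-in for the insufficient-funds email

    def handle(self, event: dict) -> None:
        customer = self.customers[event["customer_id"]]
        order = self.orders[event["order_id"]]
        # Only the Customer aggregate is modified while handling this event.
        if not customer.place_order(order):
            self.notify(customer.id, order.id)
```

What happens to the Order itself after a failed check (cancelling it, for instance) would be a separate step on the Order aggregate and is left out here.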
Hope you find both examples useful, and let me know if you find better solutions for the exposed scenarios :-)
In this sort of cases, could it mean that I need to redesign and re-think my aggregates?
Almost certainly.
The driver for aggregate design isn't structure, but behavior. We don't care that "a user has thousands of orders". What we care about are what pieces of state need to be checked when you try to process a change - what data do you need to load to know if a change is valid.
Typically, you'll come to realize that changing an order doesn't (or shouldn't) depend on the state of other orders in the system, which is a good indication that two different orders should not be part of the same aggregate.

DDD: Confusion about repository/domain boundaries

My domain consists of Products, Departments, Classes, Manufacturers, DailySales, HourlySales.
I have a ProductRepository which facilitates storing/retrieving products from storage.
I have a DepartmentAndClass repository which facilitates storing/retrieving of departments and classes, as well as adding and removing products from those departments and classes.
I also have a DailySales repository which I use to retrieve statistics about daily sales from multiple groupings, i.e.:
DailySales.GetSalesByDepartment(dateTime)
DailySales.GetSalesByClass(dateTime)
DailySales.GetSalesByHour(dateTime)
Is it correct to have these sales tracking methods in their own repository like this? Am I on the right track?
Since domains are so dependent on context some answers are harder than others. I would, however, place statistics on the Query side of things. You probably do not want to be calculating those stats on the fly as you will be placing some heavy processing on your database. Typically the stats should be denormalized for quick access where only filtering is required.
You may want to take a look at CQRS if you haven't done so.
Although most queries return an object or a collection of objects, it also fits within the concept to return some types of summary calculations, such as an object count, or a sum of a numerical attribute that was intended by the model to be tallied.
Eric Evans - Domain-Driven Design
This might be considered a read model. Are these daily sales objects being used in any domain model behaviour? Does any business logic depend on them? If not, it might be a good idea to separate this out into a distinct read model - at which point you're taking your first steps into CQRS.

Resources