DDD: Aggregate design - Referencing between aggregates

I have an issue with how to design aggregates.
I have Company, City, Province and Country entities. Each of these needs to be an aggregate root of its own aggregate. The City, Province and Country entities are used throughout the system and are referenced by many other entities, so they are not value objects and also need to be accessed in many different scenarios. So they should have repositories. A CityRepository would have methods such as FindById(int), GetAll(), GetByProvince(Province), GetByCountry(Country), GetByName(string).
Take the following example. A Company entity is associated with a City, which belongs to a Province, which belongs to a Country:
Now let's say we have a company listing page which lists some companies with their city, province and country.
Reference by ID
If an entity needs to reference a City, Province or Country, they would do so by ID (as suggested by Vaughn Vernon).
In order to get this data from the repositories, we need to call 4 different repositories and then match up the data in order to populate the view.
var companies = CompanyRepository.GetBySomeCriteria();
var cities = CityRepository.GetByIds(companies.Select(x => x.CityId));
var provinces = ProvinceRepository.GetByIds(cities.Select(x => x.ProvinceId));
var countries = CountryRepository.GetByIds(provinces.Select(x => x.CountryId));
foreach (var company in companies)
{
    var city = cities.Single(x => x.CityId == company.CityId);
    var province = provinces.Single(x => x.ProvinceId == city.ProvinceId);
    var country = countries.Single(x => x.CountryId == province.CountryId);
    someViewModel = new CompanyLineViewModel(company.Name, city.Name, province.Name, country.Name);
}
This is very bulky and inefficient, but apparently the 'correct' way?
Reference by Reference
If the entities were referenced by reference, the same query would look like this:
var companies = CompanyRepository.GetBySomeCriteria();
foreach (var company in companies)
{
    someViewModel = new CompanyLineViewModel(company.Name, company.City.Name, company.Province.Name, company.Country.Name);
}
But as far as I understand, these entities cannot be referenced by reference as they exist in different aggregates.
Question
How else could I better design these aggregates?
Could I load company entities with the city model even when they exist in different aggregates? I imagine this would soon break the boundaries between aggregates. It would also create confusion when dealing with transactional consistency when updating aggregates.

You could create a completely different object (which would be just a flat data structure) that represents the view model and can be directly retrieved from the database. Google "Thin Read Layer" or "CQRS".
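To make the "Thin Read Layer" suggestion concrete, here is a minimal sketch in TypeScript (all names are illustrative, not from any library): the read side skips the aggregates entirely and maps one denormalized row, as it might come back from a single JOINed query or a read-optimized table, straight to a flat view model.

```typescript
// Shape of a row from a hypothetical read-side query -- already flat,
// no aggregates or repositories involved.
interface CompanyLineRow {
  companyName: string;
  cityName: string;
  provinceName: string;
  countryName: string;
}

class CompanyLineViewModel {
  constructor(
    public readonly company: string,
    public readonly city: string,
    public readonly province: string,
    public readonly country: string,
  ) {}
}

// The only "logic" on the read side is a trivial row-to-view-model mapping.
function toCompanyLine(row: CompanyLineRow): CompanyLineViewModel {
  return new CompanyLineViewModel(
    row.companyName, row.cityName, row.provinceName, row.countryName);
}

// Simulated result of the single read-side query.
const rows: CompanyLineRow[] = [
  { companyName: "Acme", cityName: "Toronto", provinceName: "Ontario", countryName: "Canada" },
];
const viewModels = rows.map(toCompanyLine);
```

The write side keeps its aggregate boundaries; the read side answers the listing page with one query and no domain objects.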

Dennis Traub has already pointed out what you can do to improve query performance. That approach is much more efficient for querying, but also even more bulky, because you now need additional code to keep your view model in sync with the aggregates.
If you don't like that approach or cannot use it for other reasons, I don't think that the first approach you are suggesting is any more inefficient or bulky than using direct object references. Suppose for a moment that you were using direct object references in the aggregates. How would you persist those aggregates to durable storage? The following options come to mind when you are using a database:
If you are using a denormalized table for Company (e.g., with a document database such as MongoDB), you are effectively optimizing for a view query already. However, you'll need all the extra work to keep your Company table in sync with City, Province, etc. Efficient, but bulky, and you might consider persisting the real view models instead (one per use case).
If you are using normalized tables with a relational database, you would use foreign keys in the Company table to reference the respective City, Province etc. by their id. When querying for a Company, in order to retrieve the fields of City, Province etc that are needed to populate your view model, you can either use a JOIN over 4+ tables, or use 4 independent queries to the City, Province, ... tables (e.g., when using lazy loading for the foreign key references).
If you are using normalized tables in a non-relational database, usually people use application side joins exactly as in the code you suggested. For some databases, ORM tools such as Morphia or Datanucleus can save you some programming work, but under the hood, the independent queries remain.
Therefore, in the 2nd and 3rd option, you save a bit of trivial programming work if you let an ORM solution generate the database mapping for you, but you don't get much improved efficiency. (JOINs can be optimized by proper indices, but getting this done right is non-trivial).
However, I'd like to point out that you retain full control over view model construction and database queries when you reference by id and use programmatic application-side joins as in the code you suggested.
In particular, names of cities, provinces, etc. change very seldom, there are only a few of them, and they easily fit into memory. Hence you can make extensive use of in-memory caching for the database queries -- or even use in-memory repositories that are populated from flat files on application startup. When done right, only one database call to the Company table is required to construct your view model for Company; the other fields are retrieved from the in-memory cache/repository, which I would consider extremely efficient.
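The in-memory caching idea can be sketched as follows (all names and data are illustrative): cities, provinces, and countries are loaded once at startup into maps; building the view model then requires only the company query plus constant-time lookups.

```typescript
interface City { id: number; name: string; provinceId: number; }
interface Province { id: number; name: string; countryId: number; }
interface Country { id: number; name: string; }
interface Company { name: string; cityId: number; }

// Populated once at application startup (e.g. from the DB or flat files).
const cityCache = new Map<number, City>([[1, { id: 1, name: "Toronto", provinceId: 10 }]]);
const provinceCache = new Map<number, Province>([[10, { id: 10, name: "Ontario", countryId: 100 }]]);
const countryCache = new Map<number, Country>([[100, { id: 100, name: "Canada" }]]);

// One database call fetches the companies; everything else is a cache hit.
function companyLine(company: Company): string[] {
  const city = cityCache.get(company.cityId)!;
  const province = provinceCache.get(city.provinceId)!;
  const country = countryCache.get(province.countryId)!;
  return [company.name, city.name, province.name, country.name];
}

const line = companyLine({ name: "Acme", cityId: 1 });
```

This keeps the reference-by-id design intact while removing three of the four repository round-trips per page load.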

Related

Reuse same database tables in different repositories (repositories overlap on the data they access)

Suppose I have database tables Customer, Order, Item. I have an OrderRepository that accesses, directly with SQL/my ORM, both the Order and Item tables. E.g. I could have a method getItems on the OrderRepository that returns all items of that order.
Suppose I now also create an ItemRepository. Given that I now have 2 repositories accessing the same database table, is that generally considered poor design? My thinking is that sometimes a user wants to update the details of an Item (e.g. its name), but when using the OrderRepository, it doesn't really make sense not to be able to access the items directly (you want to know about all the items in an order).
Of course, the OrderRepository could internally create an ItemRepository and call methods like getItemsById(ids: string[]). However, consider the case where I want to get all orders and items ever purchased by a Customer. Assuming you had the orderIds for a customer, you could have a getOrders(ids: string[]) on the OrderRepository to fetch all the orders and then do a second query to fetch all the Items. I feel you make your life harder (and less efficient) in the sense that you have to do the join to match items with orders in the app code rather than doing a join in SQL.
If it's not considered bad practice, is there some kind of limit to how much overlap repositories should have with each other? I've spent a while trying to search for this on the web, but it seems all the tutorials/blogs/videos really don't go further than 1 table per entity (which may be an anti-pattern).
Or am I missing a trick?
Thanks
FYI: using express with TypeScript (not C#)
Is a repository creating another repository considered acceptable? Shouldn't only the service layer do that?
It's difficult to separate the Database Model from the DDD design but you have to.
In your example:
GetItems should have this signature: OrderRepository.GetItems(ids: int[]): ItemEntity[]. Note that this method returns entities (not DAOs from your ORM). To get an ItemEntity, the method might pull information from several DAOs (tables, through your ORM), but it should only pull what it needs for the entity's hydration.
Say you want to update an item's name using the ItemRepository; your signature for that could look like ItemRepository.rename(id: int, name: string): void. When this method does its work, it could change the same table as the GetItems above, but note that it could also change other tables as well (for example, it could add an audit record of the change to an AuditTable).
DDD gives you the ability to use different tables for different contexts if you want. It gives you enough flexibility to make really bold choices when it comes to the infrastructure that surrounds your domain. So ultimately, it's a matter of what makes sense for your specific situation and team. Some teams would apply CQRS, and the GetOrder and Rename methods would look completely different under the covers.
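A minimal TypeScript sketch of the two signatures discussed above, with in-memory maps standing in for the ORM/DAO layer (all names are illustrative): both repositories touch the same Item table, but each exposes an intention-revealing operation, and rename additionally writes an audit record.

```typescript
interface ItemEntity { id: number; name: string; orderId: number; }

// Stand-ins for the shared Item table and a separate Audit table.
const itemTable = new Map<number, ItemEntity>([
  [1, { id: 1, name: "Widget", orderId: 7 }],
]);
const auditTable: string[] = [];

class OrderRepository {
  // Returns hydrated entities, not raw DAOs.
  getItems(ids: number[]): ItemEntity[] {
    return ids.map(id => itemTable.get(id)!);
  }
}

class ItemRepository {
  // A narrow update; it changes the same Item table that
  // OrderRepository.getItems reads, plus the audit table.
  rename(id: number, name: string): void {
    const item = itemTable.get(id)!;
    auditTable.push(`item ${id}: "${item.name}" -> "${name}"`);
    item.name = name;
  }
}

new ItemRepository().rename(1, "Gadget");
const items = new OrderRepository().getItems([1]);
```

The overlap on the Item table is deliberate: each repository serves a different use case, not a different table.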

Create Mongoose Schema Dynamically for e-commerce website in Node

I would like to ask a question about a possible solution for an e-commerce database design in terms of scalability and flexibility.
We are going to use MongoDB and Node on the backend.
I included an image for you to see what we have so far. We currently have a Products table that can be used to add a product into the system. The interesting part is that we would like to be able to add different types of products to the system with varying attributes.
For example, in the admin management page, we could select a Clothes item where we should fill out a form with fields such as Height, Length, Size ... etc. The question is how could we model this way of structure in the database design?
What we were thinking of was creating tables such as ClothesProduct and many more and respectively connect the Products table to one of these. But we could have 100 different tables for the varying product types. We would like to add a product type dynamically from the admin management. Is this possible in Mongoose? Because creating all possible fields in the Products table is not efficient and it would hit us hard for the long-term.
Database design snippet
Maybe we should just create separate tables for each unique product type and from the front-end, we would select one of them to display the correct form?
Could you please share your thoughts?
Thank you!
We've got a Mongoose backend that I've been working on since its inception about 3 years ago. Here are some of my lessons:
MongoDB is NoSQL: by linking all these objects by ID, it becomes very painful to find all products of "Shop A": you would have to make many queries before getting the list of products for a particular shop (shop -> brand -> category -> subCategory -> product). Consider nesting certain objects in other objects (e.g. subcategories inside categories, as they are semantically the same). This will save immense loading time.
Dynamically created product fields: we built a (now) big module that allows users to create their own database keys & values and assign them to different objects. In essence, it looks something like this:
SpecialFieldModel: new Schema({
    ...,
    key: String,
    value: String,
    ...,
})
This way, your users can "make their own products".
Number of products: MongoDB queries can handle huge data loads, so I wouldn't worry too much about some collections being thousands of objects large. However, if you want large reports on all the data, you will need to make sure your IDs are in the right place. Then you can use the Aggregation Framework to construct big queries that might have to tie together multiple collections in the db, and fetch the data in an efficient manner.
Don't reference IDs in both directions, unless you know what you're doing: saving a reference to the category ID in subcategories and vice versa is incredibly confusing. Which field do you have to update if you want to switch subcategories? One or the other? Or both? Even with strong tests, it can be very confusing for new developers to understand "which direction the queries are running in" (if you are building a product that might have to be extended in the future). We've done both, which has led to a few problems. However, those modules that saved references to upper objects (rather than lower ones) I found to be consistently more pleasant and simple to work with.
created/updatedAt: Consider adding these fields to every single model & Schema. This will help with debugging, extensibility, and general features that you will be able to build in the future, which might otherwise be impossible. (ProductSchema.set('timestamps', true);)
Take my advice with a grain of salt, as I haven't designed most of our modules. But these are the sorts of things I consider as I continue working on our applications.
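The user-defined key/value field idea from the second lesson can be reduced to plain objects, independent of Mongoose (all names here are illustrative): each product carries a list of {key, value} pairs instead of a fixed schema, so new product types need no schema changes.

```typescript
interface SpecialField { key: string; value: string; }
interface Product { name: string; specialFields: SpecialField[]; }

// Look up a dynamic attribute by key; undefined if the product lacks it.
function getField(product: Product, key: string): string | undefined {
  return product.specialFields.find(f => f.key === key)?.value;
}

// A "Clothes" product defined entirely through dynamic fields.
const shirt: Product = {
  name: "T-Shirt",
  specialFields: [
    { key: "size", value: "M" },
    { key: "colour", value: "blue" },
  ],
};

const size = getField(shirt, "size");
```

The trade-off is that the application, not the database, becomes responsible for validating which keys a given product type should have.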

Is it logical to consider city or province as a new model in database design?

I am designing a new web application for our business, and I don't know what is better. I use MongoDB as the database.
We have about 10 MongoDB models (schemas), such as Leads & Contractors.
The Leads and Contractors models must have a field for city and a field for province.
We want to show the leads of city 'A' to the contractors who work in city 'A'.
As I explained, the leads and contractors models (schemas) must have a field for city.
I want to know which design is best practice and why?
Embed design: consider city and province as fields in leads and contractors (a city field).
Reference design: consider the contractors and leads models as normalized, and create a new model (schema) for city and reference it in the leads and contractors models.
What is better and why?
I can see no benefit in this case of creating a separate collection for the new fields. The big advantage of a NoSQL database is the ability to embed data in this way and rapidly query against it.
Imagine you want to query a contractor. If you did it with references, your database would have to first fetch the Contractor document, then go searching for the City document using the reference provided. Embedding removes the need for this, leading to much faster read times. The only time you might want to consider referencing would be if cities had their own central role in your application, in the way that a "BlogPost" document's author field might have a reference to a "User" document. But for simple address data like city and province I'm assuming this isn't the case.
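The embed-vs-reference trade-off described above can be sketched with plain document shapes (names are illustrative): with references, reading a contractor's city takes a second lookup; with embedding, the data is already in the document.

```typescript
// Reference design: the contractor stores only the city's id.
interface CityDoc { id: string; name: string; province: string; }
interface ContractorRef { name: string; cityId: string; }

// Embedded design: the address data lives inside the contractor document.
interface ContractorEmbedded {
  name: string;
  city: { name: string; province: string };
}

const cities = new Map<string, CityDoc>([
  ["c1", { id: "c1", name: "Tehran", province: "Tehran" }],
]);
const refDoc: ContractorRef = { name: "Ali", cityId: "c1" };
const embeddedDoc: ContractorEmbedded = {
  name: "Ali",
  city: { name: "Tehran", province: "Tehran" },
};

// Two steps for the reference design...
const cityViaRef = cities.get(refDoc.cityId)!.name;
// ...one step for the embedded design.
const cityEmbedded = embeddedDoc.city.name;
```

For simple address data that rarely changes, the duplicated city name in each contractor document is a cheap price for the faster reads.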
Take a look at this link https://docs.mongodb.com/manual/core/data-model-design/ for a solid explanation of when to embed and when to split data into separate collections with references.

homogeneous vs heterogeneous in documentdb

I am using Azure DocumentDB and all my experience in NoSql has been in MongoDb. I looked at the pricing model and the cost is per collection. In MongoDb I would have created 3 collections for what I was using: Users, Firms, and Emails. I noted that this approach would cost $24 per collection per month.
I was told by the people I work with that I'm doing it wrong. I should have all three of those things stored in a single collection with a field to describe what the data type is. That each collection should be related by date or geographic area so one part of the world has a smaller portion to search.
and to:
"Combine different types of documents into a single collection and add
a field across all to separate them in searching like a type field or
something"
I would never have dreamed of doing that in Mongo, as it would make indexing, shard keys, and other things hard to get right.
There might not be many fields that overlap between the objects (for example, Email and Firm objects).
I can do it this way, but I can't seem to find a single example of anyone else doing it that way - which indicates to me that maybe it isn't right. Now, I don't need an example, but can someone point me to some location that describes which is the 'right' way to do it? Or, if you do create a single collection for all data - other than Azure's pricing model, what are the advantages / disadvantages in doing that?
Any good articles on DocumentDb schema design?
Yes. In order to leverage CosmosDb to its full potential, you need to think of a collection as an entire database system and not as a "table" designed to hold only one type of object.
Sharding in Cosmos is exceedingly simple. You just specify a field that all of your documents will populate and select that as your partition key. If you select a generic name such as key or partitionKey, you can easily separate the storage of your inbound emails from your users, and from anything else, by picking appropriate values.
class InboundEmail
{
    public string Key { get; set; } = "EmailsPartition";
    // other properties
}

class User
{
    public string Key { get; set; } = "UsersPartition";
    // other properties
}
What I'm showing is still only an example though. In reality your partition key values should be even more dynamic. It's important to understand that queries against a known partition are extremely quick. As soon as you need to scan across multiple partitions you'll see much slower and more costly results.
So, in an app that ingests a lot of user data, keeping a single user's activity together in one partition might make sense for that particular entity.
If you want evidence that this is the appropriate way to use CosmosDb, consider the addition of the new Gremlin Graph APIs. Graphs are inherently heterogeneous, as they contain many different entities and entity types as well as the relationships between them. The query boundary of Cosmos is at the collection level, so if you tried putting your entities all in different collections, none of the Graph API queries would work.
EDIT:
I noticed in the comments you made this statement: "And you would have an index on every field in both objects." CosmosDb does automatically index every field of every document. It uses a special proprietary path-based indexing mechanism that ensures every path of your JSON tree is indexed. You have to specifically opt out of this auto-indexing feature.
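The "type field" advice quoted in the question can be sketched in memory (the names and the discriminated-union shape are illustrative, not a Cosmos API): one collection holds users, firms, and emails, and a discriminator field separates them at query time.

```typescript
// One heterogeneous collection; the `type` field is the discriminator.
interface UserDoc { id: string; type: "user"; name: string; }
interface FirmDoc { id: string; type: "firm"; firmName: string; }
interface EmailDoc { id: string; type: "email"; subject: string; }
type Doc = UserDoc | FirmDoc | EmailDoc;

const collection: Doc[] = [
  { id: "1", type: "user", name: "Ada" },
  { id: "2", type: "firm", firmName: "Acme" },
  { id: "3", type: "email", subject: "Hello" },
];

// In-memory equivalent of SELECT * FROM c WHERE c.type = 'user'
// run against a single collection.
function ofType(t: Doc["type"]): Doc[] {
  return collection.filter(d => d.type === t);
}

const users = ofType("user");
```

Under per-collection pricing, this folds three collections into one; the cost is that every query must now filter (or partition) on the discriminator.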

How to perform intersection operation on two datasets in Key-Value store?

Let's say I have 2 datasets, one for rules, and the other for values.
I need to filter the values based on rules.
I am using a key-value store (Couchbase, Cassandra, etc.). I can use multi-get to retrieve all the values from one table, and all the rules from the other, and perform validation in a loop.
However, I find this very inefficient. I move a massive volume of data (values) over the network, and the client is busy filtering.
What is the common pattern for finding the intersection between two tables with Key-Value store?
The idea behind the NoSQL data model is to write data in a denormalized way so that a table can answer a precise query. As an example, imagine you have reviews made by customers on shops. You need to know the reviews made by a user on shops and also the reviews received by a shop. This would be modeled using two tables:
ShopReviews
UserReviews
In the first table you query by shop id, in the second by user id; the data are written twice but accessed directly using just a key lookup.
In the same way, you should organize values by rules (I can't be more precise without knowing the relation between them) and so on. One more consideration: newer versions of NoSQL databases support collections, which might help to model 1-to-many relations.
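The ShopReviews/UserReviews idea can be sketched in memory (illustrative names): every review is written twice, once per access path, so each read is a single key lookup instead of a client-side intersection.

```typescript
interface Review { shopId: string; userId: string; text: string; }

// Two denormalized "tables", each keyed for one query pattern.
const shopReviews = new Map<string, Review[]>(); // key: shopId
const userReviews = new Map<string, Review[]>(); // key: userId

function addReview(review: Review): void {
  // Denormalized dual write: the same data lands in both tables.
  const byShop = shopReviews.get(review.shopId) ?? [];
  byShop.push(review);
  shopReviews.set(review.shopId, byShop);

  const byUser = userReviews.get(review.userId) ?? [];
  byUser.push(review);
  userReviews.set(review.userId, byUser);
}

addReview({ shopId: "s1", userId: "u1", text: "Great shop" });

// Each access path is now a single key lookup, no join needed.
const reviewsForShop = shopReviews.get("s1") ?? [];
const reviewsByUser = userReviews.get("u1") ?? [];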
HTH, Carlo
