I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, etc.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email etc..).
But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
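A minimal sketch of that read-side idea, assuming in-memory arrays stand in for database tables (a real system would probably run a single join). The table shapes and the studentDetailsView name are hypothetical, not taken from the question:

```javascript
// Hypothetical in-memory "tables"; a real system would query the data store instead.
const db = {
  students: [{ id: 1, name: 'Ada', majorId: 10 }],
  majors: [{ id: 10, name: 'Mathematics' }],
  enrollments: [{ studentId: 1, classId: 100 }, { studentId: 1, classId: 101 }],
  classes: [
    { id: 100, courseId: 7, roomId: 3 },
    { id: 101, courseId: 8, roomId: 3 },
  ],
  courses: [{ id: 7, name: 'Algebra' }, { id: 8, name: 'Logic' }],
  rooms: [{ id: 3, name: 'R-12', building: 'North Hall' }],
};

const byId = (table, id) => table.find((row) => row.id === id);

// Read-only query: builds a presentation object without loading any aggregate.
function studentDetailsView(studentId) {
  const student = byId(db.students, studentId);
  if (!student) return null;
  const classes = db.enrollments
    .filter((e) => e.studentId === studentId)
    .map((e) => {
      const cls = byId(db.classes, e.classId);
      const room = byId(db.rooms, cls.roomId);
      return {
        courseName: byId(db.courses, cls.courseId).name,
        roomName: room.name,
        buildingName: room.building,
      };
    });
  return { name: student.name, major: byId(db.majors, student.majorId).name, classes };
}
```

Nothing here goes through StudentRepository or ClassRepository; the result is a plain representation for display, and it is only as fresh as the stored data.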
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example, note that modeling many to many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are only used for update operations. For query operations, implement Queries, which don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
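As a rough sketch of what CQRS lite could look like, assuming Maps stand in for the database: the repository loads and saves one whole Student aggregate, while a separate query function reads across aggregates directly. All names here (studentRepository, studentClassTitles, the row shapes) are made up for illustration:

```javascript
// Write side: the aggregate guards its own invariants.
class Student {
  constructor(id, name, enrollments = []) {
    this.id = id;
    this.name = name;
    this.enrollments = enrollments; // list of { classId }
  }
  enroll(classId) {
    if (this.enrollments.some((e) => e.classId === classId)) {
      throw new Error('Already enrolled');
    }
    this.enrollments.push({ classId });
  }
}

// Hypothetical storage: Maps stand in for database tables.
const studentRows = new Map();
const classRows = new Map([[100, { id: 100, title: 'Algebra' }]]);

// Repository: loads and saves one whole aggregate, nothing else.
const studentRepository = {
  load(id) {
    const row = studentRows.get(id);
    if (!row) return null;
    return new Student(row.id, row.name, row.enrollments.map((e) => ({ ...e })));
  },
  save(student) {
    studentRows.set(student.id, {
      id: student.id,
      name: student.name,
      enrollments: student.enrollments.map((e) => ({ ...e })),
    });
  },
};

// Read side: a query that joins across aggregates without using repositories.
function studentClassTitles(studentId) {
  const row = studentRows.get(studentId);
  if (!row) return [];
  return row.enrollments.map((e) => classRows.get(e.classId).title);
}

// Usage: update through the aggregate, read through the query.
studentRepository.save(new Student(1, 'Ada'));
const student = studentRepository.load(1);
student.enroll(100);
studentRepository.save(student);
```

The update path never touches class data, and the query path never instantiates an aggregate.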
Assume the read model ProductCatalogueItem is built from aggregates/write-models, is stored separately from the write-models, contains each product available for selling, and has the following properties:
basics: product_code, name, price, number_of_available_stock,
documentation: short_description, description,...
product characteristics: weight, length, depth, width, color,...
And, there are two views:
product list containing a list/table/grid of available product offers; the view needs only the following basic properties: product_code, name, price, number_of_available_stock,
product details showing all the properties - basics, documentation, product characteristics.
Naturally, two ViewModels come to mind:
ProductCatalogueListItem containing only basic properties,
ProductCatalogueItemDetails containing all the properties.
Now, there are two options (that I can see).
ViewModels are 1:1 representation of ReadModels
Therefore there are two read models, not one: ProductCatalogueListItem and ProductCatalogueItemDetails. And the read service will have two methods:
List<ProductCatalogueListItem> searchProducts(FilteringOptions),
ProductCatalogueItemDetails getProductDetails(product_code).
And, controllers return these models directly (or, mapped to dto for transport layer).
The issue here is filtering: should the read service perform the search query on a different read model than the one returned from the method call? ProductCatalogueListItem doesn't have enough information to perform filtering.
ViewModels are another projection of ReadModels
The read service will have two methods:
List<ProductCatalogueItem> searchProducts(FilteringOptions),
ProductCatalogueItem getProduct(product_code).
And the mapping from ReadModels to ViewModels is done by an upper layer (probably the controller).
There is no issue with filtering. But there is another issue: more data leaves the domain layer than is actually needed, and controllers would grow with more logic. As there might be different controllers for different transport technologies, the mapping code would probably get duplicated in those controllers.
Which approach to organize responsibilities is correct according to DDD/CQRS, or completely something else?
The point is:
should I build two read models, and search using one, then return other?
should I build single read model, which is used, and then mapped to limited view to contain only base information for view?
First of all, you make a wrong assertion:
...read model ProductCatalogueItem is built from aggregates/write-models...
The read model doesn't know about aggregates or anything else in the write model. You build the read model directly from the database, returning the data needed by the UI.
So, the view model is the read model, and it doesn't touch the write model. That's the reason why CQRS exists: for having a different model, the read model, to optimize the queries for returning the data needed by the client.
Update
I will try to explain myself better:
CQRS is simply splitting one object into two, based on the method types. There are two method types: command (any method that mutates state) and query (any method that returns a value). That's all.
When you apply this pattern at the service boundary of an application, you get a write service and a read service, so you can scale command handling and query handling differently, and you can also have two models.
But CQRS is not having two databases, is not messaging, is not eventual consistency, is not updating the read model from the write model, is not event sourcing. You can do CQRS without them. I say this because I've seen some misconceptions in your assertions.
That said, the design of the read model is done according to what information the user wants to see in the UI, i.e., the read model is the view model. You have no mapping between them; they are the same model. You can read about it in references (3) and (6) below. I think this answers your whole question. What I don't understand is the filtering issue.
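To illustrate how small that split can be, here is a hypothetical example (the Inventory names are mine, not from the question): one object is divided into a command object and a query object that share the same store, with no second database and no messaging.

```javascript
// Before: one object mixes commands (mutate state) and queries (return values).
class Inventory {
  constructor() { this.stock = new Map(); }
  addStock(code, qty) { this.stock.set(code, this.available(code) + qty); }
  available(code) { return this.stock.get(code) || 0; }
}

// After: same storage, two objects split by method type.
class InventoryCommands {
  constructor(stock) { this.stock = stock; }
  // Command: mutates state, returns nothing.
  addStock(code, qty) {
    this.stock.set(code, (this.stock.get(code) || 0) + qty);
  }
}

class InventoryQueries {
  constructor(stock) { this.stock = stock; }
  // Query: returns a value, mutates nothing.
  available(code) {
    return this.stock.get(code) || 0;
  }
}

// One shared store: the split is about the objects, not the infrastructure.
const stock = new Map();
const commands = new InventoryCommands(stock);
const queries = new InventoryQueries(stock);
commands.addStock('P-1', 5);
```

Separate databases, events, or eventual consistency can be layered on later if the read side needs them, but they are not part of the pattern itself.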
Some good references
(1) http://codebetter.com/gregyoung/2010/02/16/cqrs-task-based-uis-event-sourcing-agh/
(2) http://www.cqrs.nu/Faq/command-query-responsibility-segregation
(3) "Implementing Domain Driven Design" book, by Vaughn Vernon. Chapter 4: Architecture, "Command-Query Responsibility Segregation, or CQRS" section
(4) https://kalele.io/really-simple-cqrs/
(5) https://martinfowler.com/bliki/CQRS.html
(6) http://udidahan.com/2009/12/09/clarified-cqrs/
As you already built your read model using data which arrived from one or more services, your problem is now in another space (perhaps MVC) rather than in CQRS.
Now assume your read model is a db object and ProductCatalogueListItem and ProductCatalogueItemDetails are 2 view models. When you have a request to serve a list of products, you will query your read db through the read model (the ProductCatalog table). Maybe you apply additional filters using additional where clauses. Now where do you put your mapping activities in your code after fetching db objects? It's a personal choice. You don't have to do it in an upper layer at all. When I use Dapper I fetch db objects using view models as the generic type argument. So I can directly return the result from my service method, whose return type would be IEnumerable.
For a detail view I would use the same db object. I know CQRS suggests having different read models for different views. But ask yourself: do you really need another db object for the detail view? You only need an id to get all columns, whereas in the first case you needed some selected columns. So I would design your case as a mixture of your 2 methods above: have 2 service methods returning 2 different objects, but instead of a 1:1 read model to view model, have a single read db object and build 2 different view models from it.
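A small sketch of that mixture, assuming an in-memory array stands in for the read table. The sample rows, the filter fields (color, maxWeight), and the mapper names are hypothetical; the property names come from the question:

```javascript
// Hypothetical read-model rows (the "db object"), one per product.
const productCatalogue = [
  { product_code: 'A1', name: 'Desk', price: 120, number_of_available_stock: 4,
    short_description: 'Oak desk', description: 'Solid oak writing desk.',
    weight: 30, color: 'brown' },
  { product_code: 'B2', name: 'Lamp', price: 25, number_of_available_stock: 0,
    short_description: 'LED lamp', description: 'Adjustable LED desk lamp.',
    weight: 1, color: 'white' },
];

// View model 1: only the basic properties needed by the list view.
const toListItem = ({ product_code, name, price, number_of_available_stock }) =>
  ({ product_code, name, price, number_of_available_stock });

// View model 2: every property, for the details view.
const toDetails = (row) => ({ ...row });

// Filtering runs against the full row, so it can use fields the list view omits.
function searchProducts(filter) {
  return productCatalogue
    .filter((row) => filter.color === undefined || row.color === filter.color)
    .filter((row) => filter.maxWeight === undefined || row.weight <= filter.maxWeight)
    .map(toListItem);
}

function getProductDetails(productCode) {
  const row = productCatalogue.find((r) => r.product_code === productCode);
  return row ? toDetails(row) : null;
}
```

Because the mapping lives in the read service, controllers for different transports receive ready-made view models and do not need to duplicate it.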
When I try to run a query to read all entities of a kind in a transaction with Google Datastore, it gives me this error:
{ Error: Only ancestor queries are allowed inside transactions.
at /root/src/node_modules/grpc/src/client.js:554:15
code: 3,
metadata: Metadata { _internal_repr: {} },
So I need to use an ancestor query. How do I create an ancestor query? It appears to depend on how you structured the hierarchy in datastore. So my next question is, given every entity I have created in datastore has been saved like so (the identifier is unique to the entityData saved)
const entityKey = datastore.key({ namespace: ns, path: [kind, identifier] });
{ key: entityKey, method: 'upsert', data: entityData };
How do I read from the db within a transaction? I think I could do it if I knew the identifiers, but the identifiers are constructed from the entityData that I saved in the kind and I need to read the entities of the kind to figure out what I have in the db (chicken egg problem). I am hoping I am missing something.
More context
The domain of my problem involves sponsoring people. I have stored a kind people in datastore where each entity is a person consisting of a unique identifier, name and grade. I have another kind called relationships where each entity is a relationship containing two of the people's identifiers, the sponsor & sponsee (linking two people together). So I have structured it like an RDB. If I want to get a person's sponsor, I get all the relationships from the db, loop over them returning the relationships where the person is the sponsee, then query the db for the sponsor of that relationship.
How do I structure it the 'datastore' way, with entity groups/ancestors, given I have to model people and their links/relationships?
Let's assume a RDB is out of the question.
Example scenario
Two people have to be deleted from the app/db (let's say they left the company on the same day). When I delete someone, I also want to remove their relationships. The two people I delete share a relationship (one is sponsoring the other). Assume the first transaction is successful, i.e. I delete one person and their relationship. In the next transaction, I delete the other person, then search the relationships for relevant ones, and because the query is eventually consistent I find one that has already been deleted. I try to find the person for that relationship and they don't exist. Blows up.
Note: each transaction wraps delete person & their relationship. Multiple people equals multiple transactions.
Scalability is not a concern for my application
Your understanding is correct:
you can't use an ancestor query since your entities are not in an ancestry relationship (i.e. not in the same entity group).
you can't perform non-ancestor queries inside transactions. Note that you also can't read more than 25 of your entities inside a single transaction (each entity is in a separate entity group). From Restrictions on queries:
Queries inside transactions must be ancestor queries
Cloud Datastore transactions operate on entities belonging to up
to 25 entity groups, but queries inside transactions must be
ancestor queries. All queries performed within a transaction must
specify an ancestor. For more information, refer to Datastore
Transactions.
The typical approach in a context similar to yours is to perform queries outside transactions, often just keys only queries - to obtain the entity keys, then read the corresponding entities (up to 25 at a time) by key lookup inside transactions. And use transactions only when it's absolutely needed, see, for example, this related discussion: Ancestor relation in datastore.
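The pattern above can be sketched roughly as follows. Note that queryKeys and runInTransaction are hypothetical callbacks you would implement on top of your Datastore client; they are not Datastore API calls, and the batch size simply mirrors the 25 entity group limit quoted above.

```javascript
// Split a list into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical callbacks (assumptions, not Datastore API):
//   queryKeys()          -> Promise of entity keys, run outside any transaction
//   runInTransaction(fn) -> runs fn(tx) transactionally; tx.get(key) looks up one entity
async function readAllByKey(queryKeys, runInTransaction, batchSize = 25) {
  // Non-ancestor (ideally keys-only) query, outside transactions.
  const keys = await queryKeys();
  const entities = [];
  for (const batch of chunk(keys, batchSize)) {
    // Each transaction touches at most `batchSize` entity groups.
    const found = await runInTransaction(
      (tx) => Promise.all(batch.map((key) => tx.get(key)))
    );
    // A key returned by the query may already be gone by the time of the lookup.
    entities.push(...found.filter((entity) => entity !== undefined));
  }
  return entities;
}
```

The lookups by key inside the transaction are strongly consistent, so an entity deleted after the query ran simply comes back missing instead of causing an error.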
Your question apparently suggests you're approaching the datastore with a relational DB mindset. If your app fundamentally needs relational data (you didn't describe what you're trying to do) the datastore might not be the best product for it. See Choosing a storage option. I'm not saying that you can't use the datastore with relational data; it can still be done in many cases, but with a bit more careful design. Those restrictions are driving towards scalable datastore-based apps (IMHO potentially much more scalable than you can achieve with relational DBs).
There is a difference between structuring the data RDB style (which is OK with the datastore) and using it in RDB style (which is not that good).
In the particular usage scenario you mentioned you do not need to query for the sponsor of a relationship: you already have the sponsor's key in the relationship entity, all you need to do is look it up by key, which can be done in a transaction.
Getting all relationship entities for a person needs a query, filtered by the person being the sponsor or the sponsee. But does it really have to be done in a transaction? Or is it acceptable if maybe you miss in the result list a relationship created just seconds ago? Or having one which was recently deleted? It will eventually (dis)appear in the list if you repeat the query a bit later (see Eventual Consistency on Reading an Index). If that's acceptable (IMHO it is, relationships don't change that often, chances of querying exactly right after a change are rather slim) then you don't need to make the query inside a transaction thus you don't need an ancestry relationship between the people and relationship entities. Great for scalability.
Another consideration: looping through the list of relationship entities also doesn't necessarily have to be done in a transaction. And, if the number of relationships is large, the loop can hit the request deadline. A more scalable approach is to use query cursors and split the work across multiple tasks/requests, each handling a subset of the list. See a Python example of such an approach: How to delete all the entries from google datastore?
For each person deletion case:
add something like a being_deleted property (in a transaction) to that person to flag the deletion and prevent any use during deletion, like creating a new relationship while the deletion task is progressing. Add checks for this flag wherever needed in the app's logic (also in transactions).
get the list of all relationship keys for that person and delete them, using the looping technique mentioned above
in the last loop iteration, when there are no relationships left, enqueue another task, generously delayed, to re-check for any recent relationships that might have been missed in the previous loop execution due to the eventual consistency. If any shows up re-run the loop, otherwise just delete the person
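The three steps above could look roughly like this. It is a sketch only: Maps stand in for the people and relationships kinds, the re-check runs immediately instead of in a delayed task, and every function name is hypothetical.

```javascript
// Hypothetical in-memory stand-ins for the two kinds.
const people = new Map([
  ['p1', { name: 'Ann' }],
  ['p2', { name: 'Bob' }],
]);
const relationships = new Map([
  ['r1', { sponsor: 'p1', sponsee: 'p2' }],
]);

// Stands in for a keys-only query; in Datastore its result may be slightly stale.
function relationshipKeysFor(personId) {
  return [...relationships.entries()]
    .filter(([, r]) => r.sponsor === personId || r.sponsee === personId)
    .map(([key]) => key);
}

function deletePerson(personId, batchSize = 25) {
  const person = people.get(personId);
  if (!person) return false;              // already gone: nothing to do
  person.being_deleted = true;            // step 1: flag (transactional in a real app)

  // Steps 2 and 3: delete relationships in batches, then re-check until none remain.
  // A real app would run the re-check in a generously delayed task.
  let keys = relationshipKeysFor(personId);
  while (keys.length > 0) {
    for (const key of keys.slice(0, batchSize)) {
      relationships.delete(key);          // deleting a missing key is a no-op
    }
    keys = relationshipKeysFor(personId);
  }
  people.delete(personId);
  return true;
}

// Any code creating relationships checks the flag first.
function createRelationship(key, sponsor, sponsee) {
  for (const id of [sponsor, sponsee]) {
    const p = people.get(id);
    if (!p || p.being_deleted) throw new Error(`Person ${id} is unavailable`);
  }
  relationships.set(key, { sponsor, sponsee });
}
```

Deleting by key is idempotent, so the second person's deletion does not blow up when it meets a relationship that the first deletion already removed.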
If scalability is not a concern, you can also re-design your data structures to use ancestry between all your entities (placing them in the same entity group) and then you could do what you want. See, for example, What would be the purpose of putting all datastore entities in a single group?. But there are many potential risks to be aware of, for example:
max rate of 1 write/sec across the entire entity group (up to 500 entities each), see Datastore: Multiple writes against an entity group inside a transaction exceeds write limit?
large transactions taking too long and hitting the request deadlines, see Dealing with DeadlineExceededErrors
higher risk of contention, see Contention problems in Google App Engine
I am redesigning my NodeJS application because I want to use the Rich Domain Model concept. Currently I am using Anemic Domain Model and this is not scaling well, I just see 'ifs' everywhere.
I have read a bunch of blog posts and DDD related blogs, but there is something that I simply cannot understand: how do we handle persistence properly?
To start, I would like to describe the layers that I have defined and their purpose:
Persistence Model
Defines the Table Models. Defines the Table name, Columns, Keys and Relations
I am using Sequelize as ORM, so the Models defined with Sequelize are considered my Persistence Model
Domain Model
Entities and Behaviors. Objects that correspond to the abstractions created as part of the Business Domain
I have created several classes and the best thing here is that I can benefit from hierarchy to solve all problems (without loads of ifs yay).
Data Access Object (DAO)
Responsible for the Data management and conversion of entries of the Persistence Model to entities of the Domain Model. All persistence related activities belong to this layer
In my case DAOs work on top of the Sequelize models created on the Persistence Model, however, I am serializing the records returned on Database Interactions in different objects based on their properties. Eg.: If I have a Table with a column called 'UserType' that contains two values [ADMIN,USER], when I select entries on this table, I would serialize the return according to the User Type, so a User with Type: ADMIN would be an instance of the AdminUser class where a User with type: USER would simply be a DefaultUser...
Service Layer
Responsible for all Generic Business Logic, such as Utilities and other Services that are not part of the behavior of any of the Domain Objects
Client Layer
Any Consumer class that plays around with the Objects and is responsible in triggering the Persistence
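For reference, the DAO conversion described above (UserType decides which domain class is instantiated) might be sketched like this. The toDomainUser helper and the canManageUsers behavior are hypothetical; the row is assumed to be a plain object as returned by the ORM:

```javascript
class DefaultUser {
  constructor(row) {
    this.id = row.id;
    this.name = row.name;
  }
  canManageUsers() { return false; }
}

class AdminUser extends DefaultUser {
  canManageUsers() { return true; }
}

// Maps the persisted UserType value to a domain class.
const USER_CLASSES = { ADMIN: AdminUser, USER: DefaultUser };

// Hypothetical DAO helper: converts one persistence row into a domain entity.
function toDomainUser(row) {
  const UserClass = USER_CLASSES[row.UserType];
  if (!UserClass) throw new Error(`Unknown UserType: ${row.UserType}`);
  return new UserClass(row);
}
```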
Now the confusion starts when I implement the Client Layer...
Let's say I am implementing a new REST API:
POST: .../api/CreateOrderForUser/
{
items: [{
productId: 1,
quantity: 4
},{
productId: 3,
quantity: 2
}]
}
On my handler function I would have something like:
function(oReq){
var oRequestBody = oReq.body;
var oCurrentUser = oReq.user; //This is already a Domain Object
var aOrderItems = oRequestBody.items.map(function(mOrderData){
return new OrderItem(mOrderData); //Constructor sets the properties internally
});
var oOrder = new Order({
items: aOrderItems
});
oCurrentUser.addOrder(oOrder);
// So far so good... But how do I persist whatever
// happened above? Should I call each DAO for each entity
// created? Like, first create the Order, then create the
// Items, then update the User?
}
One way I found to make it work is to merge the Persistence Model and the Domain Model, which means that oCurrentUser.addOrder(...) would execute the business logic required and would call the OrderDAO to persist the Order along with the Items in the end. The bad thing about this is that now addOrder also has to handle transactions, because I don't want to add the order without the items, or update the User without the Order.
So, what am I missing here?
Aggregates.
This is the missing piece of the story.
In your example, there would likely not be a separate table for the order items (and no relations, no foreign keys...). Items here seem to be values (describing an entity, ie: "45 USD"), and not entities (things that change in time and we track, ie: A bank account). So you would not directly persist OrderItems but instead, persist only the Order (with the items in it).
The piece of code I would expect to find in place of your comment could look like orderRepository.save(oOrder);. Additionally, I would expect the user to be a weak reference (by id only) in the order, and not orders contained in a user as your oCurrentUser.addOrder(oOrder); code suggests.
Moreover, the layers you describe make sense, but in your example you mix delivery concerns (concepts like request, response...) with domain concepts (adding items to a new order). I would suggest that you take a look at established patterns to keep these concerns decoupled, such as Hexagonal Architecture. This is especially important for unit testing, as your "client code" will likely be the test instead of the handler function. The retrieve/create - do something - save code would normally be a function in an Application Service describing your use case.
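A minimal sketch of how those pieces could fit together, assuming an in-memory repository in place of Sequelize. InMemoryOrderRepository, nextIdentity, and createOrderForUser are hypothetical names; only orderRepository.save(order) mirrors the call suggested above:

```javascript
// Aggregate root: items are plain values held inside the order.
class Order {
  constructor({ id, userId, items }) {
    if (!items || items.length === 0) {
      throw new Error('An order needs at least one item');
    }
    this.id = id;
    this.userId = userId; // weak reference: the user's id only
    this.items = items.map(({ productId, quantity }) => {
      if (quantity <= 0) throw new Error('Quantity must be positive');
      return Object.freeze({ productId, quantity });
    });
  }
}

// Hypothetical in-memory repository; a real one would write the order
// and its items in a single database transaction.
class InMemoryOrderRepository {
  constructor() {
    this.rows = new Map();
    this.nextId = 1;
  }
  nextIdentity() { return this.nextId++; }
  save(order) { this.rows.set(order.id, order); }
  findById(id) { return this.rows.get(id) || null; }
}

// Application service: the use case, free of request/response concerns.
function createOrderForUser(orderRepository, userId, items) {
  const order = new Order({ id: orderRepository.nextIdentity(), userId, items });
  orderRepository.save(order);
  return order.id;
}

// The HTTP handler (or a unit test) only translates input and calls the use case.
const orderRepository = new InMemoryOrderRepository();
const orderId = createOrderForUser(orderRepository, 42, [
  { productId: 1, quantity: 4 },
  { productId: 3, quantity: 2 },
]);
```

Since the whole aggregate is saved in one call, transaction handling stays inside the repository instead of leaking into addOrder or the handler.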
Vaughn Vernon's "Implementing Domain-Driven Design" is a good book on DDD that would definitely shed more light on the topic.
I am developing a sails.js app with sequelize ORM. I am a little confused as to when BelongsTo and HasOne need to be used.
The documentation states that:
BelongsTo associations are associations where the foreign key for the
one-to-one relation exists on the source model.
HasOne associations are associations where the foreign key for the
one-to-one relation exists on the target model.
Is there any other difference apart from the place where these are specified? Does the behavior stay the same in either case?
This is a more universal problem.
The main difference is in semantics. You have to decide what the relationship is (some silly example):
Man has only one right arm. Right arm belongs to one man.
Saying it inversely looks a little weird:
Right arm has a man. A man belongs to right arm.
You can have a man without a right arm. But a right arm alone is useless.
In Sequelize, if RightArm and Man are models, it may look like:
Man.hasOne(RightArm); // ManId in RightArm
RightArm.belongsTo(Man); // ManId in RightArm
And as you notice there is also difference in db table structure:
BelongsTo will add the foreignKey on the source, whereas hasOne will add it on the target (Sequelize creates a new column 'ManId' in table 'RightArm', but doesn't create a 'RightArmId' column in the 'Man' table).
I don't see any more differences.
I agree with Krzysztof Sztompka about the difference between:
Man.hasOne(RightArm);
RightArm.belongsTo(Man);
I'd like to answer Yangjun Wang's question:
So in this case, should I use either Man.hasOne(RightArm); or
RightArm.belongsTo(Man);? Or use them both?
It is true that the Man.hasOne(RightArm); relation and the RightArm.belongsTo(Man); one do the same thing - each of these relations will add the foreign key manId to the RightArm table.
From the perspective of the physical database layer, these methods do the same thing, and it makes no difference for our database which exact method we will use.
So, what's the difference? The main difference lies in the ORM layer (in our case it is the Sequelize ORM, but the logic below also applies to Laravel's Eloquent ORM or even to Ruby's Active Record ORM).
Using the Man.hasOne(RightArm); relation, we will be able to populate the man's RightArm using the Man model. If this is enough for our application, we can stop with it and do not add the RightArm.belongsTo(Man); relation to the RightArm model.
But what if we need to get the RightArm's owner? We won't be able to do this using the RightArm model without defining the RightArm.belongsTo(Man); relation on the RightArm model.
One more example will be the User and the Phone models. Defining the User.hasOne(Phone) relation, we will be able to populate our User's Phone. Without defining the Phone.belongsTo(User) relation, we won't be able to populate our Phone's owner (e.g. our User). If we define the Phone.belongsTo(User) relation, we will be able to get our Phone's owner.
So, here we have the main difference: if we want to be able to populate data from both models, we need to define the relations (hasOne and belongsTo) on both of them. If it is enough for us to get only, for example, User's Phone, but not Phone's User, we can define only User.hasOne(Phone) relation on the User model.
The logic above applies to all the ORMs that have hasOne and belongsTo relations.
I hope this clarifies your understanding.
I know this is a 4-years late answer, but I've been thinking of it, searching the docs, and googling since yesterday. And couldn't find an answer that convinced me about what was happening. Today I've got to a conclusion: the difference is not just a matter of semantics, definitely!
Let's suppose you have the following statement (from the docs):
Project.hasMany(Task);
It creates some utility methods on the instances of Project, like addTask, addTasks, setTasks, etc. So you could do something like:
const project = await Project.create({...});
// Here, addTasks exists on the project instance as a
// consequence of the Project.hasMany(Task); statement
await project.addTasks([task1, task2]);
Also, in the database, a foreign key in tasks relation would've been created, pointing to projects relation.
Now if, instead of Project.hasMany(Task);, I had stated only:
Task.belongsTo(Project);
Then, similarly, in the database, foreign keys in tasks relation would've been created, pointing to projects relation. But there wouldn't be any addTasks method on project instances though. But, by doing Task.belongsTo(Project);, Sequelize would create a different set of methods, but only on task instances this time. After doing that, you could associate a task to a project using, for example:
const proj = await Project.findByPk(...);
const task1 = await Task.create({...});
...
// Here, setProject exists in task instance as a
// consequence of Task.belongsTo(Project); statement
task1.setProject(proj);
The docs define the source as the model that owns the method used to create the association. So, in:
Project.hasMany(Task);: In this statement, Project is the source model. Task is, in turn, the target model.
Task.belongsTo(Project);: In this statement, Task is the source model. Project is, in turn, the target model.
The thing is that, when creating associations using hasOne, hasMany, belongsTo, and belongsToMany, the instance utility methods are created only on the source model. In summary: if you want to have the utility methods created on both Project and Task instances, you must use the two statements to describe the same association. In the database itself, both will have the same redundant effect (creating a foreign key on the tasks relation pointing to the projects relation's primary key):
// All the instances of Project model will have utility methods
Project.hasMany(Task);
// All the instances of Task model will have utility methods
Task.belongsTo(Project);
const project = await Project.create(...);
const task1 = await Task.create(...);
const task2 = await Task.create(...);
...
// as a consequence of Project.hasMany(Task), this can be done:
project.addTask(task1);
...
// as a consequence of Task.belongsTo(Project), this can be done:
task2.setProject(project);
BTW, after writing this answer, I realized that this is the same thing that Vladsyslav Turak is explaining in his answer, but I decided to keep my answer here because it adds some important practical information involving the utility methods stuff.
One-to-One belongsTo or hasOne
Using the right arm example, and Sequelize's own documentation, the question we must ask is: can a man survive without a right arm? Or can a right arm survive without a man? Answering this question determines where we want our foreign key to exist. Let's take a more practical example.
Let's say you have a community website. Your users are all represented by a singular Profile model (or User model). But in a community you will also have administrators and moderators, both with their own sets of rights, and maybe even a different kind of profile. Instead of adding admin/mod specific fields to the User model, it might be best to create a separate model to represent an admin/mod.
Here's what a basic user model looks like (ignoring constraints and validations):
class User extends Model {
static associate(models) {}
}
User.init(
{
username: DataTypes.STRING(25),
password: DataTypes.STRING(50)
}
)
Now here's a model that represents an admin or mod, which is intended to extend the user model:
class Staff extends Model {
static associate(models) {}
}
Staff.init(
{
permissions: DataTypes.ARRAY(DataTypes.STRING),
roleType: DataTypes.STRING(20),
}
)
So we ask ourselves, can a user exist without admin/mod? Can an admin/mod exist without a user? A user doesn't have to be staff to use your services, but an admin/mod still needs a username and password in order to log in. You could add those fields to the Staff model, but the truth is, it would be repeating information and make things harder to keep track of.
At the heart, an admin/mod would have the same attributes as a normal user, just with special abilities. If you intend otherwise, I'd still maintain a BaseUser model to organize and keep what each model has in common together. An admin/mod account would still have a username and password, and likely an email as well. Otherwise, you'd end up having two users with the same info, and in a community that can be confusing, and difficult to manage.
It is determined that a user does not need a Staff object associated with it to exist, so we shouldn't put the foreign key on the user profile. This still doesn't quite answer our question though. Remember, hasOne() puts the FK on the target model, while belongsTo() places the FK on the source. So either Staff.belongsTo(User) or User.hasOne(Staff) meets the requirement that the FK has to exist on the Staff model.
Whether you put a belongsTo() on the Staff model or a hasOne() on the User model makes no difference to where the foreign key lives: either way it ends up on the Staff model. The generated accessors do differ, though: User.hasOne(Staff) gives User instances a getStaff() method, while Staff.belongsTo(User) gives Staff instances a getUser() method. If you only want the Staff model to point at the user account, you could instead add a reference column on our Staff model without creating an actual association, like so (this doesn't create an association or its accessor methods, merely, as the name implies, a reference):
user: {
type: DataTypes.INTEGER,
references: {
model: User,
key: 'userId'
}
}
I hope this helps.