I have several modules in NestJS:
StudentModule
FinanesModule
GroupModule
ScoresModule
SubjectModule
A module includes model, repository, service, controller related to single resource or functionality.
Before crud operations, I need to check if we can update/delete some resource(student, group, etc.). For example, we cannot delete subject, while some students are studying it.
After some operations, I want to call some "side-effect" operations, for example:
When student is deleted, it should also be removed from all groups.
When a student is created in DB, an finances record should be also created for the student
when a student is added in a group, a score sheet document should be inserted in the DB
How can I organize project architecture, to avoid "wrong" dependencies?
If I call ScoresService.createEmptySheet(...args) method in StudentService.create() method, it would make StudentModule as a dependent of ScoresModule(as ScoresService is a part of ScoresModule) which seems to be wrong, as student module exists independently from finances module, actually, a finances are depended on student, not vice versa.
Do you think, it's normal to make StudentModule dependent on ScoresModule?!
In other words I need to organize communication between modules without make them dependent on each other when logically they aren't.
P.S I can't use SQL's foreign keys, I am using MongoDB.
I would say that this question points at something much broader than NestJs modules in particular and aims at Software Architecture as a whole.
To answer your question specifically your intuition is likely correct, making UserService depend upon ScoresService is not the best solution in a larger system.
Depending on the specific different solutions would be correct but it may be as simple as creating a third service UserScoreService which depends upon both ScoresService and UserService allowing them to remain independent of each other.
Speaking more broadly, if you are interested in knowing more about this topic I recommend checking out Robert C. Martin's Clean Architecture and Hexagonal Architecture.
You could look into Pub-Sub pattern, if you want that kind of decoupling (i.e. https://github.com/glebbash/nestjs-pubsub-core).
Related
I was asked to implement CQRS/Event sourcing patterns into a legacy web application, in order to prepare to migrate it from a monolithic/state oriented model to a distributed, service oriented app.
I have some questions on how I can design a Domain oriented code bundle that would connect the legacy entities strongly coupled to database, with a new Event sourced model.
The first things I did were:
writing a small "framework" for CQRS/ES, with classes like AggregateRoot, DomainEvent, Command, Handlers, Messaging, Eventstore, AggregateIds, etc.
trying to group and "migrate" the legacy Entities into some Aggregates to reconstruct all the history and states of the app into EventSoourced Aggregates
plug some Commands dispatching in the old controllers in order to let the app work as is, but also to feed the new CQRS/ES system on the side.
The context:
The legacy app contains several entities, mapped to database, that hold the model layer. (Our domain is Human resources (manpower).
Let's say we have those existing entities:
Worker, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
competences 1-N
Society, in which worker works, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
hours
Contract, with various fields and related entities (OneToOne, OneToMany), like
address 1-1
Worker 1-1
Society 1-1
documents 1-N
days 1-N
hours
etc.
From this legacy model, I designed a MissionAggregate that holds:
A db independent ID, like UUID
some Value objects: address, days (they were an entity in the legacy model, they became VOs here)
I also designed a WorkerAggregate and a SocietyAggregate, with fields and UUIDS, and in the MissionAggregate I added:
a reference to WorkerAggregate's UUID
a reference to SocietyAggregate's UUID
As I said earlier, my aim is to leave the legacy app as is, but just introduce in the CRUD controller's methods some calls to dispatch Commands to the new CQRS system.
For example:
After flushing newly created Contract in bdd, I want to dispatch a "CreateMissionCommand" to the new command bus.
It targets the appropriate Command Handler, that handles all the command's data, passes it to a newly created Aggregate with a new UUID and stores "MissionCreatedDomainEvent" in the EventStore.
The DomainEvent is indexed with an AggregateId, a playhead, and has a payload which contains the fields necessary to be applied to and build the MissionAggregate.
The newly Contract created in the app has now its former lifecycle, as usual, with all the updates that the legacy app does on it. But I also need to reflects all those changes to the corresponding EventSourcedAggregate, so every time there is a flush in database in the app, I dispatch a Command that translates the "crud like operations" of the legacy app into a Domain oriented /Command oriented pattern.
To sum up the workflow is:
A Crud legacy operation occurs and flushes some changes on the Contract Entity
In just a row of code in the controller, I dispatch a command built with necessary fields (AggregateId of the MissionAggregate... that I need to have stored somewhere... see next problems) to the Domain command bus, so that the impact on the existing code base is very low.
The bus passes the command to the corresponding command handler
The handler loads the aggregate and applies the changes it by calling the appropriate Aggregate method
then after some validation, the aggregate raises and stores the appropriate event
My problems and questions (some of them at least) are:
I feel like I am rewriting all big portions of the legacy app, with the same kind of relations between the Aggregates that I have between the Entities, and with the same type of validations, checks etc.
Having references, to both WorkerAggregate and SocietyAggregate UUID in MissionAggregate implies that I have to build those aggregate also (hence to dispatch commands from legacy app when the Worker and Society entities are flushed). Can't I have only references to Worker's entity id and Society's entity id?
How can I avoid having a eternally growing MissionAggregate? The Contract Entity is quite huge, it has a lot of fields that are constantly updated (hours, days, documents, etc.) If I want to store all those events, I need to have a large MissionAggregate to reflect all those changes; and so I need to have a tons of CommandHandlers that react to all the Commands of add, update, etc. that I am going to dispatch from the legacy app.
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when I want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation? In the Entity itself, in a AggregateId field? in the Aggregate, should I have a ContractId field? Or should I have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events on all the historical data?
Thanks in advance for your time.
You have a huge task ahead of you, let's try to break it down.
It's best to build this new part of the system in isolation from the legacy codebase, otherwise you're going to have your hands tied in every turn of the way.
Create a separate layer in your project for these new requirements. We're going to call it "bubble" from now on. This bubble will be like a greenfield project, with its own structure, dependencies, etc. There will be no direct communication between the bubble and the legacy; communication will happen through another dedicated translation layer, which we'll call "Anti-Corruption Layer" (ACL).
ACL
It is like an API between two systems.
It translates calls from the bubble to the legacy and vice-versa. Its purpose is to prevent one system from corrupting or influencing the other. This way you can keep building/maintaining each system independently from each other.
At the same time, the ACL allows one system to consume the other, and reuse logic, validations, rules, etc.
To answer your questions directly:
I feel like i am rewriting all big portions of the legacy app, with the same kind of relations between the Aggregates that i have between the Entities, and with the same type of validations, checks etc.
With the ACL, you can resort to calling validations and reuse implementations from the legacy code. This will allow you time to rewrite things as needed or as possible.
You may not need to rewrite the entire system, though. If your goal is to implement CQRS and Event Sourcing and you can achieve this goal by keeping most or part of the legacy system, I would say you do it. Unless, of course, one of the goals is to completely replace the old system. Otherwise, keep it; write as less code as possible.
Suggested workflow:
Keep the CQRS and Event Sourcing system in the bubble
Do not bring these new frameworks into legacy
Make the lagacy Controller issue method calls to the ACL
The ACL will convert these calls into Commands and dispatch them
Any events will be caught by your Event Sourcing framework
Results will be persisted to the bubble's database
The bubble's database can be a different schema in the same database or can be a different database altogether. But you'll have to think about synchronization, and that's a topic of its own. To reduce complexity, I recommend a different schema in the same database.
Having references, to both WorkerAggregate and SocietyAggregate UUID in MissionAggregate implies that i have to build those aggregate also (hence to dispatch commands from legacy app when the Worker and Society entities are flushed). Can't i have only references to Worker's entity id and Society's entity id?
How can i avoid having a eternally growing MissionAggregate ? The Contract Entity is quite huge, it has a looot of fields that are constantly updated (hours, days, documents, etc.) If i want to store all those events, i need to have a large MissionAggregate to reflect all those changes; and so i need to have a tons of CommandHandlers that react to all the Commands of add, update, etc that i am going to dispatch from the legacy app.
You should aim for small aggregates. Huge aggregates are likely to degrade performance and cause concurrency problems.
If you anticipate having a huge aggregate, it is best to rethink it and try to break it down. Ask what fields/properties change together - these are possibly a different aggregate.
Also, when you speak about CQRS, you generally lean towards a task-based way of doing things in your system.
Think of a traditional web application, where you have a huge page with lots of fields that are all sent to the server in one batch when the user saves.
Now, contrast it with a modern web app where the user changes small portions of data at each step. If you think about your system this way you'll find those smaller aggregates.
PS. you don't need to rebuild your interfaces for this. If your legacy system has those huge pages, you could have logic in the controllers to detect which fields were changed and issue the appropriate commands.
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when i want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation ? In the Entity itself, in a AggregateId field ? in the Aggregate, should i have a ContratId field ? Or should i have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
Aggregates represent a conceptual whole. They are like atoms, indivisible things. You should always refer to an aggregate by its Root Entity Id, and never to a Child Entity Id: looking from the outside, there are no children.
An aggregate should be loaded as a whole and persisted as a whole. One more reason to have small aggregates.
An aggregate can be comprised of a single entity. Or it can have more entities and value objects, forming a graph, but one entity will be elected as the Root and will hold references to its children. Child entities and value objects should not hold references to their parents. The dependency is not bi-directional.
If Contract is an entity inside the Mission aggregate, the Contract should not have a reference to its parent.
But, if your Contract and Mission are different aggregates, then they can reference each other by their Ids.
What to do with the past? Should i migrate all the existing datas through a script that generates Aggregates and events on all the historical data?
That's a question for the business experts. Do they need it? If they don't, then don't implement it just for the sake of doing so. Every decision you make should be geared towards satisfying a business need and generating real value for it, considering the costs and tradeoffs.
Some people say that code is a liability, not an asset, and I aggre to some extent: every line of code you write needs to be tested and supported. Don't write any code that is not really necessary.
Also, have a look at this article about the Strangler Pattern, which shows how to migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.
If you have a chance, watch this course at Pluralsight (paid): Domain-Driven Design: Working with Legacy Projects. The author presents practical approaches for dealing with this kind of task.
I hope this has given you some insight.
I don't want to spoil your game. Everybody knows how cool it is to rewrite something from scratch. It's a challenge, it's fun, it's exciting. However...
migrate it from a monolithic/state oriented model to a distributed, service oriented app
CQRS/Event Sourcing won't solve any of your problems and it won't help you distribute the app in any reasonable way. If you just generate events on the CRUD operations you'll have a large tangled mess of dependencies between each part. Every part that needs data will have to call a couple of "services" (i.e. tables) to get it, than push data elsewhere, generate events1 that some other parts will react to. It will be a mess. Usually this is called a distributed monolith.
This is also the reason you already see problems with it. These problems won't go away, because you are essentially building the same system in the same way, but this time it'll be more complex.
Where to go from here
The very first thing is always: have a clear goal. You want a service oriented architecture you said. Why? Are there parts that need different scaling, different resources? Are they managed by different teams with different life-cycles? Etc.? Maybe you already have all this, I don't know, but if not, that's your first task.
Then. The parts you do want to pull out can't be just CRUD things. Those will not be independent, so whether your goal (see point above!) is scaling or different team, you won't reach your goal! To be independent you'll have to pull out the behavior with the data, and in a way that the service can operate on its own.
You can't just throw buzzwords at it and hope for the best. I'd suggest to just ignore all the hype and buzzwords and think about the goal you want to reach.
For example: I need a million workers to log their time in under 10 minutes total. So that means I need a "service" to enable worker to log their time with a web interface. So let's create that as a complete independent piece with its own database so it can be scaled to a 100 nodes when it needs to be. Export data to billing automatically every hour or so.
I'm breaking my system into (at least) two bounded-contexts: study-design and survey-planning.
There's a concept named "subject" (potential subject for interviewing) in the study-design context. We also maintain associations between subjects and populations in that domain.
Now, in the survey-planning, we also need (some) information about the subject (for example: for planning a visit, or even for anticipated selection of questionnaire, in case the population the subject belongs to is known beforehand).
So, I need that "subject" in both contexts.
What approach should I pick? Having a shared kernel, as explained in Eric Evans DDD book? I don't mind (at least for now) having the two contexts sharing the same database.
Or... should I go pure microservice? Meaning: those two can't / shouldn't share database..., and in that case I might have to go the mirroring / duplicating route through event passing: https://www.infoq.com/news/2014/11/sharing-data-bounded-contexts
Any thoughts on which one is better, for the above situation?
Thanks!
The context for microservices is distributed systems. In any other situation it would probably be overkill. Shared kernel will eventually split. That is usually the case. You may start from it. Nothing wrong with that. However, it will not stay there.
I recommend that you choose a event-driven solution, but not necessarily to use microservices. You could build an event-driven monolith in order to spend much less time on synchronizing the two models. When the application grows too big then you split the monolith into microservices. You could use CQRS to split event more the models into write and read. If you use event-sourcing things get even more interesting.
In my experience, with shared kernel, the models become god objects, one-size-fits-all kind of objects.
In my opinion, you have three entities:
study
survey
person
It is pretty intuitive to see that each of these is its own aggregate root. So then we are talking about inter-root relationships. In my experience, those are meaningful entities on their own, and cleanest and most future proof by far is to treat those relationships as independent aggregate roots.
The relationship between a study and a person is perhaps called TestSubject, and the relationship between a person and a survey could be called Interviewee or something similar. In another context, the person could be an employee for a company, and then the Employee would be its own aggregate root. Information that only relates to the relationship and not to the person or the study say, should be limited to this relationship specific aggregate root. This could for instance be the start date at which the subject started to take part in the test, and the end date (when he dropped out, if he or she dropped out prematurely, etc.)
As for storage, all aggregate roots should define their own separate repositories as interfaces and know only those interfaces, but the implementation of those interfaces is free to choose to use the same database or different ones, or even different kinds, local or distributed, etc. So this holds for these 'relational' aggregate roots as well. But you should almost force yourself to use different databases and preferably even different technologies (e.g. one EntityFramework, the other MongoDb) when you start with this, to force yourself to make sure your interfaces are properly defined and independent of implementation.
And yes, big fan of CQRS as well here, and Event/Command Sourcing as well. There are many light-weight implementations possible that allow you to upscale, but are very easy to get into and afford you almost completely linear (=predictable) complexity.
You can start with microservices that share a monolithic data source, but only use partial domain entities and value objects
For example take an order entity. It's obvious that order lines don't exist without order. So we have to get them with the help of OrderRepository(throw an order entity). Ok. But what about other things that are loosely coupled with order? Should the customer info be available only from CustomerRepo and bank requisites of the seller available from BankRequisitesRepo, etc.? If it is correct, we should pass all these repositories to our Create Factory method I think.
Yes. In general, each major entity (aggregate root in domain driven design terminology) should have their own repositories. Child entities *like order lines) will generally not need one.
And yes. Define each repository as a service then inject them where needed.
You can even design things such that there is no direct coupling between Order and Customer in terms of an actual database link. This in turn allows customers and orders to live in completely independent databases. Which may or may not be useful for your applications.
You correctly understood that aggregate roots's (AR) child entities shall not have their own repository, unless they are themselves AR's. Having repositories for non-ARs would leave your invariants unprotected.
However you must also understand that entities should usually not be clustered together for convenience or just because the business states that some entity has one or many some other entity.
I strongly recommend that you read Effective Aggregate Design by Vaughn Vernon and this other blog post that Vaughn kindly wrote for a question I asked.
One of the aggregate design rule of thumb stated in Effective Aggregate Design is that you should usually reference other aggregates by identity only.
Therefore, you greatly reduce the number of AR instances needed in other AR's creationnal process since new Order(customer, ...) could become new Order(customerId, ...).
If you still find the need to query other AR's in one AR's creationnal process, then there's nothing wrong in injecting repositories as dependencies, but you should not depend on more than you need (e.g. let the client resolve the real dependencies and pass them directly rather than passing in a strategy allowing to resolve a dependency).
Say I have an object that for the most part has necessary attributes, etc for two different apps because both apps have the need to use them. It's possible that 10% of the attributes won't' be used in one app. Is it better to share that object (and aggregate/bounded context as a shared kernel?) or duplicate the attributes and data that is stored? One app is for end users/activities and the other app is for the management of the users/activities.
We recently developed an application that involved several modules using a lot of common entities. When we developed them, we moved these common entities to a project called common-domain and then had all the dependent modules use it. It turned out to be a disaster.
While we initially found several attributes to be common, we found out that how we designed them for certain modules conflicted in how they were used for the others. Altering the entities in the common-domain to fit the needs of one module sometimes broke how they worked for the other modules. We did not use a test-driven approach and it made finding the resulting bugs tedious.
Learning from that mistake, we should have recognized and identified the bounded contexts and identified the entities and their associated attributes per bounded context. The "common" entity should have been defined per bounded context along with any attribures that are needed for that context. Some attributes will definitely be common but since they are separate bounded contexts, they have to be declared per entity per bounded context.
I'll go a little bit further to mention where an item can be shared. Each entity in a bounded context has its own repository. It is possible that the "common" entities share the same underlying database table. It is the responsibility of each BC's entity repository to retrieve the relevant columns in order to return the appropriate entity instance.
An entity is typically not shared between BCs. You may have another BC in play. You should have one BC that is the system of record for the entity. All other BCs should be downstream from it and contain only the identity and relevant bits of data. Typically one would employ an event-driven architecture to notify dependent systems of any relevant changes to the state of the entity in question.
It may also be that you are trying to split a single BC. Perhaps focus on the BC side of things as opposed to the technical/application side.
Hope that helps :)
After reading Evan's and Nilsson's books I am still not sure how to manage Data access in a domain driven project. Should the CRUD methods be part of the repositories, i.e. OrderRepository.GetOrdersByCustomer(customer) or should they be part of the entities: Customer.GetOrders(). The latter approach seems more OO, but it will distribute Data Access for a single entity type among multiple objects, i.e. Customer.GetOrders(), Invoice.GetOrders(), ShipmentBatch.GetOrders() ,etc. What about Inserting and updating?
CRUD-ish methods should be part of the Repository...ish. But I think you should ask why you have a bunch of CRUD methods. What do they really do? What are they really for? If you actually call out the data access patterns your application uses I think it makes the repository a lot more useful and keeps you from having to do shotgun surgery when certain types of changes happen to your domain.
CustomerRepo.GetThoseWhoHaventPaidTheirBill()
// or
GetCustomer(new HaventPaidBillSpecification())
// is better than
foreach (var customer in GetCustomer()) {
/* logic leaking all over the floor */
}
"Save" type methods should also be part of the repository.
If you have aggregate roots, this keeps you from having a Repository explosion, or having logic spread out all over: You don't have 4 x # of entities data access patterns, just the ones you actually use on the aggregate roots.
That's my $.02.
DDD usually prefers the repository pattern over the active record pattern you hint at with Customer.Save.
One downside in the Active Record model is that it pretty much presumes a single persistence model, barring some particularly intrusive code (in most languages).
The repository interface is defined in the domain layer, but doesn't know whether your data is stored in a database or not. With the repository pattern, I can create an InMemoryRepository so that I can test domain logic in isolation, and use dependency injection in the application to have the service layer instantiate a SqlRepository, for example.
To many people, having a special repository just for testing sounds goofy, but if you use the repository model, you may find that you don't really need a database for your particular application; sometimes a simple FileRepository will do the trick. Wedding to yourself to a database before you know you need it is potentially limiting. Even if a database is necessary, it's a lot faster to run tests against an InMemoryRepository.
If you don't have much in the way of domain logic, you probably don't need DDD. ActiveRecord is quite suitable for a lot of problems, especially if you have mostly data and just a little bit of logic.
Let's step back for a second. Evans recommends that repositories return aggregate roots and not just entities. So assuming that your Customer is an aggregate root that includes Orders, then when you fetched the customer from its repository, the orders came along with it. You would access the orders by navigating the relationship from Customer to Orders.
customer.Orders;
So to answer your question, CRUD operations are present on aggregate root repositories.
CustomerRepository.Add(customer);
CustomerRepository.Get(customerID);
CustomerRepository.Save(customer);
CustomerRepository.Delete(customer);
I've done it both ways you are talking about, My preferred approach now is the persistent ignorant (or PONO -- Plain Ole' .Net Object) method where your domain classes are only worried about being domain classes. They do not know anything about how they are persisted or even if they are persisted. Of course you have to be pragmatic about this at times and allow for things such as an Id (but even then I just use a layer super type which has the Id so I can have a single point where things like default value live)
The main reason for this is that I strive to follow the principle of Single Responsibility. By following this principle I've found my code much more testable and maintainable. It's also much easier to make changes when they are needed since I only have one thing to think about.
One thing to be watchful of is the method bloat that repositories can suffer from. GetOrderbyCustomer.. GetAllOrders.. GetOrders30DaysOld.. etc etc. One good solution to this problem is to look at the Query Object pattern. And then your repositories can just take in a query object to execute.
I'd also strongly recommend looking into something like NHibernate. It includes a lot of the concepts that make Repositories so useful (Identity Map, Cache, Query objects..)
Even in a DDD, I would keep Data Access classes and routines separate from Entities.
Reasons are,
Testability improves
Separation of concerns and Modular design
More maintainable in the long run, as you add entities, routines
I am no expert, just my opinion.
The annoying thing with Nilsson's Applying DDD&P is that he always starts with "I wouldn't do that in a real-world-application but..." and then his example follows. Back to the topic: I think OrderRepository.GetOrdersByCustomer(customer) is the way to go, but there is also a discussion on the ALT.Net Mailing list (http://tech.groups.yahoo.com/group/altdotnet/) about DDD.