I have a fairly simple domain with around 7-8 major entities identified, and these could be their own aggregate roots. But there is going to be a UI screen that lists the union of all objects in the system, which would mean a union across all aggregates.
One way I have in mind is to use composition, i.e. a Metadata aggregate that all other aggregate roots refer to; this would be an independent entity. For this screen I could then query just this aggregate; the fields moved to this new aggregate would be the common fields that need to be displayed in my "All Objects" grid.
The other approach could be to have an application service method that builds the necessary list for the "All Objects" screen by querying the other repositories, merging the lists at the application layer, and handling paging etc.
I am uneasy with the first solution, as I can see a UI use case influencing my domain design. On the other hand, the db does the grunt work of paging and merging the lists, and there are no joins: all of this info is gleaned by a single, simple query. The second solution, although it looks neater, loses out on ease and performance.
Please advise.
In this case I would propose the use of read-models, which are essentially value objects or DTOs used specifically for read scenarios. Using read-models is a pattern for keeping your entities and ARs clean. As far as how the read-models are created, you basically have the two options you described. One is to have a single repository return a read-model that fulfills the requirements of a given view. This allows you to leverage the database for performance. The other is to compose read-models from multiple repositories or services at the application service level, or even at the presentation layer. This approach is more extensible in that the data doesn't have to come from the same data source.
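To illustrate the first option, here is a minimal sketch (all type names are hypothetical): a flat read-model shaped by the "All Objects" grid, served by a dedicated read-side repository so the database can do the union, paging and sorting in one query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-model: flat, shaped by the "All Objects" grid, not by the domain.
public class ObjectListItem
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string AggregateType { get; set; }   // which aggregate the row came from
    public DateTime LastModified { get; set; }
}

// Read-side repository: one query (e.g. over a UNION view) does the merging and paging.
public interface IObjectListReadRepository
{
    IReadOnlyList<ObjectListItem> GetPage(int pageNumber, int pageSize);
    int Count();
}
```

The second option would populate the same ObjectListItem list in an application service, by querying each aggregate's repository and merging/paging in memory.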
I'm creating a selling platform. The core aggregate is called Announcement and it holds references to other aggregates such as Categories, User, etc. I am using a CQRS approach with an event-sourcing solution as storage.
For performance reasons, I decided to store some important details about associated objects (Categories, User) inside the Announcement aggregate along with their ids. My reasoning was that when filtering announcements, I want to simplify access to that information as much as possible (reduce the number of database joins, allow fancy querying syntax). This was possible because I included all the required information in the command that creates an announcement. Generation of a detailed view of an announcement is based on information embedded inside the aggregate. Although it seemed reasonable at first, now I'm having second thoughts.
The considerations that made me think are:
I realized that I don't need transactional consistency on all the additional details (categories, seller details, etc.). There are no constraints that would force me to do what I did.
The event store that I'm using offers multistream projections. I'm wondering if that's the puzzle piece that should replace the redundant information in the Announcement aggregate.
Are the following steps a valid solution for the described problem?
Remove the duplicated information from the Announcement aggregate;
Use a domain event to notify other aggregates about creation of an Announcement;
Let other aggregates publish appropriate events in response to the AnnouncementCreated event; these events may contain additional information about associated objects;
Introduce a multistream projection, which will update itself in response to events from multiple aggregates and produce a complete view of the announcement.
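As a rough sketch of step 4 (the event names, the view shape, and the IViewStore abstraction are all hypothetical; the wiring to the event store's multistream projection API is elided), the projection folds events from several streams into one denormalized view per announcement:

```csharp
using System;

// Denormalized view produced by the projection.
public class AnnouncementView
{
    public Guid AnnouncementId { get; set; }
    public string Title { get; set; }
    public string CategoryName { get; set; }
    public string SellerName { get; set; }
}

// Hypothetical events; the latter two are published in response to AnnouncementCreated.
public record AnnouncementCreated(Guid AnnouncementId, string Title);
public record CategoryDetailsProvided(Guid AnnouncementId, string CategoryName);
public record SellerDetailsProvided(Guid AnnouncementId, string SellerName);

public class AnnouncementProjection
{
    private readonly IViewStore _views; // assumed persistence abstraction for views

    public AnnouncementProjection(IViewStore views) => _views = views;

    public void When(AnnouncementCreated e) =>
        _views.Save(new AnnouncementView { AnnouncementId = e.AnnouncementId, Title = e.Title });

    public void When(CategoryDetailsProvided e)
    {
        var view = _views.Load(e.AnnouncementId);
        view.CategoryName = e.CategoryName;
        _views.Save(view);
    }

    public void When(SellerDetailsProvided e)
    {
        var view = _views.Load(e.AnnouncementId);
        view.SellerName = e.SellerName;
        _views.Save(view);
    }
}

public interface IViewStore
{
    AnnouncementView Load(Guid announcementId);
    void Save(AnnouncementView view);
}
```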
Never design aggregates by thinking of how you will read data. That is against the purpose of CQRS. Aggregates are about commands and business rules, not queries. Use events to gather data from multiple aggregates, then project the data however you want without affecting your aggregates. This concept is called a "projection".
In general, the only reason to include data in a particular aggregate is if that data affects command validation or if there's some other consistency demand. If information about categories or users doesn't qualify under either reason, then it makes a lot of sense to remove it from the announcement aggregate.
I would probably consider modeling a "categorized and associated announcement" aggregate, which is fed by domain events from the announcement/category/user aggregates. This could be implemented via the multistream projection from your event store, but I think it's useful to keep that detail separate, because there are other ways you could feed domain events from multiple aggregates as commands for a different aggregate (the command implicit in any event is "incorporate this event into your view of the world").
The Guide/eBook .NET Microservices: Architecture for Containerized .NET Applications (related to eShopOnContainers) explains, in the chapter "Designing the infrastructure persistence layer" (page 213), in general terms how an aggregate root can perform CUD operations against a persistent data source.
Two important starting points are mentioned:
An aggregate is ignorant of methods of persistence and infrastructure, following the Persistence Ignorance and Infrastructure Ignorance principles (page 218). An aggregate is determined by the business, not by the infrastructure.
One should define only one repository per aggregate root, to maintain transactional consistency between the objects within the aggregate (page 213).
Unfortunately, in all further examples mentioned, the aggregate root and all underlying objects that fall under it live within one and the same persistent data source.
The pattern then is as follows:
A repository is created for that aggregate.
A Unit of Work is injected into this repository on creation. This Unit of Work contains methods such as SaveChangesAsync, SaveEntitiesAsync, Update and so on.
In a command, the Unit of Work manages the transactions against this one data source, such as a database or similar.
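Condensed into code, the pattern reads roughly like this minimal sketch, loosely following the eShopOnContainers naming (the EF Core details are elided):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IUnitOfWork : IDisposable
{
    Task<int> SaveChangesAsync(CancellationToken cancellationToken = default);
    Task<bool> SaveEntitiesAsync(CancellationToken cancellationToken = default);
}

// Marker interface: one repository per aggregate root.
public interface IAggregateRoot { }

public interface IRepository<T> where T : IAggregateRoot
{
    // The repository exposes the Unit of Work it was given on creation.
    IUnitOfWork UnitOfWork { get; }
}

// In a command handler, all changes to the aggregate are committed through
// that single Unit of Work, against the one underlying data source:
//
//     _orderRepository.Add(order);
//     await _orderRepository.UnitOfWork.SaveEntitiesAsync(cancellationToken);
```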
I want to expand this pattern so that the aggregate can write its data across two or more physical data sources, depending on the underlying object type.
Starting from starting point 1, it seems perfectly justified to have a root aggregate and its underlying objects updated to different data sources, depending on the type of the underlying object. Examples would be: a database and an XML file, a database and a NoSQL 'database', a database and a service, a database and an IoT device. Because an aggregate must be ignorant of methods of persistence and infrastructure, in my opinion there is no need to argue about the design of the aggregate. I don't think it is written anywhere in the book that an aggregate root should persist to one data source.
At the same time, starting point 2 also seems perfectly justified: the complete set of objects within the aggregate root is edited together, and the successful persistence of the entire package is coordinated from one repository and (preferably) one Unit of Work.
The question is:
How does Domain-Driven Design deal with an aggregate that, depending on the type of the underlying object, is hydrated across different data sources?
Should I use one custom Unit of Work and make the decision where to write within this UoW?
I'm aware of this related question, but having studied the code I think it only deals with inheritance of repositories that target different data sources, while still serving one data source at a time, and that is not what I'm after.
I want to expand this pattern so that the aggregate can write its data across two or more physical data sources, depending on the underlying object type.
Why do you want to do that on purpose?
In most cases, the persistence implementation is chosen to serve the domain, rather than the other way around. So the happy path typically involves choosing a persistence solution that can record the state of the entire aggregate, and storing the entire thing within a single transaction.
So if you find yourself trying to store an aggregate in two different places, you should take a hard, careful look at why.
One common answer is that you want to be able to query the aggregate state efficiently. CQRS is a common solution here: rather than persisting the aggregate in two different data stores, you persist it to one and replicate it to another. The queries can run very efficiently against the replica (although there is of course some additional latency between a change to the aggregate and the reflection of that change in the query results).
Another common answer is that you really have two aggregates that reference each other. Nothing wrong with storing two aggregates in different places. You may be better served by making the distinction between the two explicit in your code.
See also: Dan Pritchett, Jimmy Bogard.
How does Domain-Driven Design deal with an aggregate that, depending on the type of the underlying object, is hydrated across different data sources?
Badly, just like everybody else.
I have Meeting objects that form the basis of a scheduling system, with gridviews used to display the important information. This is for the purpose of scheduling employees to meetings, and for employees to view what has been scheduled.
I have been trying to follow DDD principles, but I'm having difficulty knowing what to pass from my service layer down to the presentation area of the system. This is because the schedule can be LARGE, and actually consists of many different elements of the system, e.g. Client Name, Address, Case Info, Group, etc., all of which are needed for the meeting scheduler to make a decision.
In addition to this, the scheduler needs to change values within this schedule and pass it back up to the service layer (e.g. assign employees from dropdowns, maybe change group, etc.). So the information isn't really "readonly"; it needs to be interacted with. It's not just a report.
Our current approach is to populate a flattened "Schedule Object" from SQL, constructed from small parts of different domain objects. It's quite a complex query. When changes have been made, this is then passed back up to the service layer, and the service will retrieve the domain objects in question and fire business methods on them using information from the DTOs.
My question is: is this the correct approach? I.e. continue to generate large custom objects from SQL, and pass objects that feel a lot like View Models down from the Service Layer to the Presentation Layer?
UPDATE due to an answer
To give an idea of the number of entities / aggregate relationships involved (this is an obfuscated example, so the relationships are the important thing here):
Client is in one default group
Client has one open case but many closed
Cases have many Meetings
Meetings have many assigned Employees
Meetings have many reasons
Meetings can get scheduled to different groups
Employees can be associated with many groups.
The schedule needs to load all meetings in open cases that belong to clients who are in the same groups as the employee.
Scheduler can see Client Name, Client Address, Case Info, MeetingTime, MeetingType, MeetingReasons, scheduledGroup(s) (showstrail), Assigned Employees (also has hidden employee ids).
Editable fields are the assign-employee dropdowns and the scheduled group.
Schedule may be up to two hundred rows.
The DTO comes down from WCF, so the domain model is accessed above this service layer, not below it.
Domain model business calls are invoked by the service based on the DTO values passed back, and repositories deal with inserts/updates.
So, to update the question: is using a query to populate an object containing all of the above acceptable to pass down as one merged DTO? And if not, how would you approach it? (Give some example calls to the service layer, and explain a little about how you'd envisage the ORM fetching the data, keeping performance in mind.)
In the service layer and below, I would treat each entity (see aggregate roots in DDD) separately with respect to its transactional boundary. That is, even if you could update a client and a case in the same UI view, it would be best to transactionally modify the client and then modify the case. The more you try to modify in one transaction, the more you can conflict with other users.
Although your schedule is large and can contain lots of objects, the service layer should again deal with each entity (aggregate root) separately and then bundle them together into a new view model. Sadly, on brown-field projects, a lot of logic might be in the SQL, and the massive multi-table joins might make this harder to refactor into more atomic queries that do exactly what is needed. The old-school data-centric view of 'do everything you can in the database' goes against everything DDD stands for.
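To make that concrete, here is a minimal sketch of the write-back path (all names are hypothetical): the service handles one aggregate per transaction and fires business methods using values from the DTO, rather than persisting the flattened schedule object itself.

```csharp
using System;

// Hypothetical flattened row coming back from the WCF client.
public class ScheduleRowDto
{
    public Guid MeetingId { get; set; }
    public Guid AssignedEmployeeId { get; set; }
    public Guid ScheduledGroupId { get; set; }
}

public class ScheduleService
{
    private readonly IMeetingRepository _meetings;

    public ScheduleService(IMeetingRepository meetings) => _meetings = meetings;

    public void ApplyChanges(ScheduleRowDto dto)
    {
        // One aggregate, one transaction: load the meeting, invoke business
        // methods with the DTO values, then persist.
        var meeting = _meetings.GetById(dto.MeetingId);
        meeting.AssignEmployee(dto.AssignedEmployeeId);
        meeting.ScheduleToGroup(dto.ScheduledGroupId);
        _meetings.Save(meeting);
    }
}

public interface IMeetingRepository
{
    Meeting GetById(Guid id);
    void Save(Meeting meeting);
}

public class Meeting
{
    public void AssignEmployee(Guid employeeId) { /* business rules here */ }
    public void ScheduleToGroup(Guid groupId) { /* business rules here */ }
}
```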
Because DDD is a collection of design ideas and patterns, and not particularly a methodology or an architecture, it sounds as though it might be too late to try to shoehorn your current application into a DDD application-centric design. It sounds as though your current app is very entrenched in the data-centric view.
If everything is currently being passed up through the layers in one monolithic chunk, it might be best to keep with this style and just expose these monolithic chunks to the people in the other team who wish to consume them, for use in their new app. You might be able to put some sort of view model caching in place (a bit like the caching view model element in CQRS).
In my personal opinion, data-centric, normalised data apps have had their day (they made sense in the 1970s when hard disk space was expensive) and all apps should be moving toward more modern practices. In reality, only when legacy systems are crawling on their knees, will stakeholders usually put up the cash to look for alternatives (usually after stuffing every last server with RAM). It might be possible or best to convince them to refactor small sections at a time.
Background
Udi Dahan suggests a fetching strategy as a useful pattern to use for data access. I agree.
The concept is to make roles explicit. For example, I have an Aggregate Root: Customer. I want Customer in several parts of my application: a list of customers to select from, a view of the customer's details, and a button to deactivate a customer.
It seems Udi would suggest an interface for each of these roles. So I have ICustomerInList with very basic details, ICustomerDetail which includes the latest 10 products purchased, and IDeactivateCustomer which has a method to deactivate the customer. Each interface exposes just enough of my Customer Aggregate Root to get the job done in each situation. My Customer Aggregate Root implements all these interfaces.
Now I want to implement a fetching strategy for each of these roles. Each strategy can load a different amount of data into my Aggregate Root because it will be behind an interface exposing only the bits of information needed.
The general method to implement this part is to ask a Service Locator or some other style of dependency injection. This code will take the interface you are wanting, for example ICustomerInList, and find a fetching strategy to load it (IStrategyForFetching<ICustomerInList>). This strategy is implemented by a class that knows to only load a Customer with the bits of information needed for the ICustomerInList interface.
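In code, the arrangement described so far might look like this sketch (the container wiring for strategy resolution is only hinted at):

```csharp
using System;
using System.Collections.Generic;

// Role interfaces, each exposing just enough of the aggregate for one use case.
public interface ICustomerInList
{
    Guid Id { get; }
    string Name { get; }
}

public interface ICustomerDetail : ICustomerInList
{
    IReadOnlyList<string> LastTenProducts { get; }
}

public interface IDeactivateCustomer
{
    void Deactivate();
}

// The aggregate root implements all of its roles.
public class Customer : ICustomerDetail, IDeactivateCustomer
{
    public Guid Id { get; private set; }
    public string Name { get; private set; }
    public IReadOnlyList<string> LastTenProducts { get; private set; } = Array.Empty<string>();
    public bool IsActive { get; private set; } = true;

    public void Deactivate() => IsActive = false;
}

// A strategy knows how to load just enough state for one role;
// an implementation per role interface is resolved from the container.
public interface IStrategyForFetching<TRole>
{
    TRole Fetch(Guid id);
}
```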
So far so good.
Question
What do you pass to the Service Locator, or to the IStrategyForFetching<ICustomerInList>? All of the examples I see only select one object by a known id. This case is easy: the calling code passes the id through and gets back the specific interface.
What if I want to search? Or I want page 2 of the list of customers? Now I want to pass in more terms that the Fetching Strategy needs.
Possible solutions
Some of the examples I've seen use a predicate: an expression that returns true or false if a particular Aggregate Root should be part of the result set. This works fine for conditions, but what about getting back the first n customers and no more? Or getting page 2 of the search results? Or controlling how the results are sorted?
My first reaction is to start adding generic parameters to my IStrategyForFetching<ICustomerInList>. It now becomes IStrategyForFetching<TAggregateRoot, TStrategyForSelecting, TStrategyForOrdering>. This quickly becomes complex and ugly. It's further complicated by different repositories: some repositories only supply data when using a particular strategy for selecting, some only with certain types of ordering. I would like the flexibility to implement general repositories that can take sorting functions, along with specialised repositories that only return Aggregate Roots sorted in a particular fashion.
It sounds like I should apply the same pattern used at the start: how do I make roles explicit? Should I implement a strategy for fetching X (Aggregate Root) using the payload Y (search / ordering parameters)?
Edit (2012-03-05)
This is all still valid if I'm not returning the Aggregate Root each time. If each interface is implemented by a different DTO, I can still use IStrategyForFetching. This is why this pattern is powerful: what does the fetching, and what is returned, don't have to map in any way to the aggregate root.
I've ended up using IStrategyForFetching<TEntity, TSpecification>. TEntity is the thing I want to get, TSpecification is how I want to get it.
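A sketch of that final shape, with a hypothetical specification object carrying the search, paging and ordering terms:

```csharp
using System.Collections.Generic;

public interface IStrategyForFetching<TEntity, TSpecification>
{
    IReadOnlyList<TEntity> Fetch(TSpecification specification);
}

// Hypothetical specification: "how I want to get it".
public class CustomerSearchSpecification
{
    public string NameContains { get; set; }
    public int Page { get; set; } = 1;
    public int PageSize { get; set; } = 25;
    public string OrderBy { get; set; } = "Name";
}

// Usage: page 2 of a filtered, sorted customer list.
//   var page2 = strategy.Fetch(new CustomerSearchSpecification
//   {
//       NameContains = "smith",
//       Page = 2
//   });
```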
Have you come across CQRS? Udi is a big proponent of it, and its purpose is to solve this exact issue.
The concept in its most basic form is to separate the domain model from querying. This means that the domain model only comes into play when you want to execute a command / commit a transaction. You don't use data from your aggregates & entities to display information on the screen. Instead, you create a separate data access service (or bunch of them) that contain methods that provide the exact data required for each screen. These methods can accept criteria objects as parameters and therefore do searching with whatever criteria you desire.
A quick sequence of how this works:
A screen shows a list of customers that have made orders in the last week.
The UI calls the CustomerQueryService passing a date as criteria.
The CustomerQueryService executes a query that returns only the fields required for this screen, including the aggregate id of each customer.
The user chooses a customer in the list, and chooses to perform the 'Make Important Customer' action/command.
The UI sends a MakeImportantCommand to the Command Service (or Application Service in DDD terms) containing the ID of the customer.
The command service fetches the Customer aggregate from the repository using the ID passed in the command, calls the necessary methods and updates the database.
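Sketched in code (all names are hypothetical), the two sides of that sequence look something like this:

```csharp
using System;
using System.Collections.Generic;

// Query side: returns exactly the fields the screen needs, including the aggregate id.
public class CustomerListItem
{
    public Guid CustomerId { get; set; }
    public string Name { get; set; }
    public DateTime LastOrderDate { get; set; }
}

public interface ICustomerQueryService
{
    // Backed by a tailored query; no domain objects involved.
    IReadOnlyList<CustomerListItem> GetCustomersWithOrdersSince(DateTime since);
}

// Command side: goes through the domain model.
public class MakeImportantCommand
{
    public Guid CustomerId { get; set; }
}

public class CustomerCommandService
{
    private readonly ICustomerRepository _repository;

    public CustomerCommandService(ICustomerRepository repository) => _repository = repository;

    public void Handle(MakeImportantCommand command)
    {
        var customer = _repository.GetById(command.CustomerId);
        customer.MakeImportant();      // business rules live on the aggregate
        _repository.Save(customer);    // updates the database
    }
}

public interface ICustomerRepository
{
    Customer GetById(Guid id);
    void Save(Customer customer);
}

public class Customer
{
    public bool IsImportant { get; private set; }
    public void MakeImportant() => IsImportant = true;
}
```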
Building your app using the CQRS architecture opens you up to a lot of possibilities regarding performance and scalability. You can take this simple example further by creating separate query databases that contain denormalised tables for every view, eventual consistency & event sourcing. There are a lot of videos/examples/blogs about CQRS that I think would really interest you.
I know your question was regarding 'fetching strategy', but I notice that he wrote that article in 2007, and it's likely that he considers CQRS its successor.
To summarise my answer:
Don't try to project cut-down DTOs from your domain aggregates. Instead, just create separate query services that give you a tailored query for your needs.
Read up on CQRS (if you haven't already).
To add to the response by David Masters, I think all the fetching strategy interfaces are adding needless complexity. Having the Customer AR implement various interfaces modeled after a UI is a needless constraint on the AR class, and you will spend far too much effort trying to enforce it. Moreover, it is a brittle solution. What if a view requires data that, while related to Customer, does not belong on the Customer class? Does one then coerce the Customer class and the corresponding ORM mappings to contain that data? Why not just have a separate set of classes for query purposes and be done with it? This allows you to deal with fetching strategies in the place where they belong: the repository. Furthermore, what value does the fetching strategy interface abstraction really add? It may be an appropriate model of what is happening in the application, but it doesn't help in implementing it.
In "DDD" what is the best patterns for handling different versions of your entities, e.g. Entities in a list vs the full object. I would like to avoid the overhead of getting properties I do not need when displaying the entities in a list
Would you have a separate entity type used in lists or just fill up your full entity type partially?
Would you use inheritance?
I understand your urge to create "views" of models in the domain, but would recommend against it. Personally, I use the entire entity inside of the domain, regardless of the situation. The entity is the entity, and anything less or more just does not feel clean. That does not mean that I can't use a reference to the entity to help focus my use of the items in the list, though.
The entity does not cross the domain boundary in my implementation. Instead, I return a type of DTO and have application services that can abstract a view from it. This allows, for example, a presenter to generate the correct view model from a DTO and provide it to the view. I don't know if you are talking about operations in the domain services or in the application services, but there are a couple of things you can do that could be applied to either (or both).
You can do certain things to reduce the performance penalty of working with the entire entity in the domain layers as well. One thing to look at is some sort of cache-aside implementation. When an entity is requested, check to see if it is cached; if it is, return the cached version. If it isn't, pull it and then cache it before returning. When the entity is updated, evict it from the cache and do your update. I have purposely made my concrete repository implementations cache-aware to facilitate this. One other point about an approach like this is that it rewards fine-grained operations: if entities are commonly "gotten" from your data store, it is easy to set up some logging to measure the ratio of cache hits to cache misses.
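A bare-bones cache-aside wrapper around a repository might look like this sketch (the in-memory dictionary stands in for whatever cache you actually use; the types are minimal placeholders):

```csharp
using System;
using System.Collections.Generic;

public interface ICustomerRepository
{
    Customer GetById(Guid id);
    void Update(Customer customer);
}

public class Customer
{
    public Guid Id { get; set; }
}

public class CachedCustomerRepository : ICustomerRepository
{
    private readonly ICustomerRepository _inner;
    private readonly Dictionary<Guid, Customer> _cache = new();

    public CachedCustomerRepository(ICustomerRepository inner) => _inner = inner;

    public Customer GetById(Guid id)
    {
        if (_cache.TryGetValue(id, out var hit))
            return hit;                 // cache hit: skip the data store

        var customer = _inner.GetById(id);
        _cache[id] = customer;          // cache miss: load, then cache before returning
        return customer;
    }

    public void Update(Customer customer)
    {
        _cache.Remove(customer.Id);     // evict from the cache, then do the update
        _inner.Update(customer);
    }
}
```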
Coming full circle, to your question... Most lists I deal with are small, so I incur the penalty of loading up the entity in its entirety. Assuming that most use cases will involve the user drilling into one or more of the items, they are pre-cached because of the cache-aside implementation. The number of items is fluid, but I generally apply this approach to anything less than twenty five entities in a list.
For larger lists, I just use IDs. Most likely, the use case here is some sort of search result. Search results are commonly paged, for example, and this does not fit into the above pattern. Instead, I use the larger list of IDs as a sliding range window over the entities I am interested in, which I then pass to a GetRangeById() method that all of my repositories have, written purposely to take a list of identifiers and load them one at a time so they are cached. In essence, this takes a larger lightweight list and zeroes in on just the area I am interested in at a given point in time.
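The GetRangeById() piece then falls naturally out of the same repository: loading the window of IDs one at a time means each entity lands in the cache on the way through. A sketch, as a method added to the cached repository above:

```csharp
// Added to the cache-aside repository sketched earlier (requires System.Linq).
public IReadOnlyList<Customer> GetRangeById(IEnumerable<Guid> ids) =>
    ids.Select(GetById).ToList();  // each GetById caches on a miss, so drill-downs are cheap
```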
With an approach like this, the important thing to realize is that it is highly scalable. It might not baseline as fast as a non-cached approach with small sets of data, but it will perform better with larger sets. There is an implied performance overhead per operation at play here, but it degrades at a slower rate than a standard "load 'em up" pattern.
You can use the CQRS pattern to separate query processing from command processing, and you can do it even with a single database. In that case you would map your view models directly to tables in the database (via NHibernate, for example). Commands (writes) would go through the real domain model and be persisted in the DB. Queries (like "get me a list of entities") would bypass the domain and go straight to the DB. There is no point in querying domain objects when you don't actually invoke any business logic in them and are just retrieving some data.
You can also extend this solution to full-featured CQRS by having separate stores for the command side and the query side. The query side would be synchronized by means of replication or pub/sub messaging.
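As a rough sketch of the single-database variant (names are illustrative; the actual NHibernate mapping is elided), queries map view models straight to a table while commands go through the domain model:

```csharp
using System;
using System.Collections.Generic;

// Query side: view model mapped directly to a table or view; no domain logic.
public class EntityListItemViewModel
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public interface IEntityListQueries
{
    // Straight to the DB (e.g. an NHibernate mapping or plain SQL); the domain is bypassed.
    IReadOnlyList<EntityListItemViewModel> GetPage(int page, int pageSize);
}

// Command side: writes go through the real domain model.
public class RenameEntityHandler
{
    private readonly IEntityRepository _repository;

    public RenameEntityHandler(IEntityRepository repository) => _repository = repository;

    public void Handle(Guid id, string newName)
    {
        var entity = _repository.GetById(id);
        entity.Rename(newName);       // business logic is invoked only on the write path
        _repository.Save(entity);
    }
}

public interface IEntityRepository
{
    DomainEntity GetById(Guid id);
    void Save(DomainEntity entity);
}

public class DomainEntity
{
    public string Name { get; private set; }
    public void Rename(string newName) => Name = newName;
}
```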