DDD: Organizing 100s of properties on an Entity - domain-driven-design

How would you organize an entity that has 100s of properties? One could go as far to say 100s of properties, with a few Value Objects (as a few of the properties have 2 or 3 properties of their own). But the point is, how to handle the large number of properties.
I am re-creating our model from the ground-up using DDD, and the current issue is how to organize one of main entities that is broken up into many many many subsets. Currently it was written to have about a dozen sub-sets of properties. Like CarInfo() with 50+ properties, CarRankings() with 80+, CarStats(), CarColor(), etc, etc.
Think of it as mass-data stored on a single entity root.
Is it appropriate to have a service for the simple purpose of grouping a large collection of properties? Like CarInfoService that would return a Car() object, along with a large collection or sort.
Another idea would be to look at how the data is displayed. There is no one view that shows all of this data. Instead, they are split up based on their subjective matter. Like CarInfo shows all information about the car. Another would be CarStats that shows all stats of the car. So in this sense, the Application layer can build the underlying details needed for the UI. But, I still need a way to store it in the domain.
I have a mind to just put a number of xml property bags on it and call in the day. lol

This is a tough problem. You've got an aggregate root with lots of branches. But it sounds as if your users only work with certain collections at certain times. You could try pruning the tree a bit.
For instance, in your Car example, if your users are comparing the rankings of different cars, you could treat that as its own module or subsystem. No need to load up the detailed car data associated with each ranking, if the specific task is to figure out which ones rate better than others.
Remember, even though the data may be stored in a parent-child hierarchy in the database, it doesn't necessarily mean your domain will be structured in the same way.
By looking at the tasks or functions your users will perform on your data, you might discover concepts that help to break up that giant aggregate into more manageable chunks.
If you do need to assemble the full root with all of its branches, I think you'll definitely want some sort of service to bring everything together.

I think you should consider to split such an entity in different bounded contexts related via shared idenfitiers. Thus you will have different Cars in different BCs (thus different namespaces, too), and each one will handle only the informations that are related to that particular aspect.
This way, in case of deeper insight, chances are that you will have to refactor only a BC without affecting the others.

Related

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, lets say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc...): This is a compact representation that might have some aggregates like for example NumberOfAuthors and NumberOfRwviews. Each Author and Review are entities themselves saved in db.
DetailsView: here you wouldn't have aggregates but real values for each Author, as Book has a property AuthorsList.
Case 2 is clear, you get all from DB and show it. But how to solve case 1. if you want to reduce number of connections and payload to/from DB? So, if you don't want to get all actual Authors and Reviews from DB but just 2 ints for count for each of them.
Full normalized solution would be 2, but 1 seems to require either some denormalization or create 2 different entities: BookDetails and BookCompact within Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.
For me it sounds like multiple Query Models (QM).
I used DDD with CQRS/ES style, so aggregate roots are producing events based on commands being passed in. To those events multiple QMs are subscribed. So I create multiple "views" based on requirements.
The ES (event-sourcing) has huge power - I can introduce another QMs later by replaying stored events.
Sounds like managing a lot of similar, or even duplicate data, but it has sense for me.
QMs can and are optimized to contain just enough data/structure/indexes for given purpose. This is the way out of "shared data model". I see the huge evil in "RDMS" one for all approach. You will always get lost in complexity of managing shared model - like you do.
I had a very good result with the following design:
domain package contains #Entity classes which contain all necessary data which are stored in database
dto package which contains view/views of entity which will be returned from service
Dto should have constructor which takes entity as parameter. To copy data easier you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you are sharing only minimal amount of information and it is returned in object which does not have any functionality.

DDD: How to handle large collections

I'm currently designing a backend for a social networking-related application in REST. I'm very intrigued by the DDD principle. Now let's assume I have a User object who has a Collection of Friends. These can be thousands if the app and the user would become very successful. Every Friend would have some properties as well, it is basically a User.
Looking at the DDD Cargo application example, the fully expanded Cargo-object is stored and retrieved from the CargoRepository from time to time. WOW, if there is a list in the aggregate-root, over time this would trigger a OOM eventually. This is why there is pagination, and lazy-loading if you approach the problem from a data-centric point of view. But how could you cope with these large collections in a persistence-unaware DDD?
As #JefClaes mentioned in the comments: You need to determine whether your User AR indeed requires a collection of Friends.
Ownership does not necessarily imply that a collection is necessary.
Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) iro active orders. Keeping a collection of historical orders would be unnecessary.
I suspect the large collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs but the order may much more likely be simply split into smaller orders.
In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.
Something to keep in mind is that as soon as you start using you domain model for querying your start running into weird sorts of problems. So always try to think in terms of some read/query model with a simple query interface that can access your data directly without using your domain model. This may simplify things.
So perhaps a Relationship AR may assist in this regard.
If some paging or optimization techniques are the part of your domain, it's nothing wrong to design domain classes with this ability.
Some solutions I've thought about
If User is aggregate root, you can populate your UserRepository with method GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) encapsulating specific user object construction. In same way you can also populate user model with some counters and etc.
On the other side, it is possible to implement lazy loading for User instance's _friends field. Thus, User instance can itself decide which "part" of friends list to load.
Finally, you can use UserRepository to get all friends of certain user with respect to paging or other filtering conditions. It doesn't violate any DDD principles.
DDD is too big to talk that it's not for CRUD. Programming in a DDD way you should always take into account some technical limitations and adapt your domain to satisfy them.
Do not prematurely optimize. If you are afraid of large stress, then you have to benchmark your application and perform stress tests.
You need to have a table like so:
friends
id, user_id1, user_id2
to handle the n-m relation. Index your fields there.
Also, you need to be aware whether friends if symmetrical. If so, then you need a single row for two people if they are friends. If not, then you might have one row, showing that a user is friends with the other user. If the other person considers the first a friend as well, you need another row.
Lazy-loading can be achieved by hidden (AJAX) requests so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure which is unkown now due to the infinite possible evolutions of your project.
Your aggregate root can have a collection of different objects that will only contain a small subset of the information, as reference to the actual business objects. Then when needed, items can be used to fetch the entire information from the underlying repository.

Service Layer DTOs - Large Complex Interactive Report-Like Objects

I have Meeting objects that form the basis of a scheduling system, of which gridviews are used to display the important information. This is for the purpose of scheduling employees to meetings, and for employees to view what has been scheduled.
I have been trying to follow DDD principles, but I'm having difficulty knowing what to pass from my service layer down to presentation area of system. This is because the schedule can be LARGE, and actually consists of many different elements of the system. Eg. Client Name, Address, Case Info, Group,etc, all of which are needed for the meeting scheduler to make a decision.
In addition to this, the scheduler needs to change values within this schedule and pass it back up to the service layer (eg. assign employees from dropdowns, maybe change group, etc). So, the information isn't really "readonly" - it needs to be interacted with. ie. It's not just a report.
Our current approach is to populate a flattened "Schedule Object" from SQL, which is constructed from small parts of different domain objects. It's quite a complex query. When changes have been made, this is then passed back up to the service layer, and the service will retrieve the domain objects in question, and fire business methods on the domain objects using information from the DTOs.
My question is, is this the correct approach? ie. Continue to generate large custom objects from SQL, and then pass down from Service Layer to Presentation Layer objects that feel a lot like View Models?
UPDATE due to an answer
To give a idea of the amount entities / aggregates relationships involved. (this is an obfuscated examples, so relationships are the important things here)
Client is in one default group
Client has one open case but many closed
Cases have many Meetings
Meeting have many assigned Employees
Meeting have many reasons
Meeting can get scheduled to different groups
Employees can be associated with many groups.
The schedule need to loads all meetings in open cases that belong to patients who are in the same groups as the employee.
Scheduler can see Client Name, Client Address, Case Info, MeetingTime, MeetingType, MeetingReasons, scheduledGroup(s) (showstrail), Assigned Employees (also has hidden employee ids).
Editable fields are assign employee dropdowns and scheduled group.
Schedule may be up to two hundred rows.
DTO is coming down from WCF, so domain model is accessed above this service layer, and not below.
Domain model business calls leveraged by service based on DTO values passed back, and repositories deal with inserts/updates.
So, I suppose to update, is using a query to populate an object which contains all of the above acceptable to pass down as one merged DTO? And if not, how would you approach it? ( giving some example calls to service layer, and explaining a little bit about how you conceive the ORM fetching the data keeping in mind performance)
In the service layer and below, I would treat each entity (see aggregate roots in DDD) separate with respect to it's transactional boundary. I.e. even if you could update a client and a case in the same UI view, it would be best to transactionally modify the client and then modify the case. The more you try to modify in one transaction, the more you can conflict with other users.
Although your schedule is large and can contain lots of objects, the service layer should again deal with each entity (aggregate root) separately and then bundle them together into a new view model. Sadly, on brown-field projects, a lot of logic might be in the SQL and the massive multi-table joins might make this harder to refactor into more atomic queries that do exactly what is needed. The old-school data-centric view of 'do everything you can in the database' goes against everything DDD.
Because DDD is a collection of design ideas and patterns and not particularly a methodology or an architecture, it sounds that it might be too late to try shoe-horn your current application into a DDD application-centric design. It sounds as though your current app is very entrenched in the data-centric view.
If everything is currently being passed up through the layers in one monolithic chunk, it might be best to keep with this style and just expose these monolithic chunks to the people in the other team who wish to consume them, for use in their new app. You might be able to put some sort of view model caching in place (a bit like the caching view model element in CQRS).
In my personal opinion, data-centric, normalised data apps have had their day (they made sense in the 1970s when hard disk space was expensive) and all apps should be moving toward more modern practices. In reality, only when legacy systems are crawling on their knees, will stakeholders usually put up the cash to look for alternatives (usually after stuffing every last server with RAM). It might be possible or best to convince them to refactor small sections at a time.

Are there any good patterns for handling list of entities

In "DDD" what is the best patterns for handling different versions of your entities, e.g. Entities in a list vs the full object. I would like to avoid the overhead of getting properties I do not need when displaying the entities in a list
Would you have a separate entity type used in lists or just fill up your full entity type partially?
Would you use inheritance?
I understand your urge to create "views" of models in the domain, but would recommend against it. Personally, I use the entire entity inside of the domain, regardless of the situation. The entity is the entity, and anything less or more just does not feel clean. That does not mean that I can't use a reference to the entity to help focus my use of the items in the list, though.
The entity does not cross the domain boundary in my implementation. Instead, I return a type of DTO and have application services that can abstract a view from it. This allows, for example, allowing a presenter to generate the correct view model from a DTO and provide it to the view. I don't know if you are talking about operations in the domain services or in the application services, but there are a couple of things you can do that could be applied to either (or both).
You can do certain things to reduce the performance penalty of working with the entire entity in the domain layers, as well. One thing to look at is implementing some sort of cache-aside implementation. When an entity is requested, check to see if it is cached. If it is, return the cached version. If it isn't, pull it and then cache it before returning. When the entity is updated, evict it from the cache and do your update. I have purposely created my concrete repository implementations to be cache-aware to facilitate this. One other thing to consider using an approach like this is that it is beneficial to do as many fine-grained operations as possible. While that seems illogical at first, if entities are commonly "gotten" from your data store, it is easy to set up some logging to measure the number of cache hits to cache misses.
Coming full circle, to your question... Most lists I deal with are small, so I incur the penalty of loading up the entity in its entirety. Assuming that most use cases will involve the user drilling into one or more of the items, they are pre-cached because of the cache-aside implementation. The number of items is fluid, but I generally apply this approach to anything less than twenty five entities in a list.
For larger lists, I just use IDs. Most likely, the use case here is some sort of search result. Search results are commonly paged, for example, and this does not fit into the above pattern. Instead, I use the larger list of IDs as a sliding range window of entities I am interested in that I then pass to a GetRangeById() method that all of my repositories have - written to purposely take a list of identifiers and load them one at a time so they are cached. In essence, this will take a larger lightweight list and zero in just on the area I am interested in at a given point in time.
With an approach like this, the important thing to realize is that it is highly scalable. It might not baseline as fast as a non-cached approach with small sets of data, but will perform better with larger sets of data. There is an implied performance overhead of operation at play here, but it degrades at a slower rate than a standard "load 'em up" pattern, as well.
You can use CQRS pattern to separate query processing and command processing. And you can do it even on a single database. In such a case you would map you view models directly to the tables in databse (via NHibernate for example). Commands (writes) would go through real domain model and would be persisted in the DB. Queries (like get me a list of entities) would bypass the domain a go straight do DB. There is no point in querying domain object because you actually don't invoke any business logic in them, just retrieving some data.
You can also extend this solution to full-featured CQRS by having separate stores for command side and for query side. Query side would be synchronized by means of replication or pub/sub messaging.

In DDD, are collection properties of entities allowed to have partial values?

In Domain Driven Design are collection properties of entities allowed to have partial values?
For example, should properties such as Customer.Orders, Post.Comments, Graph.Vertices always contain all orders, comments, vertices or it is allowed to have today's orders, recent comments, orphaned vertices?
Correspondingly, should Repositories provide methods like
GetCustomerWithOrdersBySpecification
GetPostWithCommentsBefore
etc.?
I don't think that DDD tells you to do or not to do this. It strongly depends on the system you are building and the specific problems you need to solve.
I not even heard about patterns about this.
From a subjective point of view I would say that entities should be complete by definitions (considering lazy loading), and could completely or partially be loaded to DTO's, to optimized the amount of data sent to clients. But I wouldn't mind to load partial entities from the database if it would solve some problem.
Remember that Domain-Driven Design also has a concept of services. For performing certain database queries, it's better to model the problem as a service than as a collection of child objects attached to a parent object.
A good example of this might be creating a report by accepting several user-entered parameters. It be easier to model this as:
CustomerReportService.GetOrdersByOrderDate(Customer theCustomer, Date cutoff);
Than like this:
myCustomer.OrdersCollection.SelectMatching(Date cutoff);
Or to put it another way, the DDD model you use for data entry does not have to be the same as the DDD model you use for reporting.
In highly scalable systems, it's common to separate these two concerns.

Resources