Too many mandatory fields in aggregate - domain-driven-design

I have an aggregate called Order which has a big list (upwards of 20) of fields that the Business has identified as mandatory fields.
An order cannot be in a valid state without these mandatory fields.
In my domain layer, when I create the Order aggregate, there is some domain logic involved which requires just a few of these 20 mandatory fields (just about 5-6 are required for the domain logic).
If I go purely by the principles of DDD and treat these mandatory values as invariants, I have to create my Order domain object with all fields, most of which should merely pass a not-null/empty validation. It does feel like there could be a better way to handle this scenario.
So what should be my approach? Should I create the domain object with just those 5-6 fields that are required for domain logic? Wouldn't that mean the Order aggregate is then in an invalid state from a business perspective? But the domain object is much simpler if created with a smaller set of values.
I have checked with the Business about whether they really require all those fields, and yes, they have a very real need for an order to be created with all those mandatory fields.

I found the answer to my question here:
https://softwareengineering.stackexchange.com/questions/301928/aggregate-root-with-many-fields
From the link above:
Think of it this way: while you order something from an e-commerce website you might be filling out your order in several distinct steps, going through a number of screens to enter the products, shipping information, payment information, etc...
The order is saved in between steps, even though it is lacking in the needed information to fullfill the order! The only difference is that before an order can be shipped it needs that information. It does not mean that it cannot be created.

Related

How to reduce the higher time complexity brought by DDD

After reading the book Domain-Driven Design and some characters in the book Implementing Domain-Driven Design, I finally try to use DDD in a small service of a microservice system. And I have some questions here.
Here we have the entities Namespace, Group, and Resource. They are also aggregate roots:
As the picture pointed out, we have many Namespaces for users. And in every Namespace, we have Groups as well. And in every Group, we have Resources.
But I have a business logic:
The Group should have a unique name in its Namespace. (It is useful that the user can find the Group by its name)
To make it come true, I need to do those steps in the application layer to add a group with time complexity O(n):
Get the Namespace by its ID from the Repository of Namespace. It has a field Groups, and its type is []GroupID.
Get []Group value by []GroupID value from the Repository of Group.
Check if the name of the new group is unique in the existing Groups we get.
If it does be unique, then use the Repository of Group to save it.
But I think if I just use a sample transaction script, I can finish those in O(lg n). Because I know that I can let the field of Group name be unique in the database. How can I do it in DDD?
My thinking is:
I should add a comment in method save of the Repository interface for Group to let the user know that the save will check the name if is unique in the same Namespace.
Or we should use CQRS to check if the name of Group is unique? Another question is that maybe a Namespace may have a lot of Group. Even though we only put the ID of Group in the entity Namespace, it does cost a lot of space size. How to paginate the data?······ If we only want to get the name of Namespace by its ID, why we need get those IDs for Groups?
I do not want the DDD to limit me. But I still want to know what is the best practices. Before I know what happens, I try to avoid breaking rules.
My solution:
Thanks for the answer by #voiceofunreason. I find that it is hard to write code for set validation in the domain layer still.
#voiceofunreason tells me that I need consider the real world. I do consider and I am still confused that how to implement it to avoid breaking DDD rules. (Sorry but my question is not do we need the condition or not. My question is HOW to make the condition(or domain logic) come true without higher time complexity)
To be honest, I only have a MongoDB serving for storing all data. If I am using Transaction Script, everything is easy:
Create an index for the name of Group to make sure the names are unique.
Just insert a new Group. If the database raises any error, just refuse the request from the user.
But if I want to follow the DDD, and put the logic into the domain layer, I even do not know where to put the logic (it is easy in Transaction Script, right?). It really makes me feel blue. So my solution is:
Use DDD to split the total project into many bounded contexts.
And we do not care if we use the DDD or others in the bounded context. So tired I am.
In this bounded context, we just use Transaction Script.
Is the DDD not well to hold the condition for the set of entities, right? Because DDD always wants to get all data from the database rather than just deal in the database. Sometimes it makes the time complexity higher and I still do not know how to avoid it. Maybe I am wrong. If I am, please comment or post a new answer, thanks a lot.
The Group should have a unique name in its Namespace.
The general term for this problem is set validation. We have some collection of items, and we want to ensure that some condition holds over the entire set....
What is the business impact of having a failure
This is the key question we need to ask and it will drive our solution
in how to handle this issue as we have many choices of varying degrees
of difficulty. -- Greg Young, 2010
Some questions to consider include: is this a real constraint of the domain, or just an attempt at proofreading? Are we the authority for this data, or are we just storing a local copy of data that belongs to someone else? When we have conflicting information, can the computer determine whether the older or newer entry is in error? Does the business currently have a remediation process to use when the set condition doesn't hold? Can the business tolerate a conflict for some period of time (until end of day? minutes? nanoseconds?)
(In thinking about this last question, you may want to review Race Conditions Don't Exist, by Udi Dahan).
If the business requirement really is "we must never write conflicting entries into the collection", then any change you make must lock the collection against any potential conflicts. And this in turn has implications about, for example, how you can store the collection (trying to enforce a condition on a distributed collection is an expensive problem to have).
For the case where you can say: it makes sense to throw all of this data into a single relational database, then you might consider that the domain model is just going to make a "best effort" to avoid conflicts, and then re-enforce that with a "real" constraint in the data model.
You don't get bonus points for doing it the hard way.

Where to place this invariant?

I'm working on a side project to learn and apply DDD within the "Daily Deal' domain. In my purchasing context, i have an invariant where a user can only purchase 'x' amount of deals per deal.
so it seems wasteful for my deal aggregate to load all purchases from all users just to check and see how many times (if any) the user has purchased this deal. I see two ways i could go about this.
Put this logic within a domain service which would allow a pre-condition to already have been met when the Purchase method on the Deal aggregate is invoked.
My repository implementation could always populate the purchases collection of the deal for the purchasing user. hmm...not sure about this one.
any guidance would be great!
I would take the second approach, but with one important change. I would instead create a value object called PurchasedDeal, that consists of just a DealID and Quantity field. The User aggregate could instead load a collection of this more lightweight purchase history object. Performance should be good with this approach, since I'm guessing that the average user will only have a few dozen purchase records.
Also remember that with DDD, you can and probably should have different models per bounded context. So you might design your User aggregate like this in the context of deals/purchasing. However, your User aggregate in another context would look different and not have a purchase history if it's not needed.

DDD: How to handle large collections

I'm currently designing a backend for a social networking-related application in REST. I'm very intrigued by the DDD principle. Now let's assume I have a User object who has a Collection of Friends. These can be thousands if the app and the user would become very successful. Every Friend would have some properties as well, it is basically a User.
Looking at the DDD Cargo application example, the fully expanded Cargo-object is stored and retrieved from the CargoRepository from time to time. WOW, if there is a list in the aggregate-root, over time this would trigger a OOM eventually. This is why there is pagination, and lazy-loading if you approach the problem from a data-centric point of view. But how could you cope with these large collections in a persistence-unaware DDD?
As #JefClaes mentioned in the comments: You need to determine whether your User AR indeed requires a collection of Friends.
Ownership does not necessarily imply that a collection is necessary.
Take an Order / OrderLine example. An OrderLine has no meaning without being part of an Order. However, the Customer that an Order belongs to does not have a collection of Orders. It may, possibly, have a collection of ActiveOrders if a customer is limited to a maximum number (or amount) iro active orders. Keeping a collection of historical orders would be unnecessary.
I suspect the large collection problem is not limited to DDD. If one were to receive an Order with many thousands of lines there may be design trade-offs but the order may much more likely be simply split into smaller orders.
In your case I would assert that the inclusion / exclusion of a Friend has very little to do with the consistency of the User AR.
Something to keep in mind is that as soon as you start using you domain model for querying your start running into weird sorts of problems. So always try to think in terms of some read/query model with a simple query interface that can access your data directly without using your domain model. This may simplify things.
So perhaps a Relationship AR may assist in this regard.
If some paging or optimization techniques are the part of your domain, it's nothing wrong to design domain classes with this ability.
Some solutions I've thought about
If User is aggregate root, you can populate your UserRepository with method GetUserWithFriends(int userId, int firstFriendNo, int lastFriendNo) encapsulating specific user object construction. In same way you can also populate user model with some counters and etc.
On the other side, it is possible to implement lazy loading for User instance's _friends field. Thus, User instance can itself decide which "part" of friends list to load.
Finally, you can use UserRepository to get all friends of certain user with respect to paging or other filtering conditions. It doesn't violate any DDD principles.
DDD is too big to talk that it's not for CRUD. Programming in a DDD way you should always take into account some technical limitations and adapt your domain to satisfy them.
Do not prematurely optimize. If you are afraid of large stress, then you have to benchmark your application and perform stress tests.
You need to have a table like so:
friends
id, user_id1, user_id2
to handle the n-m relation. Index your fields there.
Also, you need to be aware whether friends if symmetrical. If so, then you need a single row for two people if they are friends. If not, then you might have one row, showing that a user is friends with the other user. If the other person considers the first a friend as well, you need another row.
Lazy-loading can be achieved by hidden (AJAX) requests so users will have the impression that it is faster than it really is. However, I would not worry about such problems for now, as later you can migrate the content of the tables to a new structure which is unkown now due to the infinite possible evolutions of your project.
Your aggregate root can have a collection of different objects that will only contain a small subset of the information, as reference to the actual business objects. Then when needed, items can be used to fetch the entire information from the underlying repository.

Workorder management with DDD and ORM

The central tenet to the software I am building is the "workorder"
WorkOrder as I see it would be an "aggregate root" that contains basic information about the work order such as creation date, model/manufacturer, serial number, purchase order.
In addition to these "value" objects, there are also sub "entities" or "aggregates" such as:
Sequences
Reworks
Dimensions
QuoteItems
Consumables
None of the above can/should exist without an associated work order. In the existing system they actually occasionally do but that is because of lack of transactions or checks in code to ensure integrity. They are orphaned records and deleted via scheduled clean up - one of the many reasons I am learning more about DDD and ORM to bring our development practices up to speed.
NOTE: This is probably off topic and can likely be skipped in your
reply
because we are primarily a web-based interface using extJS, each of
the list controls that display each of the above, I have been
reluctant to switch to ORM and DDD. Each list is populated via a
controller:action that queries the DB (ie: sequences list is populated
when the JS control calls a sequence REST URI with GET command). This
GET command invokes a controller that instantiates a sequence object
and calls the selectAllForWorkorderID method
My understanding of ORM is that I would use a repository to query
these items. Fine, however if this sequence object (in DDD parlance)
is considered an aggregate of WorkOrder root - then I must find the
workorder first and traverse the sequences through the WorkOrder.
In a AJAX web-based context this feels funny to me - but in a desktop
environment or even standard web-based context this is acceptable as I
would only query the WorkOrder object once each time a WorkOrder item
is selected in the master list. Not 6 or 8 times for each individual
list to be populated.
I can see now that our system actually has several aggregate root objects, work order is just the more complicated of the few:
WorkOrder
Warranties
Repair Orders
Are the primary roots. Warranties are dependent on work order ID's and Repair Orders can be but not always.
Ignoring the latter roots - allow me to focus solely on WorkOrder.
When I begin examine the existing models and try to determine what is business logic and/or application logic I am slightly confused. What goes into a "service" versus "aggregate root".
Consider one such method in the current model:
createWorkOrderFromRpi.
RPI's are approved documents that act as templates for WorkOrders - they dictate what sequences and the order of execution "can" be performed, dimensions, list of consumables etc. This is a separate system altogether and I believe would best be described as a "module" in DDD nomenclature.
This method has to query the RPI system and obtain the work order header details, sequence list, consumables, etc.
Once it has this data it calls the associated objects and methods:
WorkOrder.Create(Header Details)
Sequence.Create(Sequence Details) - Done in loop (1:m)
Consumable.Create(Consumable Details) - Done in loop (1:m)
In following DDD I am tempted to have the WorkOrder "aggregate root" provide a method with an identical signature however I am reluctant to do so.
I believe each of the "entities" that are aggregates of WorkOrder fit the description and should not ever be exposed to anything outside of the "root" unless traversed through the root itself. There may be cases where this is not the case. On second thought, the interface only ever exposes consumables and sequences and such when a work order is selected which would imply a work order must be loaded anyway?!?
There are some essential business rules which this method must perform:
A Work order with identical serial number is not actively already in the system (unless archived) unless it's on sub-contract in which case do not create a new work but receive a repair order for this work order instead.
There are a few more "rules" but I will exclude them for the sake of brevity.
The individual entities perform micro business validations, for example some fields, such as serial number, have a specific format, as do part numbers and purchase order numbers.
My primary question or concern, is given the above description, would this method best be implemented in an "aggregate root" or "service"?
UPDATE | One final question...if aggregate root is the proper concept...and I need access to the sequences so that I may update a field I would access conceptually (ignore syntax) like:
WorkOrder.Sequences(0).moveToNext()
If this method was implemented in the sequences "entity" which makes sense. Where does the division between technical details and business logic exist? For example, to move a work order from one sequence to the next, we update three timestamps per sequence:
date_entered
date_started
date_finished
When the last timestamp is set, the next sequence date_entered is set to the same time as previous sequence date_finished and the system knows this is the active sequence now. Thats a technical matter.
But a business rule or constraint would be:
Don't move work order if moved into history
Don't move work order if in rework
Don't move work order if in subcon
These are rules, which I would love to keep separated and distinct so as to make it easy for me to translate into English in the form of a specs document which I could present management as a living document and proof of functionality. I was kind of hoping that is what DDD would enforce/promote in a clean manner. Is this a requirement handled independently of DDD? Is this where CQS comes in? Separating business rules from technical matters which are of zero relevance to stake holders?
Alex
I think your createWorkOrderFromRpi() method should be on a "Service" rather than the WorkOrder aggregate root. This service method would then call methods on your Repositories or DAOs to create the workorder. An Aggregate Root typically combines entities but on your model I think RPI is a template or specification outside of the work order aggregate root. If RPI is part of the aggregate then you should put the method on the repository directly and call it wherever, as a repository is a business object in DDD also.
On the second question I believe a WorkOrder Aggregate Root is totally correct for the other "dependent" entities you listed, namely
Sequences
Reworks
Dimensions
QuoteItems
Consumables
I'm interested to know how you implemented this.

Implementing Udi's Fetching Strategy - How do I search?

Background
Udi Dahan suggests a fetching strategy as a useful pattern to use for data access. I agree.
The concept is to make roles explicit. For example I have an Aggregate Root - Customer. I want customer in several parts of my application - a list of customers to select from, a view of the customer's details, and I want a button to deactivate a customer.
It seems Udi would suggest an interface for each of these roles. So I have ICustomerInList with very basic details, ICustomerDetail which includes the latest 10 products purchased, and IDeactivateCustomer which has a method to deactivate the customer. Each interface exposes just enough of my Customer Aggregate Root to get the job done in each situation. My Customer Aggregate Root implements all these interfaces.
Now I want to implement a fetching strategy for each of these roles. Each strategy can load a different amount of data into my Aggregate Root because it will be behind an interface exposing only the bits of information needed.
The general method to implement this part is to ask a Service Locator or some other style of dependency injection. This code will take the interface you are wanting, for example ICustomerInList, and find a fetching strategy to load it (IStrategyForFetching<ICustomerInList>). This strategy is implemented by a class that knows to only load a Customer with the bits of information needed for the ICustomerInList interface.
So far so good.
Question
What you pass to the Service Locator, or the IStrategyForFetching<ICustomerInList>. All of the examples I see are only selecting one object by a known id. This case is easy, the calling code passes this id through and will get back the specific interface.
What if I want to search? Or I want page 2 of the list of customers? Now I want to pass in more terms that the Fetching Strategy needs.
Possible solutions
Some of the examples I've seen use a predicate - an expression that returns true or false if a particular Aggregate Root should be part of the result set. This works fine for conditions but what about getting back the first n customers and no more? Or getting page 2 of the search results? Or how the results are sorted?
My first reaction is to start adding generic parameters to my IStrategyForFetching<ICustomerInList> It now becomes IStrategyForFetching<TAggregateRoot, TStrategyForSelecting, TStrategyForOrdering>. This quickly becomes complex and ugly. It's further complicated by different repositories. Some repositories only supply data when using a particular strategy for selecting, some only certain types of ordering. I would like to have the flexibility to implement general repositories that can take sorting functions along with specialised repositories that only return Aggregate Roots sorted in a particular fashion.
It sounds like I should apply the same pattern used at the start - How do I make roles explicit? Should I implement a strategy for fetching X (Aggregate Root) using the payload Y (search / ordering parameters)?
Edit (2012-03-05)
This is all still valid if I'm not returning the Aggregate Root each time. If each interface is implemented by a different DTO I can still use IStrategyForFetching. This is why this pattern is powerful - what does the fetching and what is returned doesn't have to map in any way to the aggregate root.
I've ended up using IStrategyForFetching<TEntity, TSpecification>. TEntity is the thing I want to get, TSpecification is how I want to get it.
Have you come across CQRS? Udi is a big proponent of it, and its purpose is to solve this exact issue.
The concept in its most basic form is to separate the domain model from querying. This means that the domain model only comes into play when you want to execute a command / commit a transaction. You don't use data from your aggregates & entities to display information on the screen. Instead, you create a separate data access service (or bunch of them) that contain methods that provide the exact data required for each screen. These methods can accept criteria objects as parameters and therefore do searching with whatever criteria you desire.
A quick sequence of how this works:
A screen shows a list of customers that have made orders in the last week.
The UI calls the CustomerQueryService passing a date as criteria.
The CustomerQueryService executes a query that returns only the fields required for this screen, including the aggregate id of each customer.
The user chooses a customer in the list, and chooses perform the 'Make Important Customer' action /command.
The UI sends a MakeImportantCommand to the Command Service (or Application Service in DDD terms) containing the ID of the customer.
The command service fetches the Customer aggregate from the repository using the ID passed in the command, calls the necessary methods and updates the database.
Building your app using the CQRS architecture opens you up to lot of possibilities regarding performance and scalability. You can take this simple example further by creating separate query databases that contain denormalised tables for every view, eventual consistency & event sourcing. There is a lot of videos/examples/blogs about CQRS that I think would really interest you.
I know your question was regarding 'fetching strategy' but I notice that he wrote this article in 2007, and it's likely that he considers CQRS its sucessor.
To summarise my answer:
Don't try and project cut down DTO's from your domain aggregates. Instead, just create separate query services that give you a tailored query for your needs.
Read up on CQRS (if you haven't already).
To add to the response by David Masters, I think all the fetching strategy interfaces are adding needless complexity. Having the Customer AR implement the various interfaces which are modeled after a UI is a needless constraint on the AR class and you will spend far to much effort trying to enforce it. Moreover, it is a brittle solution. What if a view requires data that while related to Customer, does not belong on the customer class? Does one then coerce the customer class and the corresponding ORM mappings to contain that data? Why not just have a separate set of classes for query purposes and be done with it? This allows you to deal with fetching strategies at the place where they belong - in the repository. Furthermore, what value does the fetching strategy interface abstraction really add? It may be an appropriate model of what is happening in the application, it doesn't help in implementing it.

Resources