Should DDD entities compare by reference or by ID?

Should DDD entities compare by reference or by ID? - domain-driven-design

When I started using DDD, I created Equals() methods in my entities that compared the ID of the entity. So two entity objects with the same ID would be considered equal.
At some point I thought about that and found that two entities in different states should not be considered equal, even when they describe the same thing (i.e. have the same ID). So now I use reference equality for my entities.
I then stumbled over this answer by Mark Seemann, where he writes
Entities are equal if their IDs equal each other.
Now, of course, I'd like to know which approach is better.
Edit: Note that the question is not whether having two instances of the same entity at the same time is a good idea. I'm aware that in most situations it is probably not.

The question is twofold. First, what you really want to know is
How to handle terms that the X language (or Y framework) impose when I code a domain model with it?
C# for example imposes you that any new concept you define inherit a certain set of public methods. Java includes even more methods.
I've never heard a domain expert talking about hash codes or instance equality, but this is one of those situations when the (often misunderstood) quote "don't fight the framework" from Evans apply: just teach developers to not use them when they do not belong to domain's interfaces.
Then, what you want to know is
What is an entity? How it relates to its own identity?
Start with why! You know that entities are terms of the ubiquitous language that are identifiable.
But WHY?
Plain simple: entities describe concepts whose evolution in time is relevant in the context of the problem we are solving!
It is the relevance of the evolution that defines the entity, not the other way around! The identity is just a communication tool to keep track of the evolution, to talk about it.
As an example think about you: you are a person with a name; we use your name to communicate about your interactions with the rest of the world during your life; still, you are not that name.
Ask yourself: why I need to compare domain entities? Is the domain expert talking this way? Or I'm just using a DDD parlance to describe a CRUD application that interact with a relational database?
To me, the need to actually implement Equals(object) or GetHashCode() into an entity looks like a smell of an inadequate infrastructure.

Entities shouldn't be compared like that, in the first place. There is no valid use case (outside testing, but then again the assertion library should handle this for you) to see if 2 entities are equal using the object's Equals method.
What makes an Entity unique is its Id. The purpose of the id is to say 'this very entity is different from other entities despite having identical properties/values'.
That being said, in a Domain you might need to compare a concept instance with another instance. The comparison is done according to Bounded Context (or even Aggregate) specific Domain rules. It doesn't matter an entity is involved, it could have been a value object as well.
Basically the 'comparison' should be a Domain use case which will be probably implemented as a service. This has no relation to an object's Equals method, which is a technical aspect.
When doing DDD, don't think like a programmer (i.e technical aspects) think like an architect (high level). Code, programming language etc is just an implementation detail.

I think it's bad idea to have the two separate instances of the same entity in different states. I can't think of a scenario where that would be desirable. Maybe there is one? I believe there should only ever be one instance of a given entity with a particular ID.
In general I'd compare their equality using their IDs.
But if you wanted to check if they are the same object reference then you could just use:
if (Object.ReferenceEquals(entityA, entityB))
{
DoSomething();
}

Related

What is an Aggregate Root?

No, it is not a duplication question.
I have red many sources on the subject, but still I feel like I don't fully understand it.
This is the information I have so far (from multiple sources, be it articles, videos, etc...) about what is an Aggregate and Aggregate Root:
Aggregate is a collection of multiple Value Objects\Entity references and rules.
An Aggregate is always a command model (meant to change business state).
An Aggregate represents a single unit of (database - because essentialy the changes will be persisted) work, meaning it has to be consistent.
The Aggregate Root is the interface to the external world.
An Aggregate Root must have a globally unique identifier within the system
DDD suggests to have a Repository per Aggregate Root
A simple object from an aggregate can't be changed without its AR(Aggregate Root) knowing it
So with all that in mind, lets get to the part where I get confused:
in this site it says
The Aggregate Root is the interface to the external world. All interaction with an Aggregate is via the Aggregate Root. As such, an Aggregate Root MUST have a globally unique identifier within the system. Other Entites that are present in the Aggregate but are not Aggregate Roots require only a locally unique identifier, that is, an Id that is unique within the Aggregate.
But then, in this example I can see that an Aggregate Root is implemented by a static class called Transfer that acts as an Aggregate and a static function inside called TransferedRegistered that acts as an AR.
So the questions are:
How can it be that the function is an AR, if there must be a globaly unique identifier to it, and there isn't, reason being that its a function. what does have a globaly unique identifier is the Domain Event that this function produces.
Following question - How does an Aggregate Root looks like in code? is it the event? is it the entity that is returned? is it the function of the Aggregate class itself?
In the case that the Domain Event that the function returns is the AR (As stated that it has to have that globaly unique identifier), then how can we interact with this Aggregate? the first article clearly stated that all interaction with an Aggregate is by the AR, if the AR is an event, then we can do nothing but react on it.
Is it right to say that the aggregate has two main jobs:
Apply the needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be raised in a Domain Event from the AR
Please correct me on any of the bullet points in the beginning if some/all of them are wrong is some way or another and feel free to add more of them if I have missed any!
Thanks for clarifying things out!

I feel like I don't fully understand it.
That's not your fault. The literature sucks.
As best I can tell, the core ideas of implementing solutions using domain driven design came out of the world of Java circa 2003. So the patterns described by Evans in chapters 5 and six of the blue book were understood to be object oriented (in the Java sense) domain modeling done right.
Chapter 6, which discusses the aggregate pattern, is specifically about life cycle management; how do you create new entities in the domain model, how does the application find the right entity to interact with, and so on.
And so we have Factories, that allow you to create instances of domain entities, and Repositories, that provide an abstraction for retrieving a reference to a domain entity.
But there's a third riddle, which is this: what happens when you have some rule in your domain that requires synchronization between two entities in the domain? If you allow applications to talk to the entities in an uncoordinated fashion, then you may end up with inconsistencies in the data.
So the aggregate pattern is an answer to that; we organize the coordinated entities into graphs. With respect to change (and storage), the graph of entities becomes a single unit that the application is allowed to interact with.
The notion of the aggregate root is that the interface between the application and the graph should be one of the members of the graph. So the application shares information with the root entity, and then the root entity shares that information with the other members of the aggregate.
The aggregate root, being the entry point into the aggregate, plays the role of a coarse grained lock, ensuring that all of the changes to the aggregate members happen together.
It's not entirely wrong to think of this as a form of encapsulation -- to the application, the aggregate looks like a single entity (the root), with the rest of the complexity of the aggregate being hidden from view.
Now, over the past 15 years, there's been some semantic drift; people trying to adapt the pattern in ways that it better fits their problems, or better fits their preferred designs. So you have to exercise some care in designing how to translate the labels that they are using.

In simple terms an aggregate root (AR) is an entity that has a life-cycle of its own. To me this is the most important point. One AR cannot contain another AR but can reference it by Id or some value object (VO) containing at least the Id of the referenced AR. I tend to prefer to have an AR contain only other VOs instead of entities (YMMV). To this end the AR is responsible for consistency and variants w.r.t. the AR. Each VO can have its own invariants such as an EMailAddress requiring a valid e-mail format. Even if one were to call contained classes entities I will call that semantics since one could get the same thing done with a VO. A repository is responsible for AR persistence.
The example implementation you linked to is not something I would do or recommend. I followed some of the comments and I too, as one commenter alluded to, would rather use a domain service to perform something like a Transfer between two accounts. The registration of the transfer is not something that may necessarily be permitted and, as such, the domain service would be required to ensure the validity of the transfer. In fact, the registration of a transfer request would probably be a Journal in an accounting sense as that is my experience. Once the journal is approved it may attempt the actual transfer.
At some point in my DDD journey I thought that there has to be something wrong since it shouldn't be so difficult to understand aggregates. There are many opinions and interpretations w.r.t. to DDD and aggregates which is why it can get confusing. The other aspect is, in IMHO, that there is a fair amount of design involved that requires some creativity and which is based on an understanding of the domain itself. Creativity cannot be taught and design falls into the realm of tacit knowledge. The popular example of tacit knowledge is learning to ride a bike. Now, we can read all we want about how to ride a bike and it may or may not help much. Once we are on the bike and we teach ourselves to balance then we can make progress. Then there are people who end up doing absolutely crazy things on a bike and even if I read how to I don't think that I'll try :)
Keep practicing and modelling until it starts to make sense or until you feel comfortable with the model. If I recall correctly Eric Evans mentions in the Blue Book that it may take a couple of designs to get the model closer to what we need.

Keep in mind that Mike Mogosanu is using a event sourcing approach but in any case (without ES) his approach is very good to avoid unwanted artifacts in mainstream OOP languages.
How can it be that the function is an AR, if there must be a globaly unique identifier to it, and there isn't, reason being that
its a function. what does have a globaly unique identifier is the
Domain Event that this function produces.
TransferNumber acts as natural unique ID; there is also a GUID to avoid the need a full Value Object in some cases.
There is no unique ID state in the computer memory because it is an argument but think about it; why you want a globaly unique ID? It is just to locate the root element and its (non unique ID) childrens for persistence purposes (find, modify or delete it).
Order A has 2 order lines (1 and 2) while Order B has 4 order lines (1,2,3,4); the unique identifier of order lines is a composition of its ID and the Order ID: A1, B3, etc. It is just like relational schemas in relational databases.
So you need that ID just for persistence and the element that goes to persistence is a domain event expressing the changes; all the changes needed to keep consistency, so if you persist the domain event using the global unique ID to find in persistence what you have to modify the system will be in a consistent state.
You could do
var newTransfer = New Transfer(TransferNumber); //newTransfer is now an AG with a global unique ID
var changes = t.RegisterTransfer(Debit debit, Credit credit)
persistence.applyChanges(changes);
but what is the point of instantiate a object to create state in the computer memory if you are not going to do more than one thing with this object? It is pointless and most of OOP detractors use this kind of bad OOP design to criticize OOP and lean to functional programming.
Following question - How does an Aggregate Root looks like in code? is it the event? is it the entity that is returned? is it the function
of the Aggregate class itself?
It is the function itself. You can read in the post:
AR is a role , and the function is the implementation.
An Aggregate represents a single unit of work, meaning it has to be consistent. You can see how the function honors this. It is a single unit of work that keeps the system in a consistent state.
In the case that the Domain Event that the function returns is the AR (As stated that it has to have that globaly unique identifier),
then how can we interact with this Aggregate? the first article
clearly stated that all interaction with an Aggregate is by the AR, if
the AR is an event, then we can do nothing but react on it.
Answered above because the domain event is not the AR.
4 Is it right to say that the aggregate has two main jobs: Apply the
needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be
raised in a Domain Event from the AR
Yes; again, you can see how the static function honors this.
You could try to contat Mike Mogosanu. I am sure he could explain his approach better than me.

Search query and search result in DomainDrivenDesign

I have a web application with news posts. Those news posts should be searchable. In context of DDD what kind of buililng block are search query and search result?
These are my thoughts
They both don't have identity, therefore they are not Entities. But the absence of identity doesn't imply they are Value Object. As Eric Evans states:
However, if we think of this category of object as just the absence of identity, we haven't added much to our toolbox or vocabulary. In fact, these objects have characteristics of their own and their own significance to the model. These are the objects that describe things.
I would like to say that both are value objects, but I'm not sure. What confuses me are examples that I've found on Internet. Usualy value object are part of other entities, they are not "standalone". Martin Fowler gives for example Money or date range object. Adam Bien event compares them to enums.
If search result would be considered value object, that would be value object that consists of entities. I'm not sure that's completely alright.
I don't think they are DataTransferObject. Because we are not current concerned with transferring data between layers, but we are concerned with their meaning for the model in absence of layer.
I don't think search query is command. Because it's not a request for change.
As stated on CQRS
People request changes to the domain by sending commands.
I'm trying to use and learn DDD, can someone clarify this problem to me? Where did I go wrong with reasoning?

The simple answer is that querying is probably not part of your domain. The domain model is not there to serve queries, it is there to enforce invariants in your domain. Since by nature queries are read-only, there are no invariants to enforce, so why complicate things? I think where people usually go wrong with DDD is that they assume since they are "doing DDD" every aspect of the system must be handled by a domain model. DDD is there to help with the complex business rules and should only be applied when/where you actually have them. Also, you can and probably should have many models to support each bounded context. But that's another discussion.
It's interesting that you mention CQRS because what does that stand for? Command query responsibility segregation. So if commands use the domain model, and query responsibility is segregated from that, what does that tell you to do? The answer is, do whatever is easiest to query and display that data. If select * from news table filled to dataset works, go with that. If you prefer entity framework, go with that. There is no need to get the domain model involved for queries.
One last point I'd like to make is I think a lot of people struggle with DDD by applying it to situations where there aren't many business invariants to enforce and the domain model ends up looking a lot like the database. Be sure you are using the right tool for the job and not over complicating things. Also, you should only apply DDD in the areas of your system where these invariants exist. It's not an all or nothing scenario.

UML and Implementation: Associating Classes through IDs

I was recently studying an online course. it was recommended that to reduce coupling we could simply pass the ID from the customer object to the Order object. that way the Order did not have to have a full reference to the Customer class.
The idea certainly seems simple and why pass a whole object if you don't need all its attributes?
1) What do you think of this idea?
2) How would I express the relationship between the Customer class and the Order class in UML if only an ID is passed. This isn't just an example of aggregation is it? Doesn't composition and aggregation require more than just passing a value?
Thanks!

First of all you need to be clear about what UML actually is. On the one hand you have an idea and on the other side there is some code running on hardware. Ideally the latter supports the first in a way that brings added value to a user of the idea. Now, there are many possibilities to describe the way from idea to code. And UML is one of them. It is possible to describe each step on this way but for pragmatic reasons UML stops at the border of code, namely programming languages.
Now for you concrete question: Any object can be seen as an instance. That is some concrete memory partition with a fixed address. Programming languages realize instances by allocating memory and using the start address as reference. And since this reference does not change the object can be identified by its address. Clearly then, an association will just be the a pointer. And an association class will hold two (or more) such pointers.
Honestly, the very first time I started with OO I was also confused and thought that it's a waste of resource to pass those large objects. But since it's just a pointer it's really easy going.
Again, things can get more difficult if you need to persist objects. In that case you need an artificial key you can save along with the object and you will likely need tables to map artificial key to the concrete instance address.

The answer to this question depends on a number of factors, which I started listing in a comment attached to your question. I will assume that you are either using UML to create a Domain Model, or you are describing an implementation done using a statically typed language.
If you are using UML to create a Domain Model, you are obfuscating the semantics when you use an ID to "link" classes. Just draw and annotate the association and you're done.
If you are describing an implementation done using a statically typed language - types exist for a reason. Using generic IDs to link things means that the information that the system needs most become more indirect, and therefore more opaque (which is bad). In your case, the Order object still must acquire a typed reference to a Customer object to do anything with it.
For example, the Order may acquire a reference to the Customer by invoking a lookup by the ID, but it must cast the reference to an appropriate type to invoke anything on the Customer object. So you haven't reduced the coupling from the Order to the Customer. You just buried it somwhere else.

DDD domain services: what should a service class contain?

In Domain Driven Design, domain services should contain operations that do not naturally belong inside an entity.
I've had the habit to create one service per entity and group some methods inside it (Organization entity and OrganizationService service).
But the more I think about it: OrganizationService doesn't mean anything, "Organization" is not a service, it's a thing.
So right now I have to add a Organization deep copy functionality that will duplicate a whole Organization aggregate, so I want to put it in a service.
Should I do: OrganizationService::copyOrganization(o)?
Or should I do: OrganizationCopyService::copyOrganization(o)?
More generally: is a "service" an abstract concept containing several operations, or is a service a concrete operation?
Edit: more examples given the first one wasn't that good:
StrategyService::apply()/cancel() or StrategyApplicationService::apply()/cancel()? ("Application" here is not related to the application layer ;)
CarService::wash() or CarWashingService::wash()?
In all these examples the most specific service name seems the most appropriate. After all, in real life, "car washing service" is something that makes sense. But I may end up with a lot of services...
*Note: this is not a question about opinions! This is a precise, answerable question about the Domain Driven Design methodology. I'm always weary of close votes when asking "should I", but there is a DDD way of doing things.*

I think it's good if a domain service has only one method. But I don't think it is a rule like you must not have more than one method on a domain service or something. If the interface abstracts only one thing or one behaviour, it's certainly easy to maitain but the granularity of the domain service totally depends on your bounded context. Sometimes we focus on low coupling too much and neglect high cohesive.

This is a bit opinion based I wanted to add it as a comment but ran out space.
I believe that in this case it will make sense to group the methods into one a separate OrganizationFactory-service with different construction method.
interface OrganizationFactory{
Organization createOrganization();
Organization createOrganizationCopy(Organization organization);
}
I suppose it will be in accordance with information expert pattern and DRY principle - one class has all the information about specific object creation and I don't see any reason to repeat this logic in different places.
Nevertheless, an interesting thing is that in ddd definition of factory pattern
Shift the responsibility for creating instances of complex objects and
AGGREGATES to a separate object, which may itself have no
responsibility in the domain model but is still part of the domain
design. Provide an interface that encapsulates all complex assembly
and that does not require the client to reference the concrete classes
of the objects being instantiated.
the word "object" is in a generic sense doesn't even have to be a separate class but can also be a factory method(I mean both the method of a class and the pattern factory method) - later Evans gives an example of the factory method of Brokerage Account that creates instances of Trade Order.
The book references to the family of GoF factory patterns and I do not think that there's a special DDD way of factory decomposition - the main points are that the object created is not half-baked and that the factory method should add as few dependecies as possible.
update DDD is not attached to any particular programming paradigm, while the question is about object-oriented decomposition, so again I don't think that DDD can provide any special recommendations on the number of methods per object.
Some folks use strange rules of thumb, but I believe that you can just go with High Cohesion principle and put methods with highly related responsibilities together. As this is a DDD question, so I suppose it's about domain services(i.e. not infrastructure services). I suppose that the services should be divided according to their responsibilities in the domain.
update 2 Anyway CarService can do CarService::wash()/ CarService::repaint() / CarService::diagnoseAirConditioningProblems() but it will be strange that CarWashingService will do CarWashingService::diagnoseAirConditioningProblems() it's like in Chomsky's generative grammar - some statements(sentences) in the language make sense, some don't. But if your sentence contains too much subjects(more than say 5-7) it also will be difficult to understand, even if it is valid sentence in language.

Am I allowed to have "incomplete" aggregates in DDD?

DDD states that you should only ever access entities through their aggregate root. So say for instance that you have an aggregate root X which potentially has a lot of child Y entities. Now, for some scenario, you only really care about a subset of these Y entities at a time (maybe you're displaying them in a paged list or whatever).
Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate? Ie. an X object who'se Ys collection only contains the Y instances we're interested in and not all of them? This could for instance cause methods on X which perform some calculation involving the Ys to not behave as expected.
Is this perhaps an indication that the Y entity in question should be considered promoted to an aggregate root?
My current idea (in C#) is to leverage the delayed execution of LINQ, so that my X object has an IQueryable to represent its relationship with Y. This way, I can have transparent lazy loading with filtering... But getting this to work with an ORM (Linq to Sql in my case) might be a bit tricky.
Any other clever ideas?

I consider an aggregate root with a lot of child entities to be a code smell, or a DDD smell if you will. :-) Generally I look at two options.
Split your aggregate into many smaller aggregates. This means that my original design was not optimal and I need to identify some new entities.
Split your domain into multiple bounded contexts. This means that there are specific sets of scenarios that use a common subset of the entities in the aggregate, while there are other sets of scenarios that use a different subset.

Jimmy Nilsson hints in his book that instead of reading a complete aggregate you can read a snapshot of parts of it. But you are not supposed to be able to save changes in the snapshot classes to the database.
Jimmy Nilsson's book Chapter 6: Preparing for infrastructure - Querying. Page 226.
Snapshot pattern

You're really asking two overlapping questions.
The title and first half of your question are philosophical/theoretical. I think the reason for accessing entities only through their "aggregate root" is to abstract away the kinds of implementation details you're describing. Access through the aggregate root is a way to reduce complexity by having a trusted point of access. You're eliminating friction/ambiguity/uncertainty by adhering to a convention. It doesn't matter how it's implemented within the root, you just know that when you ask for an entity it will be there. I don't think this perspective rules out a "filtered repository" as you describe. But to provide a pit of success for devs to fall into, it should be impossible instantiate the repository without being explicit about its "filteredness;" likewise, if shared access to a repository instance is possible, the "filteredness" should be explicit when coding in the caller.
The second half of your question is about implementation on a specific platform. Not sure why you mention delayed execution, I think that's really orthogonal to the filtering question. The filtering itself could be a bit tricky to implement with LINQ. Maybe rather than inlining the Where lambdas, you set up a collection of them and select one depending on the filter you need.

You are allowed since the code will compile anyway, but if you're going for a pure DDD design you should not have incomplete instances of objects.
You should look into LazyLoading if you're afraid to load a huge object of which you will only use a small portion of its child entities.
LazyLoading delays the loading of whatever you decide to lazy-load until the moment they are accessed. They make use of callbacks to call the loading method once the code calls for them.

Is it OK to implement a repository then, so that in such scenarios it
returns an incomplete aggregate?
Not at all. Aggregate is a transnational boundary to change the state of your system. Never use aggregates for querying data. Split the system into Write and Read sides. (read about CQR & CQRS). When we think "CRUD" based, we implement our system, based on some resource. Lets say you have "Appointment" aggregate. Thinking "Crudish" means we should implement usecases Create, Update, Delete, GetAll appointments. That means Appointment[] should be returned for GetAll. When you think usecase based, (HexagonalArchitecture) your usecases would be ScheduleAppointment, RescheduleAppointment, CancelAppointment. But for query side it can be: /myCalendar. We return back all appointments for a specific user in a ClientCalendar object. Create separate DTO's for Query sides. Never use aggregates for this purpose.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string