Should aggregate model contain metadata? - domain-driven-design

I would like to clarify, how the model of an aggregate should look like.
I have couple of events, which contains data which won't be ever used for validation.
For example, metadata like user_id who triggered the action (auditing), correlation_id (observability), labels / flags.
They will be received within the command and will be sent out as the property of the event. It won't be lost as each event is persisted. That's clear.
But should the aggregate object contain these values?
Projection will have them and will display them. Having them in Aggregate does not make sense in my opinion.
Or, it does. In case you want to create the snapshot, you need properties of all events.
Thanks for you advice.

Aggregates should contain only as much information as is required to maintain consistency. If your business rules require user_id, then that information should be persisted in the aggregate. Otherwise, it should not.

Related

Axon Framework: send command on aggregate load

We're building a microservices system with Axon Framework 4.1. In our domain, we have a label concept where we can attach labels to other entities. While labels are normally created and managed by the user, some of these labels are "special" and need to be hard-coded, but they need to be present in the event stream as well.
We have a bunch of aggregates that represent entities that can be labeled with these labels. Some of these aggregates will be used frequently, while others might be used infrequently or are even abandoned by the user.
Sometimes we come up with new special labels. We add them to the code, and then we also need to add them to the event stream. What is a good way to do that?
We can create a special command that we need to send when the updated service is started for the first time. It goes through all the labels and adds the ones that aren't in the event stream yet. This has two disadvantages. First, we need to actually send that command, which either requires us to not forget it, or to add some infrastructure for it outside of the code (e.g., in our build pipeline). Also, other services could have booted up faster with the new labels and started sending commands before we fired our special command. The other disadvantage is that this command will target all aggregates, including the abandoned ones, which could be wasteful of resources and be confusing to end users who might see activity in a document they thought was abandoned.
Ideally, we would like to be able to send the command when Axon has just loaded the aggregate. That way we would be certain that the labels are only introduced in aggregates that are actually used. Also, we could wire this up in code and it wouldn't require us to add infrastructure outside of the application and/or remember to do it manually.
Unfortunately, this feature doesn't seem to exist in Axon (yet) 😉.
Are there other (better) ways to achieve this?
I've got an idea which might help you out on this.
If I understand the use case correctly, the "Label" in your system, which user can introduce themselves but for which also a couple of hard-coded versions exist, is an Aggregate.
Based on that assumption, I suggest to be smart with the Aggregate Identifier you are using.
The sole thing that Axon expects from you, is that the Aggregate Identifier is (or can be made in to) a String. Typically a UUID is used for the Aggregate Identifiers, which is a reasonable first start.
You can however wrap this UUID in a typed-id object. Taking your "Label" Aggregate, that would opt for a LabelId.
That said, let's first go back to verifying whether a given "Label" Aggregate exists within the Event Stream.
The concern you have is rather valid I think; reading the entire Event Stream to figure out whether a given Aggregate instance exists is to big of a hassle.
However, the EventStore can be queried through two mechanism:
The Event Stream from a given point in time (e.g. what the TrackingToken mechanism does).
The Event Stream for a given Aggregate instance, based on the Aggregate Identifier.
It's the second option which is far more ideal in your scenario.
Just query the EventStore for a given "Label" Aggregate's Identifier. If you receive a non-empty Event Stream, you know it already exists.
Vice versa, if no Events are found, you are certain it's a new "Label" that needs to be introduced.
The crux here is in knowing the "Label's" Aggregate Identifier up front, which circles back to the String storage approach for the Aggregate Identifiers using a typed LabelId. What you could do, is deviate in the LabelId object between a custom "Label" (I'd opt for a UUID here) and a hard-coded "Label".
For the latter, you could for example have the label-name, plus a UUID/counter if desired.
Doing so will ensure that all the Events published from a hard-coded "Label" will have an Aggregate Identifier you can anticipate on during start-up.
Hope this is clear and all, if not, please comment on my response below.

How to model an entity's current status in DDD

I am trying to get to grips with the ideas behind DDD and apply them to a pet project we have, and I am having some questions that I hope that someone here would be able to answer.
The project is a document management system. The particular problem we have regards two notions that our system handles: That of a Document and that of a DocumentStatus.
A Document has a number of properties (such as title, author, etc). Users can change any of the Document's properties through out its life time.
A Document may be, at any time, be at a particular state such as NEW, UNDER_REVISION, REVISED, APPROVED etc. For each state we need to know who made that change to that state.
We need to be able to query the system based on a document status. An example query would be "Get me all documents that are in the REVISED state".
"Get me all documents whose status has been changed by user X"
The only time that a Document and a DocumentStatus need to be changed in the same transaction is when the Document is created (create the document and at the same time assign it a status of NEW).
For all other times, the UI allows the update of either but not both (i.e. you may change a document's property such as the author, but not its state.) Or you can update its state (from NEW to UNDER_REVISION) but not its properties.
I think we are safe to consider that a Document is an Entity and an Aggregate Root.
We are buffled about what DocumentStatus is. One option is to make it a Value Object part of the Document's aggregate.
The other option is to make it an Entity and be the root of its own aggregate.
We would also liked to mention that we considered CQRS as described in various DDD documents, but we think it is too much of a hassle, especially given the fact that we need to perform queries on the DocumentStatus.
Any pointers or ideas would be welcomed.
Domain
You say you need to be able to see past status changes, so the status history becomes a domain concept. A simple solution would then be the following:
Define a StatusHistory within the Document entity.
The StatusHistory is a list of StatusUpdate value objects.
The first element in the StatusHistory always reflects the current state - make sure you add the initial state as StatusUpdate value object when creating Document entities.
Depending on how much additional logic you need for the status history, consider creating a dedicated value object (or even entity) for the history itself.
Persistence
You don't really say how your persistence layer looks like, but I think creating queries against the first element of the StatusHistory list should be possible with every persistence mechanism. With a map-reduce data store, for example, create a view that is indexed by Document.StatusHistory[0] and use that view to realize the queries you need.
If you were only to record the current status, then that could well be a value object.
Since you're composing more qualifying - if not identifying - data into it, for which you also intend to query, then that sounds to me as if no DocumentStatus is like another, so a value object doesn't make much sense, does it?
It is identified by
the document
the author
the time it occurred
Furthermore, it makes even more sense in the context of the previous DocumentStatus (if you consider more states than just NEW and UNDER_REVISION).
To me, this clearly rules out modeling DocumentStatus as a value object.
In terms of the state as a property of DocumentStatus, and following the notion of everything is an object (currently reading David West's Object Thinking), then that could of course be modeled as a value object.
Follows How to model an entity's current status in DDD.

Can an aggregate be part of a domain-event?

Consider an aggregate with many properties. For example UserGroup. If I want to publish a UserGroupCreatedEvent I can do 2 things:
duplicate the properties from the just created UserGroup to
UserGroupCreatedEvent and copy their values. OR:
Refer to the new UserGroup within the UserGroupCreatedEvent
In many examples, like Axon's Contacts App, I've seen property duplication. I wonder why, and if in real-world CQRS applications, this is not a lot of overhead, and developers choose to refer the aggregate instead.
An event is a DTO and it's meant to cross boundaries. Including the aggregate directly has the following problems:
The aggregate is a concept that makes sense only in its own bounded context;
An event should contain only the relevant changes, not the whole state of the concept;
Because an event is a DTO, at one point it will be (de)serialized and that would be a technical problem with properly encapsulated objects;
Every component/context receiving handling the event would have a dependency of the component where the aggregate is defined;
These are the main reasons why a Domain Event should be just a flattened representation of the relevant state changes.
P.S: If you need to include the whole state in your event, maybe the events is improperly designed or you're dealing with a simple data structure. Usually an aggregate contains some value objects and/or encapsulates some business constraints.
An important property of domain events is that they are immutable. Bearing that in mind, the two possibilities you mention differ greatly:
Duplicating properties records their value at the time the UserGroup was created.
Referencing the UserGroup by ID just tells you that a UserGroup was created, but not the properties it had at the time. If the UserGroup has been deleted in the meantime, this means that the information is lost.
Which properties you copy depends on just that difference. Do you need to be able to look up e.g. the name of a UserGroup at its creation time? Add it as a property. If not (and if it's not expected that it ever will be required), don't.
Also, domain events have a global scope (i.e. they are meaningful outside of your BC), so you should include all information that clients outside of your BC need to make sense of the domain event.
Note that attaching the whole aggregate root object to a domain event violates the immutability rule of domain events, so this is most probably a bad idea.

Aggreate Root, Aggregates, Entities, Value Objects

I'm struggling with some implementation details when looking at the terms mentioned in the title above.
Can someone tell me whether my interpretation is right?
For reference I look at a CRM Domain
As a AggregateRoot I could see a Customer.
It may have Entities like Address which contains street, postal code and so on.
Now there is something like Contact and Activity this should be at least aggregates. Right? Now if the Contacts and Activities would have complex business logic. For example, "Every time a contact of the type order is created, the order workflow should be started"
Would then Contact need to be an Aggregate root? What may be implementation implications that could result from this?
Further more when looking and Event Sourcing, Would each Aggregate have its own Stream? In this scenario A Customer could have thousands of activities.
It would be great if someone could guide em in which part my understanding is right and which I differ form the common interpretation.
What do you mean by “at least aggregates”?
An aggregate is a set of one or more connected entities. The aggregate can only be accessed from its root entity, also called the aggregate root. The aggregate defines the transactional boundaries for the entities which must be preserved at all time. Jimmy Bogard has a good explanation of aggregates here.
When using event sourcing each aggregate should have its own stream. The stream is used to construct the aggregates and there is no reason to let several aggregates use the same stream.
You should try to keep your aggregates small. If you expect your customer object to have thousands of activities then you should look at if it is possible to design the activities as a separate aggregate, just as long as its boundaries ensures that you do not leave the system in an invalid state.

How should I enforce relationships and constraints between aggregate roots?

I have a couple questions regarding the relationship between references between two aggregate roots in a DDD model. Refer to the typical Customer/Order model diagrammed below.
First, should references between the actual object implementation of aggregates always be done through ID values and not object references? For example if I want details on the customer of an Order I would need to take the CustomerId and pass it to a ICustomerRepository to get a Customer rather then setting up the Order object to return a Customer directly correct? I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to setup if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots? For example say I want all the associated orders to be deleted when a customer is deleted. The ICustomerRepository.DeleteCustomer() method should not be referencing the IOrderRepostiory should it? That seems like that would be breaking the boundaries between the aggregates/repositories? Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders which would references both a IOrderRepository and ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers. Is that just down to educating them on how to use the model correctly?
First, should references between aggregates always be done through ID values and not actual object references?
Not really - though some would make that change for performance reasons.
For example if I want details on the customer of an Order I would need to take the CustomerId and pass it to a ICustomerRepository to get a Customer rather then setting up the Order object to return a Customer directly correct?
Generally, you'd model 1 side of the relationship (eg., Customer.Orders or Order.Customer) for traversal. The other can be fetched from the appropriate Repository (eg., CustomerRepository.GetCustomerFor(Order) or OrderRepository.GetOrdersFor(Customer)).
Wouldn't that mean that the OrderRepository would have to know something about how to create a Customer? Wouldn't that be beyond what OrderRepository should be responsible for...
The OrderRepository would know how to use an ICustomerRepository.FindById(int). You can inject the ICustomerRepository. Some may be uncomfortable with that, and choose to put it into a service layer - but I think that's overkill. There's no particular reason repositories can't know about and use each other.
I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to setup if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Aggregate roots are allowed to hold references to other aggregate roots. In fact, anything is allowed to hold a reference to an aggregate root. An aggregate root cannot hold a reference to a non-aggregate root entity that doesn't belong to it, though.
Eg., Customer cannot hold a reference to OrderLines - since OrderLines properly belongs as an entity on the Order aggregate root.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots?
If (and I stress if, because it's a peculiar requirement) that's actually a use case, it's an indication that Customer should be your sole aggregate root. In most real-world systems, however, we wouldn't actually delete a Customer that has associated Orders - we may deactivate them, move their Orders to a merged Customer, etc. - but not out and out delete the Orders.
That being said, while I don't think it's pure-DDD, most folks will allow some leniency in following a unit of work pattern where you delete the Orders and then the Customer (which would fail if Orders still existed). You could even have the CustomerRepository do the work, if you like (though I'd prefer to make it more explicit myself). It's also acceptable to allow the orphaned Orders to be cleaned up later (or not). The use case makes all the difference here.
Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders which would references both a IOrderRepository and ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers. Is that just down to educating them on how to use the model correctly?
I probably wouldn't go a service route for something so intimately tied to the repository. As for how to make sure a service is used...you just don't put a public Delete on the CustomerRepository. Or, you throw an error if deleting a Customer would leave orphaned Orders.
Another option would be to have a ValueObject describing the association between the Order and the Customer ARs, VO which will contain the CustomerId and additional information you might need - name,address etc (something like ClientInfo or CustomerData).
This has several advantages:
Your ARs are decoupled - and now can be partitioned, stored as event streams etc.
In the Order ARs you usually need to keep the information you had about the customer at the time of the order creation and not reflect on it any future changes made to the customer.
In almost all the cases the information in the value object will be enough to perform the read operations ( display customer info with the order ).
To handle the Deletion/deactivation of a Customer you have the freedom to chose any behavior you like. You can use DomainEvents and publish a CustomerDeleted event for which you can have a handler that moves the Orders to an archive, or deletes them or whatever you need. You can also perform more than one operation on that event.
If for whatever reason DomainEvents are not your choice you can have the Delete operation implemented as a service operation and not as a repository operation and use a UOW to perform the operations on both ARs.
I have seen a lot of problems like this when trying to do DDD and i think that the source of the problems is that developers/modelers have a tendency to think in DB terms. You ( we :) ) have a natural tendency to remove redundancy and normalize the domain model. Once you get over it and allow your model to evolve and implicate the domain expert(s) in it's evolution you will see that it's not that complicated and it's quite natural.
UPDATE: and a similar VO - OrderInfo can be placed inside the Customer AR if needed, with only the needed information - order total, order items count etc.

Resources