Read model with data from multiple aggregate roots (different contexts) - domain-driven-design

I'm curious how to join data from multiple aggregate roots into a read model for an event-sourced aggregate root. Let's take a simple example:
If I have an aggregate root called Cart which supports the following events in its event stream (properties in parentheses; keep in mind this is a simple example):
AddProductToCart(cartId: Int, productId: Int)
RemoveProductFromCart(cartId: Int, productId: Int)
AddUserLicenseToProduct(cartId: Int, productId: Int, userId: Int)
RemoveUserLicenseFromProduct(cartId: Int, productId: Int, userId: Int)
EmptyCart(cartId: Int)
Projecting read models with data coming from this event stream is fine. For example, I can project a cart object which looks something like this:
Cart(cartId: Int, products: List[Product])
Product(productId: Int, userLicenses: List[UserLicense])
UserLicense(userId: Int)
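A minimal sketch of how such a projection might fold the event stream into that shape (the fold logic is illustrative; the names mirror the ones above):

```scala
// Illustrative sketch: folding the Cart event stream into the read model above.
sealed trait CartEvent
case class AddProductToCart(cartId: Int, productId: Int) extends CartEvent
case class RemoveProductFromCart(cartId: Int, productId: Int) extends CartEvent
case class AddUserLicenseToProduct(cartId: Int, productId: Int, userId: Int) extends CartEvent
case class RemoveUserLicenseFromProduct(cartId: Int, productId: Int, userId: Int) extends CartEvent
case class EmptyCart(cartId: Int) extends CartEvent

case class UserLicense(userId: Int)
case class Product(productId: Int, userLicenses: List[UserLicense])
case class Cart(cartId: Int, products: List[Product])

def project(cartId: Int, events: List[CartEvent]): Cart =
  events.foldLeft(Cart(cartId, Nil)) {
    case (cart, AddProductToCart(_, pid)) =>
      cart.copy(products = Product(pid, Nil) :: cart.products)
    case (cart, RemoveProductFromCart(_, pid)) =>
      cart.copy(products = cart.products.filterNot(_.productId == pid))
    case (cart, AddUserLicenseToProduct(_, pid, uid)) =>
      cart.copy(products = cart.products.map {
        case p if p.productId == pid =>
          p.copy(userLicenses = UserLicense(uid) :: p.userLicenses)
        case p => p
      })
    case (cart, RemoveUserLicenseFromProduct(_, pid, uid)) =>
      cart.copy(products = cart.products.map {
        case p if p.productId == pid =>
          p.copy(userLicenses = p.userLicenses.filterNot(_.userId == uid))
        case p => p
      })
    case (cart, EmptyCart(_)) => cart.copy(products = Nil)
  }
```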
But how does one join data from an aggregate root in another context into this cart projection? For example, suppose I wanted to extend the read model with data from the Product aggregate root, which lives in another context; say I would like to extend it with productName and productType.
Take into consideration that we are working in a distributed system, where Product and Cart live in different services/applications.
I suppose one solution would be to include the data in the commands and events. But that doesn't seem to scale very well for larger read models with data from multiple aggregate roots. One also has to be able to nuke and rebuild the read model.
I suppose another solution would be to duplicate data from other aggregate roots into the storage of other applications/services/contexts. For example, duplicate the productName and productType data into storage owned by the Cart application, but without making it part of the Cart event stream. The Cart application would then have to listen to events (e.g. ProductCreated, ProductNameChanged) to keep the data updated. I guess this might be a viable solution, as sketched below.
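As a hedged sketch of that second option, the Cart service could keep a small, locally owned replica of product data, updated by integration events from the Product context (the event and field names here are assumptions, not a prescribed contract):

```scala
// Hypothetical integration events published by the Product context.
sealed trait ProductIntegrationEvent
case class ProductCreated(productId: Int, name: String, productType: String) extends ProductIntegrationEvent
case class ProductNameChanged(productId: Int, newName: String) extends ProductIntegrationEvent

// Local, Cart-owned replica of the product data needed by the read model.
// It is not part of the Cart event stream; it is just denormalized storage.
class ProductReplica {
  private var rows = Map.empty[Int, (String, String)] // productId -> (name, type)

  def handle(event: ProductIntegrationEvent): Unit = event match {
    case ProductCreated(id, name, tpe) => rows += id -> (name, tpe)
    case ProductNameChanged(id, newName) =>
      rows.get(id).foreach { case (_, tpe) => rows += id -> (newName, tpe) }
  }

  def lookup(productId: Int): Option[(String, String)] = rows.get(productId)
}
```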

Each bounded context should be loosely coupled. We had a similar issue with two of our contexts. The solution we found was to use workflows, putting all the communication between contexts in those files so that we could synchronize the required schemas by subscribing to an event handler. As we use Elixir, the library we used is Commanded, which has its own event bus.
In a distributed system you can use Apache Kafka for this. At the end of the day, I think the simplest solution is to keep your schemas as clean as possible (which will also help you with GDPR compliance) and to manage all cross-context communication through a separate layer via event handlers.
To see this solution applied in a "real-life" way, I can recommend a great example built with Elixir:
https://leanpub.com/buildingconduit/read

This question also comes up with event-driven architectures, not only event sourcing. I reckon you've covered most of the options in terms of capturing the relevant data from the producer of the event.
Another option would be for an event to contain as little data as possible from the related bounded context; at a minimum that would be an identifier. However, in most cases some of the data should be denormalized for the event to make sense. For instance, having the product description denormalized into the Cart, and the eventual Order, is helpful, especially when someone changes the description after I have made my choice. The description may change from "Blue pen" to "Red pen", which would drastically alter what I intended to purchase. In this case the Product in your Shopping BC may be represented by a value object that contains the Id along with the Description.
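As a sketch, that value object might look like this (names are illustrative):

```scala
// Hypothetical value object in the Shopping BC: the Product is referenced by ID,
// with the description denormalized at the time the item was added to the cart.
case class ProductRef(productId: Int, description: String)

case class CartLine(product: ProductRef, quantity: Int)
// Even if the source Product's description later changes from "Blue pen" to
// "Red pen", this cart line still records what the customer actually chose.
```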
If you then want to augment the read-only data, you are left with the option of retrieving it from the source BC. This can be done in the read model using some API (REST/ACL), with the retrieved data then saved. To make this more fault tolerant, one may opt for a messaging/service bus infrastructure to handle the retrieval of the additional data and the updating of the relevant read-model record.
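A minimal sketch of that enrichment step, assuming a hypothetical ProductApi client for the source BC (the trait and its fetchName method are made up for illustration):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical client for the source BC's API (REST/ACL); not a real library.
trait ProductApi {
  def fetchName(productId: Int): Future[String]
}

// Enrich a read-model record with data retrieved from the source BC.
// In a more fault-tolerant setup this call would be driven by a message bus
// with retries, rather than performed inline.
def enrichCartLine(api: ProductApi, productId: Int)(save: String => Unit)(
    implicit ec: ExecutionContext): Future[Unit] =
  api.fetchName(productId).map(save)
```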

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots, Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories, StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building, courses are associated with a department, and so on.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories for each class the student is enrolled in. So now what could have been a couple of queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email, etc.).
But then I would have to make a number of calls to separate repositories for each class the student is enrolled in. So now what could have been a couple of queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contain information from other related aggregate roots?
Nobody cares about "violating DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not. Part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read-only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in-memory information (i.e., how often you are writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.
Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments, and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
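As a purely illustrative sketch of those clusters (each side references the other by ID only; the names are assumptions):

```scala
// Hypothetical aggregate shapes: two clusters with no object references between them.
case class StudentId(value: String)
case class ClassId(value: String)

case class ClassEnrollment(classId: ClassId)
case class Student(id: StudentId, enrollments: List[ClassEnrollment])

case class Attendee(studentId: StudentId)
case class Class(id: ClassId, attendees: List[Attendee])
```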
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories: load a student, and from its list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are only used for update operations; for query operations you implement Queries, which don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes). See the sketch after this list.
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
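A rough illustration of such a CQRS-lite query object using plain JDBC (the schema, table, and column names are all assumptions):

```scala
import java.sql.Connection
import scala.collection.mutable.ListBuffer

// Hypothetical read-side row, shaped for the student details page.
case class EnrolledClassRow(className: String, courseName: String, roomName: String)

// A query object that reads across aggregate tables directly, without repositories.
class StudentDetailsQuery(conn: Connection) {
  def enrolledClasses(studentId: String): List[EnrolledClassRow] = {
    val sql =
      """SELECT c.name, co.name, r.name
        |FROM enrollments e
        |JOIN classes c  ON c.id = e.class_id
        |JOIN courses co ON co.id = c.course_id
        |JOIN rooms r    ON r.id = c.room_id
        |WHERE e.student_id = ?""".stripMargin
    val stmt = conn.prepareStatement(sql)
    try {
      stmt.setString(1, studentId)
      val rs = stmt.executeQuery()
      val rows = ListBuffer.empty[EnrolledClassRow]
      while (rs.next())
        rows += EnrolledClassRow(rs.getString(1), rs.getString(2), rs.getString(3))
      rows.toList
    } finally stmt.close()
  }
}
```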

How to model relationship using event sourcing

In our scenario, we have a Course entity to represent course content. For each student attending a course, there is a CourseSession entity representing the learning progress of that student in the course, so there is a one-to-many relationship between Course and CourseSession. If we were using a relational database, there would be a course table and a course_session table, in which a course has a unique ID and a course session is uniquely identified by (courseId + studentId). We are trying to model this using event sourcing, and our event table looks like the following:
-----------------------------------------------------
| entity_type | entity_id | event_type | event_data |
-----------------------------------------------------
This is fine for storing Course: there is a courseId we can use as entity_id. But for CourseSession there isn't an intrinsic id attribute; we have to use the concatenation of (courseId + studentId) as entity_id, which is not quite natural. Is there a better way to model this kind of relationship?
I’m not an expert, so take this answer with a grain of salt
But for CourseSession, there isn't an intrinsic id attribute, we have to use the concatenation of (courseId + studentId) as entity_id, which is not quite natural
It's normal to have a composite ID, and sometimes recommended, to keep your domain model aligned with the domain language.
The composite ID can be modeled as a Value Object: CourseSessionId { courseId: string, studentId: string }.
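Sketched in Scala-like code (illustrative only):

```scala
// Composite, domain-specific ID modeled as a value object.
case class CourseSessionId(courseId: String, studentId: String) {
  // A stable string form that could double as the entity_id column value.
  def asEntityId: String = s"$courseId:$studentId"
}
```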
In addition to this domain-specific ID, you may need to add a surrogate ID to the entity to satisfy some infrastructure requirements:
Some ORMs force you to have a numeric sequence ID
Some Key-value stores require a ULID Key
Short and user-friendly ID
The surrogate ID is an infrastructure detail and must be hidden as much as possible from the domain layer.
Is there a better way to model this kind of relationship?
The event sourcing pattern I saw in the DDD context suggests having a stream of events per aggregate.
In DDD, an aggregate:
is a subsystem within the bounded context
has boundaries and invariants to protect its state
is represented by an entity (the aggregate root) and can contain other entities and value objects
If you consider that the CourseSession entity belongs to the Course aggregate, then you should keep using the course ID as entity_id (or aggregate_id) for both Course and CourseSession related events.
In this case, the write model (main model) can easily build and present the Course/CourseSession relationship by replaying the Course stream.
Otherwise, you must introduce a read model and define a projector that subscribes to both the Course and CourseSession streams and builds the needed views.
This read model can be queried directly, or by the Course and CourseSession aggregates' commands to make decisions, but keep in mind that it's often eventually consistent, and your business should tolerate that.
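A minimal sketch of such a projector (event and view names are assumptions):

```scala
// Hypothetical events from the two streams.
sealed trait DomainEvent
case class CourseCreated(courseId: String, title: String) extends DomainEvent
case class CourseSessionStarted(courseId: String, studentId: String) extends DomainEvent

// The view this projector maintains: a course with its session student IDs.
case class CourseView(courseId: String, title: String, studentIds: List[String])

class CourseWithSessionsProjector {
  private var views = Map.empty[String, CourseView]

  // Subscribed to both the Course and CourseSession streams.
  def handle(event: DomainEvent): Unit = event match {
    case CourseCreated(cid, title) =>
      views += cid -> CourseView(cid, title, Nil)
    case CourseSessionStarted(cid, sid) =>
      views.get(cid).foreach(v => views += cid -> v.copy(studentIds = sid :: v.studentIds))
  }

  def get(courseId: String): Option[CourseView] = views.get(courseId)
}
```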
Event sourcing is a different way of thinking about data, so the 'old' ways of thinking in terms of relationships don't really translate directly.
The first point is that an event store isn't a table structure; it is a list of things that have happened in your system. The fact that a student spent time on a course is a thing which happened.
If you want or need to access the data in relationships like you describe, the easiest thing to do is to create a projection from the events which produces the data in the table form you are looking for.
However, since the projection is not the source of truth, why not create denormalised tables, so that the database won't need to do any joins or other complex work and your data is already shaped as you need it for use in your application? This leads to super fast, highly efficient read models.
Your users will thank you!

Stream aggregate relationship in an event sourced system

So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store. If we divide the events into streams, what exactly would a stream represent? In the context of a hypothetical inventory management system that tracks a collection of items, each with an ID, product code, and location, I'm having trouble visualizing the layout of the system.
From what I could gather on the internet, it could be described succinctly as "one stream per aggregate." So I would have an Inventory aggregate: a single stream with ItemAdded, ItemPulled, ItemRestocked, etc. events, each with serialized data containing the item ID, quantity changed, location, and so on. The aggregate root would contain a collection of InventoryItem objects (each with their respective quantity, product code, location, etc.). That seems like it would allow for easily enforcing domain rules, but I see one major flaw: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems very inefficient with a large number of items.
Another method would be to have one stream per InventoryItem, tracking all events pertaining to only that item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique, or that you're not putting multiple items into the same location? It seems like you would now have to bring in a read model, but isn't the whole point to keep commands and queries separate? It just feels wrong.
So my question is: which is correct? Partially both? Neither? Like most things, the more I learn, the more I learn that I don't know...
In a typical event store, each event stream is an isolated transaction boundary. Any time you change the model you lock the stream, append new events, and release the lock. (In designs that use optimistic concurrency, the boundaries are the same, but the "locking" mechanism is slightly different).
You will almost certainly want to ensure that any aggregate is enclosed within a single stream -- sharing an aggregate between two streams is analogous to sharing an aggregate across two databases.
A single stream can be dedicated to a single aggregate, to a collection of aggregates, or even to the entire model. Aggregates that are part of the same stream can be changed in the same transaction -- huzzah! -- at the cost of some contention and a bit of extra work to do when loading an aggregate from the stream.
The most commonly discussed design assigns each logical stream to a single aggregate.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems very inefficient with a large number of items.
There are a couple of possibilities. In some models, especially those with a strong temporal component, it makes sense to model some "entities" as a time series of aggregates. For example, in a scheduling system, rather than Bob's Calendar you might instead have Bob's March Calendar, Bob's April Calendar, and so on. Chopping the life cycle into smaller installments can keep the event count in check.
Another possibility is snapshots, with an additional trick to it: each snapshot is annotated with metadata that describes where in the stream the snapshot was made, and you simply read the stream forward from that point.
This, of course, depends on having an implementation of an event stream that supports random access, or one that allows you to read last-in, first-out.
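A sketch of that snapshot-plus-replay load (the store interfaces here are hypothetical, not a particular event store's API):

```scala
// Hypothetical snapshot annotated with the stream position it was taken at.
case class Snapshot[S](state: S, version: Long)

trait SnapshotStore[S] { def latest(streamId: String): Option[Snapshot[S]] }
trait EventStore[E]    { def readFrom(streamId: String, version: Long): List[E] }

// Load an aggregate by starting from the snapshot (if any) and replaying
// only the events recorded after it.
def load[S, E](streamId: String,
               snapshots: SnapshotStore[S],
               events: EventStore[E],
               empty: S)(apply: (S, E) => S): S = {
  val snap  = snapshots.latest(streamId)
  val state = snap.map(_.state).getOrElse(empty)
  val from  = snap.map(_.version + 1).getOrElse(0L)
  events.readFrom(streamId, from).foldLeft(state)(apply)
}
```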
Keep in mind that both of these are really performance optimizations, and the first rule of optimization is... don't.
So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store
The event store in a DDD project is designed around event-sourced Aggregates:
it provides the efficient loading of all events previously emitted by an Aggregate root instance (having a given, specified ID)
those events must be retrieved in the order they were emitted
it must not permit concurrent appends of events for the same Aggregate root instance
all events emitted as the result of a single command must be appended atomically; they should all succeed or all fail
The 4th point could be implemented using transactions, but this is not a necessity. In fact, for scalability reasons, if you can, you should choose a persistence mechanism that provides atomicity without the use of transactions. For example, you could store the events in a MongoDB document, as MongoDB guarantees document-level atomicity.
The 3rd point can be implemented using optimistic locking, using a version column with a unique index per (version x AggregateType x AggregateId).
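An in-memory sketch of how points 2-4 could combine (illustrative only; a real store would enforce the version check with a unique index rather than a lock):

```scala
// In-memory illustration: ordered reads, no concurrent appends for the same
// stream (optimistic check on the expected version), and atomic batch appends.
class InMemoryEventStore[E] {
  private var streams = Map.empty[String, Vector[E]]

  def read(streamId: String): Vector[E] =
    streams.getOrElse(streamId, Vector.empty) // events in emission order

  // Appends all events or none; fails if someone appended since we last read.
  def append(streamId: String, expectedVersion: Int, events: Seq[E]): Either[String, Unit] =
    synchronized {
      val current = streams.getOrElse(streamId, Vector.empty)
      if (current.size != expectedVersion)
        Left(s"conflict: stream at version ${current.size}, expected $expectedVersion")
      else {
        streams += streamId -> (current ++ events)
        Right(())
      }
    }
}
```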
At the same time, there is a DDD rule regarding Aggregates: don't mutate more than one Aggregate per transaction. This rule helps you A LOT in designing a scalable system; break it only if you don't need one.
So, the solution to all these requirements is something called an event stream, which contains all the events previously emitted by an Aggregate instance.
So I would have an Inventory aggregate
DDD has higher precedence than the event store. So, if you have some business rules that force you to decide that you must have a (big) Inventory aggregate, then yes, it would load ALL the events previously generated by itself. In that case the InventoryItem would be a nested entity that cannot emit events by itself.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems very inefficient with a large number of items.
Yes, indeed. The simplest thing would be for us all to have a single Aggregate with a single instance; then consistency would be the strongest possible. But this is not efficient, so you need to think more carefully about the real business requirements.
Another method would be to have one stream per InventoryItem, tracking all events pertaining to only that item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique, or that you're not putting multiple items into the same location?
There is another possibility: you could model the assigning of product codes as a business process. For this you could use a Saga/Process Manager that orchestrates the entire process. This Saga could use a collection with a unique index on the product-code column in order to ensure that only one product uses a given product code.
You could design the Saga either to permit the allocation of an already-taken code to a product and compensate later, or to reject the invalid allocation in the first place.
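A rough sketch of such a process manager (names and flow are assumptions; an in-memory set stands in for the uniquely indexed collection):

```scala
// Hypothetical process manager guarding product-code uniqueness.
// A real implementation would back `taken` with a collection that has a
// unique index on the product-code column, not an in-memory set.
class ProductCodeAllocationSaga {
  private var taken = Set.empty[String]

  sealed trait Outcome
  case class CodeAllocated(productId: String, code: String) extends Outcome
  case class AllocationRejected(productId: String, code: String) extends Outcome

  // This variant rejects invalid allocations up front; a compensating
  // variant would instead accept and emit a correction later.
  def allocate(productId: String, code: String): Outcome =
    synchronized {
      if (taken.contains(code)) AllocationRejected(productId, code)
      else { taken += code; CodeAllocated(productId, code) }
    }
}
```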
It seems like you would now have to bring in a read model, but isn't the whole point to keep commands and queries separate? It just feels wrong.
The Saga does indeed use private state maintained from the domain events in an eventually consistent way, just like a read model, but this does not feel wrong to me. It may use whatever it needs in order to (eventually) bring the system as a whole to a consistent state. It complements the Aggregates, whose purpose is to prevent the building blocks of the system from getting into an invalid state.

Multiple Data Transfer Objects for same domain model

How do you solve a situation where you have multiple representations of the same object, depending on the view?
For example, let's say you have a book store. Within a book store, you have two main representations of Books:
In lists (search results, browse by category, author, etc.): this is a compact representation that might have some aggregates, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity itself, saved in the DB.
Details view: here you wouldn't have aggregates but real values for each Author, as Book has an AuthorsList property.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB? That is, if you don't want to fetch all the actual Authors and Reviews from the DB but just two ints with the counts.
The fully normalized solution would be case 2, but case 1 seems to require either some denormalization or creating two different entities, BookDetails and BookCompact, within the business layer.
Important: I am not talking about view DTOs, but about actually getting data from the DB that doesn't fit into the business-layer Book class.
To me this sounds like multiple Query Models (QMs).
I use DDD with a CQRS/ES style, so aggregate roots produce events based on the commands passed in. Multiple QMs subscribe to those events, so I can create multiple "views" based on requirements.
Event sourcing (ES) has huge power here: I can introduce additional QMs later by replaying the stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of the "shared data model". I see huge evil in the one-RDBMS-model-for-everything approach: you will always get lost in the complexity of managing a shared model, as you are now.
I had very good results with the following design:
a domain package containing @Entity classes, which hold all the necessary data stored in the database
a dto package containing the view(s) of each entity that will be returned from the service
Each DTO should have a constructor which takes the entity as a parameter. To copy the data more easily you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you share only the minimal amount of information, and it is returned in an object which does not have any functionality.
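The original answer is Java-flavoured (@Entity, BeanUtils); as a rough sketch of the same idea in Scala terms (all names illustrative):

```scala
// Domain entity holding everything persisted for a Book (illustrative fields).
case class BookEntity(id: Long, title: String, authors: List[String], reviews: List[String])

// Compact view for list pages: only counts, no behaviour.
case class BookListItemDto(id: Long, title: String, numberOfAuthors: Int, numberOfReviews: Int)

object BookListItemDto {
  // The DTO "constructor which takes the entity as a parameter".
  def fromEntity(e: BookEntity): BookListItemDto =
    BookListItemDto(e.id, e.title, e.authors.size, e.reviews.size)
}
```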

CQRS & event sourcing: can I use an auto-incremented INT as the aggregate ID?

I am working on a legacy project and trying to introduce CQRS in some places where it's appropriate. In order to integrate with the legacy system, which is relational, I would like to project my aggregate (or part of it) into a table in the relational database.
I would also like the aggregate ID to be the auto-incremented value on that projected table. I know this seems like going against the grain, since it mixes the read model with the write model. However, I don't want to pollute the legacy schema with GUID foreign keys.
Would this be a complete no-no, and if so what would you suggest?
Edit: Maybe I could just store the GUID in the projected table; that way, when the events get projected I can identify the row to update, but still have an auto-incremented column for joining on?
There is nothing wrong with using an ID created by the infrastructure layer for your entities. This pattern is commonly used in Vaughn Vernon's 'Implementing Domain-Driven Design' book (a sketch follows the steps below):
Get the next available ID from the repository.
Create an entity.
Save the entity in the repository.
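Sketched, that pattern looks roughly like this (the interface names are illustrative, loosely following the book):

```scala
// Hypothetical repository exposing identity creation, per the steps above.
case class OrderId(value: String)
case class Order(id: OrderId)

trait OrderRepository {
  def nextIdentity(): OrderId  // 1. get the next available ID
  def save(order: Order): Unit // 3. save the entity
}

def placeOrder(repo: OrderRepository): Order = {
  val id    = repo.nextIdentity()
  val order = Order(id)        // 2. create the entity
  repo.save(order)
  order
}
```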
Your problem is that you want to use an ID created in another Bounded Context. That is the huge and complete no-no, not the fact that the ID is created by the infrastructure layer.
You should create the ID in your own Bounded Context and use it to reference the aggregate from other Contexts (just as you wrote when you edited your question).
