How to model relationship using event sourcing

How to model relationship using event sourcing - domain-driven-design

In our scenario, we have a Course entity to represent course content. For each student attending a course, there is a CourseSession entity representing the learning progress of the student in the course. So there is a one-to-many relationship between Course and CourseSession. If using relational database, there will be a course table and course_session table, in which course has a unique ID and course session is uniquely identified by (courseId + studentId). We try to model this using event sourcing, and our event table is like following
-----------------------------------------------------
| entity_type | entity_id | event_type | event_data |
-----------------------------------------------------
this is fine for storing course, there is a courseId we can use as entity_id. But for CourseSession, there isn't an intrinsic id attribute, we have to use the concatenation of (courseId + studentId) as entity_id, which is not quite natural. Is there a better way to model this kind of relationship?

I’m not an expert, so take this answer with a grain of salt
But for CourseSession, there isn't an intrinsic id attribute, we have
to use the concatenation of (courseId + studentId) as entity_id, which
is not quite natural
It's normal to have a composite ID, and sometimes recommended, to keep your domain model aligned with the domain language.
The composite ID can be modeled as Value Object: CourseSessionId { CoursId: string, studentId: string }.
In addition to this domain-specific ID, you may need to add a surrogate ID to the entity to satisfy some infra requirements:
Some ORMs force to have a numeric sequence ID
Some Key-value stores require a ULID Key
Short and user-friendly ID
The surrogate ID is an infra detail and must be hidden as much as possible from the domain layer.
Is there a better way to model this kind of relationship?
The event sourcing pattern I saw in the DDD context suggests having a stream of events per aggregate.
In DDD, an aggregate can be considered as:
A subsystem within the bounded context
It has boundaries and invariants to protect its state
It’s represented by an entity (aggregate root) and can contain other entities and value-objects.
If you consider that CourseSession entity belongs to Course aggregate, then you should keep using course ID as entity_id (or aggregate_id) for both Course and CourseSession related events.
in this case, the write model (main model) can easily build and presents the relationship Course / CourseSessions by playing the Course stream.
Otherwise, you must introduce a read model, and define a projector that will subscribe to both Course and CourseSession streams, and build the needed views.
This read model can be queried directly or by Course and CourseSession aggregates’ commands to take decisions, but keep in mind that’s often eventually consistent, and your business should tolerate that.

Event sourcing is a different way of thinking about data. So the 'old' ways of thinking in terms of relationships don't really translate like that.
So the first point is that an event store isn't a table structure. It is a list of things that have happened in your system. The fact a student spent time on a course is a thing which happened.
If you want/need to access the data in relationships like you describe the easiest thing to do is to create a projection from the events which creates the data in the table form you are looking for.
However, as the projection is not the source of truth, why not think about creating de-normalised tables so you the database won't need to do any joins or other more complex jobs and your data will already be shaped as you need it for use in your application. This leads to super fast highly efficient read models.
Your users will thank you!

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Course are associated with a department etc.. etc..
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email etc..).

But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry to much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.

Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Atendees (that's just an example, note that modeling many to many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and respositories will only be used for update operations and for query operations implement Queries, which won't use repositories, but access the DB directly, therefore you can join tables from multiple aggregates (Student->Enrollments->Atendees->Classes)
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.

What is the purpose of child entity in Aggregate root?

[ Follow up from this question & comments: Should entity have methods and if so how to prevent them from being called outside aggregate ]
As the title says: i am not clear about what is the actual/precise purpose of entity as a child in aggregate?
According to what i've read on many places, these are the properties of entity that is a child of aggregate:
It has identity local to aggregate
It cannot be accessed directly but through aggregate root only
It should have methods
It should not be exposed from aggregate
In my mind, that translates to several problems:
Entity should be private to aggregate
We need a read only copy Value-Object to expose information from an entity (at least for a repository to be able to read it in order to save to db, for example)
Methods that we have on entity are duplicated on Aggregate (or, vice versa, methods we have to have on Aggregate that handle entity are duplicated on entity)
So, why do we have an entity at all instead of Value Objects only? It seams much more convenient to have only value objects, all methods on aggregate and expose value objects (which we already do copying entity infos).
PS.
I would like to focus to child entity on aggregate, not collections of entities.
[UPDATE in response to Constantin Galbenu answer & comments]
So, effectively, you would have something like this?
public class Aggregate {
...
private _someNestedEntity;
public SomeNestedEntityImmutableState EntityState {
get {
return this._someNestedEntity.getState();
}
}
public ChangeSomethingOnNestedEntity(params) {
this._someNestedEntity.someCommandMethod(params);
}
}

You are thinking about data. Stop that. :) Entities and value objects are not data. They are objects that you can use to model your problem domain. Entities and Value Objects are just a classification of things that naturally arise if you just model a problem.
Entity should be private to aggregate
Yes. Furthermore all state in an object should be private and inaccessible from the outside.
We need a read only copy Value-Object to expose information from an entity (at least for a repository to be able to read it in order to save to db, for example)
No. We don't expose information that is already available. If the information is already available, that means somebody is already responsible for it. So contact that object to do things for you, you don't need the data! This is essentially what the Law of Demeter tells us.
"Repositories" as often implemented do need access to the data, you're right. They are a bad pattern. They are often coupled with ORM, which is even worse in this context, because you lose all control over your data.
Methods that we have on entity are duplicated on Aggregate (or, vice versa, methods we have to have on Aggregate that handle entity are duplicated on entity)
The trick is, you don't have to. Every object (class) you create is there for a reason. As described previously to create an additional abstraction, model a part of the domain. If you do that, an "aggregate" object, that exist on a higher level of abstraction will never want to offer the same methods as objects below. That would mean that there is no abstraction whatsoever.
This use-case only arises when creating data-oriented objects that do little else than holding data. Obviously you would wonder how you could do anything with these if you can't get the data out. It is however a good indicator that your design is not yet complete.

Entity should be private to aggregate
Yes. And I do not think it is a problem. Continue reading to understand why.
We need a read only copy Value-Object to expose information from an entity (at least for a repository to be able to read it in order to
save to db, for example)
No. Make your aggregates return the data that needs to be persisted and/or need to be raised in a event on every method of the aggregate.
Raw example. Real world would need more finegrained response and maybe performMove function need to use the output of game.performMove to build propper structures for persistence and eventPublisher:
public void performMove(String gameId, String playerId, Move move) {
Game game = this.gameRepository.load(gameId); //Game is the AR
List<event> events = game.performMove(playerId, move); //Do something
persistence.apply(events) //events contains ID's of entities so the persistence is able to apply the event and save changes usign the ID's and changed data wich comes in the event too.
this.eventPublisher.publish(events); //notify that something happens to the rest of the system
}
Do the same with inner entities. Let the entity return the data that changed because its method call, including its ID, capture this data in the AR and build propper output for persistence and eventPublisher. This way you do not need even to expose public readonly property with entity ID to the AR and the AR neither about its internal data to the application service. This is the way to get rid of Getter/Setters bag objects.
Methods that we have on entity are duplicated on Aggregate (or, vice versa, methods we have to have on Aggregate that handle entity
are duplicated on entity)
Sometimes the business rules, to check and apply, belongs exclusively to one entity and its internal state and AR just act as gateway. It is Ok but if you find this patter too much then it is a sign about wrong AR design. Maybe the inner entity should be the AR instead a inner entity, maybe you need to split the AR into serveral AR's (inand one the them is the old ner entity), etc... Do not be affraid about having classes that just have one or two methods.
In response of dee zg comments:
What does persistance.apply(events) precisely do? does it save whole
aggregate or entities only?
Neither. Aggregates and entities are domain concepts, not persistence concepts; you can have document store, column store, relational, etc that does not need to match 1 to 1 your domain concepts. You do not read Aggregates and entities from persitence; you build aggregates and entities in memory with data readed from persistence. The aggregate itself does not need to be persisted, this is just a possible implementation detail. Remember that the aggregate is just a construct to organize business rules, it's not a meant to be a representation of state.
Your events have context (user intents) and the data that have been changed (along with the ID's needed to identify things in persistence) so it is incredible easy to write an apply function in the persistence layer that knows, i.e. what sql instruction in case of relational DB, what to execute in order to apply the event and persist the changes.
Could you please provide example when&why its better (or even
inevitable?) to use child entity instead of separate AR referenced by
its Id as value object?
Why do you design and model a class with state and behaviour?
To abstract, encapsulate, reuse, etc. Basic SOLID design. If the entity has everything needed to ensure domain rules and invariants for a operation then the entity is the AR for that operation. If you need extra domain rules checkings that can not be done by the entity (i.e. the entity does not have enough inner state to accomplish the check or does not naturaly fit into the entity and what represents) then you have to redesign; some times could be to model an aggregate that does the extra domain rules checkings and delegate the other domain rules checking to the inner entity, some times could be change the entity to include the new things. It is too domain context dependant so I can not say that there is a fixed redesign strategy.
Keep in mind that you do not model aggregates and entities in your code. You model just classes with behaviour to check domain rules and the state needed to do that checkings and response whith the changes. These classes can act as aggregates or entities for different operations. These terms are used just to help to comunicate and understand the role of the class on each operation context. Of course, you can be in the situation that the operation does not fit into a entity and you could model an aggregate with a V.O. persistence ID and it is OK (sadly, in DDD, without knowing domain context almost everything is OK by default).
Do you wanna some more enlightment from someone that explains things much better than me? (not being native english speaker is a handicap for theese complex issues) Take a look here:
https://blog.sapiensworks.com/post/2016/07/14/DDD-Aggregate-Decoded-1
http://blog.sapiensworks.com/post/2016/07/14/DDD-Aggregate-Decoded-2
http://blog.sapiensworks.com/post/2016/07/14/DDD-Aggregate-Decoded-3

It has identity local to aggregate
In a logical sense, probably, but concretely implementing this with the persistence means we have is often unnecessarily complex.
We need a read only copy Value-Object to expose information from an
entity (at least for a repository to be able to read it in order to
save to db, for example)
Not necessarily, you could have read-only entities for instance.
The repository part of the problem was already addressed in another question. Reads aren't an issue, and there are multiple techniques to prevent write access from the outside world but still allow the persistence layer to populate an entity directly or indirectly.
So, why do we have an entity at all instead of Value Objects only?
You might be somewhat hastily putting concerns in the same basket which really are slightly different
Encapsulation of operations
Aggregate level invariant enforcement
Read access
Write access
Entity or VO data integrity
Just because Value Objects are best made immutable and don't enforce aggregate-level invariants (they do enforce their own data integrity though) doesn't mean Entities can't have a fine-tuned combination of some of the same characteristics.

These questions that you have do not exist in a CQRS architecture, where the Write model (the Aggregate) is different from a Read model. In a flat architecture, the Aggregate must expose read/query methods, otherwise it would be pointless.
Entity should be private to aggregate
Yes, in this way you are clearly expressing the fact that they are not for external use.
We need a read only copy Value-Object to expose information from an entity (at least for a repository to be able to read it in order to save to db, for example)
The Repositories are a special case and should not be see in the same way as Application/Presentation code. They could be part of the same package/module, in other words they should be able to access the nested entities.
The entities can be viewed/implemented as object with an immutable ID and a Value object representing its state, something like this (in pseudocode):
class SomeNestedEntity
{
private readonly ID;
private SomeNestedEntityImmutableState state;
public getState(){ return state; }
public someCommandMethod(){ state = state.mutateSomehow(); }
}
So you see? You could safely return the state of the nested entity, as it is immutable. There would be some problem with the Law of Demeter but this is a decision that you would have to make; if you break it by returning the state you make the code simpler to write for the first time but the coupling increases.
Methods that we have on entity are duplicated on Aggregate (or, vice versa, methods we have to have on Aggregate that handle entity are duplicated on entity)
Yes, this protect the Aggregate's encapsulation and also permits the Aggregate to protect it's invariants.

I won't write too much. Just an example. A car and a gear. The car is the aggregate root. The gear is a child entity

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, lets say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc...): This is a compact representation that might have some aggregates like for example NumberOfAuthors and NumberOfRwviews. Each Author and Review are entities themselves saved in db.
DetailsView: here you wouldn't have aggregates but real values for each Author, as Book has a property AuthorsList.
Case 2 is clear, you get all from DB and show it. But how to solve case 1. if you want to reduce number of connections and payload to/from DB? So, if you don't want to get all actual Authors and Reviews from DB but just 2 ints for count for each of them.
Full normalized solution would be 2, but 1 seems to require either some denormalization or create 2 different entities: BookDetails and BookCompact within Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.

For me it sounds like multiple Query Models (QM).
I used DDD with CQRS/ES style, so aggregate roots are producing events based on commands being passed in. To those events multiple QMs are subscribed. So I create multiple "views" based on requirements.
The ES (event-sourcing) has huge power - I can introduce another QMs later by replaying stored events.
Sounds like managing a lot of similar, or even duplicate data, but it has sense for me.
QMs can and are optimized to contain just enough data/structure/indexes for given purpose. This is the way out of "shared data model". I see the huge evil in "RDMS" one for all approach. You will always get lost in complexity of managing shared model - like you do.

I had a very good result with the following design:
domain package contains #Entity classes which contain all necessary data which are stored in database
dto package which contains view/views of entity which will be returned from service
Dto should have constructor which takes entity as parameter. To copy data easier you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you are sharing only minimal amount of information and it is returned in object which does not have any functionality.

Aggregate roots and repository in DDD

I just started on DDD and encounter the term aggregate roots.
My current understanding is that this is kind of a parent entity that hold reference to other complementary entity. Example : aggregate roots will be Employee that also contain position, shift, gender, and salary.
My first question will be whether this understanding is correct ?
Secondly, I get an impression that repository is defined only for each aggregate. Yet, it puzzles me how we could retrieve information regarding other entity (Ex: list of positions or shift type) ?
Thank you,

Aggregates are consistency boundaries to enforce invariants. This means that the entities and objects inside the aggregate must remain consistent together with regards to the business rules.
http://dddcommunity.org/library/vernon_2011/
http://martinfowler.com/bliki/DDD_Aggregate.html
https://lostechies.com/gabrielschenker/2015/05/25/ddd-the-aggregate/
Secondly, I get an impression that repository is defined only for each aggregate. Yet, it is puzzle me how we could retrieve information regarding other entity (Ex : list of positions or shift type) ?
You can have a separate read model over your data if you choose to do so and it makes sense that the business wants to view the data in a different way. The consistencies you need to enforce when you are writing data do not apply on the read side. CQRS is the pattern to help with this - you separate your write side from your read side.
https://lostechies.com/gabrielschenker/2015/04/07/cqrs-revisited

CQRS & event sourcing can I use an auto incremented INT as the aggregate ID?

I am working on a legacy project and trying to introduce CQRS in some places where it's appropriate. In order to integrate with all of the legacy which is relational I would like to project my aggregate (or part of it) into a table in the relational database.
I would also like the aggregate ID to be the auto-incremented value on that projected table. I know this seems like going against the grain since it's mixing the read model with the write model. However I don't want to pollute the legacy schema with foreign key GUUIDs.
Would this be a complete no-no, and if so what would you suggest?
Edit: Maybe I could just store the GUUID in the projected table, that way when the events get projected I can identify the row to update, but then still have an auto incremented column for joining on?

There is nothing wrong with using an id created by the infrastructure layer for your entities. This pattern is commonly used in Vaughn Vernon's 'Implementing DDD' book:
Get the next available ID from the repository.
Create an entity.
Save the entity in the repository.
Your problem is that you want to use an id created in another Bounded Context. That is a huge and complete no-no, not the fact that the id is created by the Infrastructure Layer.
You should create the id in your Bounded Context and use it to reference the aggregate from other Contexts (just as you wrote when you edited your question).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string