Inter-Aggregate Communication in CQRS + DDD + Event Sourcing

How should separate aggregate roots (AR) communicate with one another in an environment built on DDD principles using an event-sourced aggregate back-end?
For instance, I have a Facility aggregate root (AR) which has a factory method responsible for creating a Booking AR. The Booking is a time-sensitive combination of a Person AR and a Facility AR. A Person can only be booked in a single Facility.
In classic DDD, I would have held references to the Booking in Person, and to the Person in Facility. However, when generating events for event sourcing, I think trying to handle the event deserialization from the back end would become prohibitive. Therefore, I've taken to holding only references to the value-object-based unique IDs. This raises a new problem, however: when a method on one AR needs to call a method on another AR, how do you handle that situation? Hit the event-source repository from within the domain AR?
What is the general practice in this scenario? Am I approaching this all wrong?

Aggregate Root boundaries define a consistency boundary.
Inside the aggregate, consistency is guaranteed.
Outside... it's not.
So you should not have operations that span several aggregates and need to be consistent.
If you need a transaction that spans two aggregates, you should review your aggregate boundaries.
For things that happen outside the aggregate you should have an event handler that will send a command to other aggregates.
If the logic of actions between aggregates is more complicated, you can define a process: a state machine that listens to events and sends commands to aggregates.
Processes can be used to define long running transactions (with compensation instead of rollback), or take business decisions based on what's happening in the system at a large scale (even between bounded contexts).
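As a minimal sketch of that wiring, here is a hypothetical event handler that reacts to one aggregate's event by sending a command to another, borrowing the Facility/Person/Booking example from the question. Every name (CommandBus, BookingCreated, AssignPersonToBooking) is illustrative, not from any framework.
import java.util.UUID;

// All names below are illustrative, not from any particular framework.
interface CommandBus {
    void send(Object command);
}

record BookingCreated(UUID bookingId, UUID personId) {}
record AssignPersonToBooking(UUID personId, UUID bookingId) {}

class BookingCreatedHandler {
    private final CommandBus commandBus;

    BookingCreatedHandler(CommandBus commandBus) {
        this.commandBus = commandBus;
    }

    // The handler runs outside any aggregate transaction: it observes an event
    // already committed by the Booking aggregate and issues a command that the
    // Person aggregate will process in its own, later transaction.
    void handle(BookingCreated event) {
        commandBus.send(new AssignPersonToBooking(event.personId(), event.bookingId()));
    }
}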

When using Event Sourcing and CQRS, the most elegant (at least in my opinion) way of inter-AR communication is messaging. You can look at the Ncqrs project (it will be easier if you are a .NET person), particularly its 'Messaging' branch. The idea is that ARs implement an IMessageHandler interface for every message type they handle, and the AR base class exposes a Send method for sending these messages. By means of this API, clients can invoke model behavior and the model itself can communicate (between ARs).
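Ncqrs itself is a .NET project, so the following is only a rough Java rendering of the shape of that API, with guessed names rather than the real Ncqrs types:
// Rough Java rendering of the messaging idea; the actual Ncqrs API is .NET
// and differs in detail, so treat every name here as illustrative.
final class MessageRouter {
    static void route(Object message) {
        // Infrastructure concern: locate the aggregate that handles this
        // message type, load it from the event store, and dispatch. Elided.
    }
}

interface IMessageHandler<T> {
    void handle(T message);
}

record BookPerson(String personId, String bookingId) {}

abstract class AggregateRootBase {
    // Base-class API the model itself uses to talk to other aggregates.
    protected void send(Object message) {
        MessageRouter.route(message);
    }
}

// An AR declares each message type it understands by implementing IMessageHandler.
class Person extends AggregateRootBase implements IMessageHandler<BookPerson> {
    @Override
    public void handle(BookPerson message) {
        // validate invariants, raise events, and possibly send() to other ARs
    }
}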

Related

DDD- Mapping events from external Bounded Context to domain model

My team is building a new microservice leveraging techniques from Domain-Driven Design and Event Sourcing. This service has to integrate with a handful of external bounded contexts (BCs) in the form of other legacy services. We've identified our core domain model, and it's clear that we need some sort of anti-corruption layer (ACL) between the external BCs and our internal domain model. However, we're getting hung up on some of the technical details of how best to accomplish this.
Let's say our domain has an Asset aggregate which can represent a piece of equipment from several of these external BCs. One of these BCs, let's say BC-A, sends out EquipmentUpdate events on a message broker that our application can subscribe to. These events carry no intent; they merely reflect that the state of the external entity has changed in some way, and it's up to us to determine what actually changed. The ID in the events is also from the external BC, not our domain.
So our ACL has to do the following tasks:
Map the external identifier from BC-A to our internal aggregate
Perform a diff of the new event and current/previous state to figure out what actually changed
Do this in a way that is resilient against out-of-order and duplicate event messages
Option 1 - Query directly from repository
The first option is to use the external identifier directly in our repository to fetch the matching Asset. This seems like the simplest option; however, it feels wrong, since it leaks concepts from external BCs into our repository API.
Furthermore, it forces us to store external identifiers and event versions directly on our aggregate, which also feels like it defeats the purpose of the ACL.
interface AssetRepository {
    Optional<Asset> findAssetByExternalIdentifier(String externalId);
}
Option 2 - Dedicated mapping
The second option is to expose a dedicated query that resolves the external identifier to the matching internal identifier, along with the latest version that was processed.
This feels cleaner, but requires an additional read from the database where option 1 was a single read.
class AssetMappingQueryResult {
    String assetId;
    Long latestVersion;
}

interface AssetMappingQuery {
    Optional<AssetMappingQueryResult> resolveFromExternalIdentifier(String externalId);
}
How are other teams doing this?
I would tend to treat the EquipmentUpdate message as a signal that something might have changed. On receipt of such a message, the ACL queries BC-A for the latest state for the associated IDs, compares that state with the state it received last time, and emits commands corresponding to the state changes which are of interest to the bounded context you're developing. In the case of duplicate messages (where the second event conveys no state change), this approach is idempotent. Because the ACL always reads the current state, out-of-order delivery is likewise not a concern.
The ACL may want to guard against concurrent modifications involving the same ID, though viewing its output as commands against your BC's write model might make that unnecessary (especially depending on the chance that one of the concurrent modifications might be slow). The specific techniques for that vary; my personal preference, coming from the Akka world, would be to assign responsibility for a given ID to an actor.
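A sketch of that signal-then-query flow, under the assumption of hypothetical types for the BC-A client, the command sink, and the tracked state (none of these names come from the question):
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for the question's BC-A integration.
record EquipmentState(String externalId, String location, String status) {}
record EquipmentUpdate(String externalId) {} // the "something changed" signal
record RelocateAsset(String assetId, String newLocation) {}

interface BcAClient {
    EquipmentState fetchCurrentState(String externalId); // query BC-A's read model
}

interface CommandSink {
    void send(Object command);
}

class EquipmentAcl {
    private final BcAClient bcA;
    private final CommandSink commands;
    private final Map<String, String> externalToInternalId; // maintained elsewhere
    // Last state seen per external ID; in production this lives in durable storage.
    private final Map<String, EquipmentState> lastSeen = new ConcurrentHashMap<>();

    EquipmentAcl(BcAClient bcA, CommandSink commands, Map<String, String> idMap) {
        this.bcA = bcA;
        this.commands = commands;
        this.externalToInternalId = idMap;
    }

    // Treat the event purely as a signal: re-query, diff, emit commands.
    // A duplicate signal produces no diff, so handling is idempotent; and
    // because we always read the current state, out-of-order delivery is harmless.
    void on(EquipmentUpdate signal) {
        EquipmentState current = bcA.fetchCurrentState(signal.externalId());
        EquipmentState previous = lastSeen.put(signal.externalId(), current);
        String assetId = externalToInternalId.get(signal.externalId());
        if (assetId == null || current.equals(previous)) {
            return; // unknown equipment, or duplicate signal with no change
        }
        if (previous == null || !Objects.equals(previous.location(), current.location())) {
            commands.send(new RelocateAsset(assetId, current.location()));
        }
        // ...further field-by-field diffs for other changes the BC cares about
    }
}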
An ACL is by its nature somewhat outside of any bounded context: it's analogous to the space between customs/immigration outposts on the border between countries. One could say it's partially in both bounded contexts (from a CQRS standpoint it's also a read model for the "other" bounded context, though perhaps a "read model once removed", given that it should probably query a read model in the source bounded context). Alternatively, one could call an ACL a miniature bounded context which incorporates knowledge of parts of two other BCs: this may even extend to having its own aggregates, repositories, etc.

One aggregate per transaction, with "one" or "multiple" bounded contexts

Following the Vaughn Vernon recommendation, to achieve a high level of decoupling and single responsibility, just one aggregate should be changed per transaction.
In chapter 8 of the Red Book, Vaughn Vernon demonstrates how two aggregates can "talk" to each other with domain events; in chapter 13, how different aggregates in two different bounded contexts can "talk" to each other with notifications.
My question is: why should I deal with these situations differently, given that both of them happen across different transactions? Whether there is one bounded context or several, wouldn't the possible problems be the same?
For example, if the application crashes between two domain events in the same bounded context, I'll end up with the same inconsistency as with two bounded contexts.
It seems that the safest way to deal with two aggregates "talking" to each other asynchronously is to have a transitional status in them, persist the events before sending them (to avoid losing events), make operations idempotent when possible, and deduplicate the event on the receiving side when the operation can't be made idempotent.
I see two aspects to consider in your question:
The DDD aspect: Event types and what you do with them
A technical aspect: how to implement it reliably
Regarding the types of Events what I would say is that events that stay within the boundaries of a bounded context (often called Domain Events) normally carry a lot of information. Potentially a big part of the state of the Aggregate. If you use CQRS, they are used to create the Read Model. Events that cross the BC boundaries are sometimes called Integration Events and they should carry as little data as possible (potentially, only global IDs, like CustomerId, OrderId). The reason is that every extra property that you add is extra coupling between the publisher BC and the subscriber BCs, which is what you want to minimize.
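To illustrate the difference in shape (hypothetical events, not from any particular system):
import java.math.BigDecimal;
import java.time.Instant;
import java.util.List;

// Hypothetical event shapes, for illustration only.

// Domain event: stays inside the BC and carries enough state for read models.
record OrderPlaced(
        String orderId,
        String customerId,
        List<String> lineItemSkus,
        BigDecimal total,
        Instant placedAt) {}

// Integration event: crosses the BC boundary carrying only global IDs, so
// subscribers stay minimally coupled and query for details only if they care.
record OrderPlacedIntegration(String orderId, String customerId) {}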
I would say that it's this distinction between the types of Events which might lead to different technical solutions, but I agree with you that it doesn't have to be this way if you find a solution that works well for both cases.
The solution you propose is correct. It looks very similar to the Outbox feature of NServiceBus, which basically takes care of all this for you.
Another approach that I've used, if your message broker supports it, is what Azure Service Bus calls Send Via. With this feature, you can publish events via your own queue, and the send will be committed transactionally with the removal of the incoming message from the queue. This means that if for some reason the message you are processing is not deleted from the queue successfully (DB update exception, broker unavailable, etc.) and is therefore retried, you know for sure that the events were not sent, and you can safely publish them again during the retry. This makes idempotent operations simpler and avoids publishing ghost messages.
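The proposal in the question (persist events before sending, then deduplicate on the receiving side) is essentially the Outbox pattern. Below is a minimal sketch of the moving parts using plain JDBC and a hypothetical Broker interface; NServiceBus's Outbox and Azure Service Bus's Send Via package this machinery for you:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical broker abstraction; real tooling (NServiceBus Outbox, Azure
// Service Bus Send Via) provides this machinery out of the box.
interface Broker {
    void publish(String payload) throws Exception;
}

class Outbox {
    private final Connection db; // autocommit off; caller controls the transaction
    private final Broker broker;

    Outbox(Connection db, Broker broker) {
        this.db = db;
        this.broker = broker;
    }

    // Step 1: persist the event in the SAME transaction as the state change,
    // so an event is never lost and never published for a rolled-back change.
    void saveWithinTransaction(String eventPayload) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO outbox (payload, sent) VALUES (?, FALSE)")) {
            ps.setString(1, eventPayload);
            ps.executeUpdate();
        }
    }

    // Step 2: a background relay publishes unsent rows and marks them sent.
    // A crash between publish and the update causes a re-publish, which is
    // why receivers must deduplicate (at-least-once delivery).
    void relayPending() throws Exception {
        try (PreparedStatement select = db.prepareStatement(
                "SELECT id, payload FROM outbox WHERE sent = FALSE");
             ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                broker.publish(rs.getString("payload"));
                try (PreparedStatement mark = db.prepareStatement(
                        "UPDATE outbox SET sent = TRUE WHERE id = ?")) {
                    mark.setLong(1, rs.getLong("id"));
                    mark.executeUpdate();
                }
            }
        }
    }
}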

How to use sagas in a CQRS architecture using DDD?

I am designing a CQRS application using DDD, and am wondering how to implement the following scenario:
a Participant aggregate can be referenced by multiple ParticipantEntry aggregates
an AddParticipantInfoCommand is issued to the Command side, which contains all info of the Participant and one ParticipantEntry (similar to an Order and one OrderLineItem)
Where should the logic be implemented that checks whether the Participant already exists and if it doesn't exist, creates the Participant?
Should it be done in a Saga that first checks the domain model for the existence of the Participant, and if it doesn't find it, issues an AddParticipantCommand and afterwards an AddParticipantEntry command containing the Participant ID?
Should this be done entirely by the aggregate roots in the domain model itself?
You don't necessarily need sagas in order to deal with this situation. Take a look at my blog post on why not to create aggregate roots, and what to do instead:
http://udidahan.com/2009/06/29/dont-create-aggregate-roots/
Where should the logic be implemented that checks whether the Participant already exists and if it doesn't exist, creates the Participant?
In most instances, this behavior should be under the control of the Participant aggregate itself.
Processes are useful when you need to coordinate changes across multiple transaction boundaries. Two changes to the same aggregate, however, can be managed within the same transaction.
You can implement this as two distinct transactions operating on the same aggregate, with coordination; but the extra complexity of a process doesn't offer any gains. It's much simpler to send the single command to the aggregate, and allow it to decide what actions to take to maintain the correct invariant.
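A sketch of that single-command approach, using hypothetical command and event shapes derived from the question's Participant example:
import java.util.ArrayList;
import java.util.List;

// Hypothetical command/event shapes following the question's example.
record AddParticipantInfo(String participantId, String name, String entryDetails) {}
record ParticipantCreated(String participantId, String name) {}
record ParticipantEntryAdded(String participantId, String entryDetails) {}

class Participant {
    private final List<Object> uncommittedEvents = new ArrayList<>();
    private boolean exists; // rebuilt from past events when the aggregate is loaded

    // One command, one transaction: the aggregate decides for itself whether
    // the "create" step is still needed, so no saga has to check existence first.
    void handle(AddParticipantInfo cmd) {
        if (!exists) {
            raise(new ParticipantCreated(cmd.participantId(), cmd.name()));
        }
        raise(new ParticipantEntryAdded(cmd.participantId(), cmd.entryDetails()));
    }

    private void raise(Object event) {
        uncommittedEvents.add(event);
        if (event instanceof ParticipantCreated) {
            exists = true;
        }
    }
}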
Sagas, in particular, are a pattern for compensating (rather than rolling back) work that spans multiple transactions. Yan Cui's How the Saga Pattern manages failures with AWS Lambda and Step Functions includes a good illustration of a travel booking saga.
(Note: there is considerable confusion about the definition of "saga"; the NServiceBus community tends to understand the term in a slightly different way than originally described by Garcia-Molina and Salem. kellabyte's Clarifying the Saga Pattern surveys the confusion.)

Rules to guide when to stick with CRUD ORM or switch to DDD event store

I have seen ORM use a unit of work to commit multiple repositories in a single step.
I have also seen DDD with aggregate roots saved via repositories; when using event stores, persistence conceptually becomes quite clear to understand.
I always need to write data access code, and whilst I am familiar with ORM, I am new to domain-driven design and event sourcing. Event sourcing is great, but it does come with a lot of infrastructure.
Ultimately I would like some rules to help decide at what point (code size, number of database entities) DDD+ES becomes worth the extra effort over CRUD systems.
To help decide my questions are as follows:
I haven't seen aggregate roots combined into a single unit of work. Is this avoided? If so, what problems can this cause?
In DDD a customer entity may have addresses and phones embedded within it (value objects), whereas in ORM there is a unit of work with customer, phone and address repositories. What is the best way to explain and understand these different approaches?
Can ORM use multiple different units of work (each referencing relevant and related repositories/tables) to represent an aggregate root?
What are the pain/warning signs to look out for with impedance mismatch from my domain to ORM, at which point we may consider switching to an event store?
An aggregate defines a consistency boundary. In NoSQL databases, it is usually not possible to commit multiple entities per transaction. Therefore, in DDD with NoSQL, it is desirable to only have a single aggregate in a unit of work while updates to entities external to the aggregate at hand are delivered in an eventually consistent manner.
If addresses and phones are value objects, then they shouldn't have repositories. In the ORM, they would be mapped as components of a parent entity, not as separate mappings.
I'm not sure what you'd achieve this way?
One pain point that naturally leads to event sourcing is the need to preserve all state changes in an aggregate. Furthermore, event sourcing and the concept of domain events in general provide a different domain modelling methodology, focused on behavior rather than state. I'd consider ES when there is potential business value in preserving all state changes. If you are willing to make the initial infrastructure investment, ES can in many ways be simpler by avoiding ORM madness. Think of CRUD as event sourcing with only 4 event types, or even 2 (read, update). Beyond the most basic domains, it is desirable to have more context beyond changes to data, which leads you to ES.
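To make the contrast with CRUD concrete, here is a minimal event-sourced aggregate. The names are illustrative, and a real event store adds versioning, concurrency checks, and snapshots:
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (Java 21): instead of persisting current state (CRUD/ORM),
// the aggregate persists the full sequence of changes and replays it on load.
sealed interface CustomerEvent permits CustomerRegistered, CustomerRelocated {}
record CustomerRegistered(String customerId, String name) implements CustomerEvent {}
record CustomerRelocated(String customerId, String newAddress) implements CustomerEvent {}

class Customer {
    private String customerId;
    private String name;
    private String address;
    private final List<CustomerEvent> changes = new ArrayList<>();

    // Rehydrate: current state is a left-fold over all past events, so no
    // state change is ever lost; the history itself is the source of truth.
    static Customer load(List<CustomerEvent> history) {
        Customer c = new Customer();
        history.forEach(c::apply);
        return c;
    }

    // Behavior-first: record the intent as an event, then apply it to state.
    void relocate(String newAddress) {
        CustomerRelocated event = new CustomerRelocated(customerId, newAddress);
        changes.add(event);
        apply(event);
    }

    private void apply(CustomerEvent event) {
        switch (event) {
            case CustomerRegistered e -> { this.customerId = e.customerId(); this.name = e.name(); }
            case CustomerRelocated e -> this.address = e.newAddress();
        }
    }
}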

Eventual consistency across aggregate roots in the same bounded context using a process manager aka saga

Suppose you've got two aggregates in your bounded context which have some constraints amongst each other. Using DDD, these inter-aggregate constraints can't be enforced in the same transaction, i.e. the aggregate boundaries are transactional boundaries.
Would you consider using what in the Microsoft CQRS journey is called a "process manager" to coordinate two aggregates in the same bounded context or is a process manager only used to coordinate between two bounded contexts? What would the equivalent of a process manager that coordinates two or more aggregate roots within the same bounded context be?
An aggregate root defines a bounded context by default, albeit a lower-level one (by the way, the lowest-level bounded context you can find is an object, any object). The process manager is the name they used instead of saga; you can probably come up with other names too, it doesn't matter, they all have the same purpose.
And yes, I would consider using a saga to achieve eventual consistency. In fact, I think this is the best way, and this is exactly what I'm doing in my own apps. Anyway, I'm using a message-driven architecture (yes, in a local, non-distributed application) and I have automatic saga support via the service bus (my own, not released yet).
What is important when dealing with eventual consistency is to ensure idempotency everywhere. That is, the aggregate roots should reject a duplicate operation, and of course the event handler should be able to cope with the fact that the same event can be published more than once. However, be aware that you can't guarantee 100% idempotency, but you can get very close to it.
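A sketch of receiving-side deduplication, assuming each event carries a unique ID (all types here are illustrative):
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative receiving-side deduplication: remember processed event IDs and
// ignore redeliveries. In production the set of processed IDs belongs in
// durable storage, updated in the same transaction as the handler's effects.
record StockReserved(UUID eventId, String orderId) {}

class StockReservedHandler {
    private final Set<UUID> processed = ConcurrentHashMap.newKeySet();

    void handle(StockReserved event) {
        if (!processed.add(event.eventId())) {
            return; // duplicate delivery: already handled, safely ignore
        }
        // ...perform the actual side effects, exactly once per event ID
    }
}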
