One aggregate per transaction, with "one" or "multiple" bounded contexts

Following Vaughn Vernon's recommendation, only one aggregate should be changed per transaction in order to achieve a high level of decoupling and single responsibility.
In chapter 8 of the Red Book, Vaughn Vernon demonstrates how two aggregates can "talk" to each other with domain events; in chapter 13, how aggregates in two different bounded contexts can "talk" to each other with notifications.
My question is: why should I deal with these situations differently, given that both of them happen in different transactions? Whether there is one bounded context or multiple, wouldn't the possible problems be the same?
For example, if the application crashes between two domain events in the same bounded context, I'll end up with the same inconsistency as with two bounded contexts.
It seems that the safest way to deal with two aggregates "talking" to each other asynchronously is to keep a transitional status in the aggregate, persist the events before sending them (to avoid losing events), make operations idempotent when possible, and deduplicate events on the receiving side when the operation cannot be made idempotent.
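For instance, deduplication on the receiving side could look roughly like this (a sketch; every name here is invented):
// Rough sketch of receiver-side deduplication; all types are invented.
interface ProcessedEventStore { boolean markProcessed(String eventId); }
interface StockService { void reserveStock(String orderId); }
record OrderPlacedEvent(String eventId, String orderId) {}

final class DeduplicatingHandler {
    private final ProcessedEventStore processed;
    private final StockService stock;

    DeduplicatingHandler(ProcessedEventStore processed, StockService stock) {
        this.processed = processed;
        this.stock = stock;
    }

    void handle(OrderPlacedEvent event) {
        // markProcessed inserts the event ID under a unique constraint and
        // returns false if it was already there; it runs in the same
        // transaction as the state change below.
        if (!processed.markProcessed(event.eventId())) {
            return; // duplicate delivery, safely ignored
        }
        stock.reserveStock(event.orderId());
    }
}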

I see two aspects to consider in your question:
The DDD aspect: Event types and what you do with them
A technical aspect: how to implement it reliably
Regarding the types of events, what I would say is that events that stay within the boundaries of a bounded context (often called Domain Events) normally carry a lot of information, potentially a big part of the state of the Aggregate. If you use CQRS, they are used to create the Read Model. Events that cross the BC boundaries are sometimes called Integration Events, and they should carry as little data as possible (potentially only global IDs, like CustomerId, OrderId). The reason is that every extra property you add is extra coupling between the publisher BC and the subscriber BCs, which is what you want to minimize.
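For illustration (these types are invented, not from the book), compare the shape of the two kinds of events:
import java.math.BigDecimal;
import java.time.Instant;
import java.util.List;

record OrderLine(String sku, int quantity) {}

// Domain Event: stays inside its BC, carries a big part of the aggregate's
// state, and can be used to build the Read Model.
record OrderPlaced(String orderId, String customerId, List<OrderLine> lines,
                   BigDecimal total, Instant placedAt) {}

// Integration Event: crosses the BC boundary, carries only global IDs.
record OrderPlacedIntegration(String orderId, String customerId) {}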
I would say that it's this distinction between the types of events which might lead to different technical solutions, but I agree with you that it doesn't have to be this way if you find a solution that works well for both cases.
The solution you propose is correct. It looks very similar to the Outbox feature of NServiceBus, which basically takes care of all this for you.
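A hand-rolled sketch of the outbox idea (not NServiceBus's actual API; these interfaces are invented): the outgoing events are saved in the same database transaction as the aggregate change, and a relay publishes them afterwards.
import java.util.List;

// Invented interfaces sketching the outbox pattern.
record StoredEvent(String id, Object payload) {}
interface Outbox {
    void save(List<Object> events); // called in the same DB transaction as the aggregate change
    List<StoredEvent> fetchUndispatched();
    void markDispatched(StoredEvent event);
}
interface EventBus { void publish(Object event); }

final class OutboxRelay {
    private final Outbox outbox;
    private final EventBus bus;

    OutboxRelay(Outbox outbox, EventBus bus) {
        this.outbox = outbox;
        this.bus = bus;
    }

    // Runs periodically. Publishing happens before marking, so a crash in
    // between causes a re-publish, which is why consumers deduplicate,
    // as described in the question.
    void dispatchPending() {
        for (StoredEvent e : outbox.fetchUndispatched()) {
            bus.publish(e.payload());
            outbox.markDispatched(e);
        }
    }
}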
Another approach that I've used, if your message broker supports it, is what Azure Service Bus calls Send Via. With this feature, you can publish events via your own queue, and the send will be committed transactionally with the removal of the incoming message from the queue. This means that if for some reason the message you are processing is not deleted from the queue successfully (DB update exception, broker unavailable, etc.) and is therefore retried, you know for sure that the events were not sent, and you can safely publish them again during the retry. This makes idempotent operations simpler to implement and avoids publishing ghost messages.
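Conceptually it behaves like the following sketch; the QueueTransaction abstraction is invented for illustration and is not the real Azure SDK API.
// Invented abstraction to illustrate Send Via semantics; not the Azure SDK.
interface QueueTransaction extends AutoCloseable {
    void send(Object event); // buffered via our own queue until commit
    void commit();           // atomically completes the incoming message and releases the sends
    @Override void close();  // rolls back if commit was never reached
}

final class SendViaProcessor {
    void process(Object incomingMessage, QueueTransaction tx) {
        try (tx) {
            // ... update the database, derive outgoing events from incomingMessage ...
            tx.send("derived-event"); // placeholder payload for the sketch
            tx.commit(); // if we never get here, neither the completion nor the sends happen
        } catch (RuntimeException e) {
            // the incoming message was not completed, so the broker retries it;
            // no ghost events were published in the meantime
        }
    }
}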

Related

DDD- Mapping events from external Bounded Context to domain model

My team is building a new microservice leveraging techniques from Domain-Driven Design and Event Sourcing. This service has to integrate with a handful of external bounded contexts (BCs) in the form of other legacy services. We've identified our core domain model, and it's clear that we need some sort of Anti-corruption Layer (ACL) between the external BCs and our internal domain model. However, we're getting hung up on some of the technical details of how best to accomplish this.
Let's say our domain has an Asset aggregate which can represent some piece of equipment from several of these external BCs. One of these BCs, let's say BC-A, sends out EquipmentUpdate events on a message broker that our application can subscribe to. These events carry no intent - they merely reflect that the state of the external entity has changed in some way, and it's up to us to determine what actually changed. The ID in the events is also from the external BC, not our domain.
So our ACL has to do the following tasks:
Map the external identifier from BC-A to our internal aggregate
Perform a diff of the new event and current/previous state to figure out what actually changed
Do this in a way that is resilient against out-of-order and duplicate event messages
Option 1 - Query directly from repository
The first option is to use the external identifier directly in our repository to fetch the matching Asset. This seems like the simplest option; however, it feels wrong, since it leaks concepts from external BCs into our repository API.
Furthermore, it forces us to store external identifiers and event versions directly on our aggregate, which also feels like it defeats the purpose of the ACL.
interface AssetRepository {
    Optional<Asset> findAssetByExternalIdentifier(String externalId);
}
Option 2 - Dedicated mapping
The second option is to expose a dedicated query that uses the external identifier to look up the matching internal identifier, as well as the latest version that was processed.
This feels cleaner, but requires an additional read from the database, whereas option 1 was a single read.
record AssetMappingQueryResult(String assetId, Long latestVersion) {}

interface AssetMappingQuery {
    Optional<AssetMappingQueryResult> resolveFromExternalIdentifier(String externalId);
}
How are other teams doing this?
I would tend to treat the EquipmentUpdate message as a signal that something might have changed. On receipt of such a message, the ACL queries BC-A for the latest state for the associated IDs, compares that state with the state it received the last time, and emits commands corresponding to the state changes which are of interest to the bounded context you're developing. In the case of duplicate messages (where the second event conveys no state change), this approach is idempotent. Always selecting the current state likewise makes out-of-order delivery a non-concern. The ACL may want to guard against concurrent modifications involving the same ID, though viewing its output as commands against your BC's write model might make that unnecessary (especially depending on the chance that one of the concurrent modifications might be slow). The specific techniques for that vary; my personal preference, coming from the Akka world, would be to have responsibility for a given ID assigned to an actor.
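A rough sketch of that flow (all types here are invented for illustration):
// Invented types throughout; this only illustrates the query-compare-command flow.
record EquipmentState(String externalId, String location, String status) {}
record RelocateAsset(String externalId, String newLocation) {}
interface EquipmentClient { EquipmentState fetchCurrent(String externalId); } // queries BC-A
interface LastSeenStore {
    EquipmentState get(String externalId); // null if never seen
    void put(String externalId, EquipmentState state);
}
interface CommandBus { void send(Object command); }

final class EquipmentAcl {
    private final EquipmentClient bcA;
    private final LastSeenStore lastSeen;
    private final CommandBus commands;

    EquipmentAcl(EquipmentClient bcA, LastSeenStore lastSeen, CommandBus commands) {
        this.bcA = bcA;
        this.lastSeen = lastSeen;
        this.commands = commands;
    }

    // The EquipmentUpdate message is only a hint; the current state is always
    // re-fetched, which makes duplicates and out-of-order deliveries harmless.
    void onEquipmentUpdate(String externalId) {
        EquipmentState current = bcA.fetchCurrent(externalId);
        EquipmentState previous = lastSeen.get(externalId);
        if (previous != null && !previous.location().equals(current.location())) {
            commands.send(new RelocateAsset(externalId, current.location()));
        }
        lastSeen.put(externalId, current);
    }
}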
An ACL is by its nature somewhat outside of any bounded context: it's analogous to the space between customs/immigration outposts on the border between countries. One could say it's partially in both (it's also, from a CQRS standpoint, a read-model for the "other" bounded context, though it may be a "read-model once removed", given that it should probably query a read-model in the source bounded context) bounded contexts. Alternatively, one could call an ACL a miniature bounded context which incorporates knowledge of parts of two other BCs: this may even extend to having its own aggregates, repositories, etc.

Event sourcing microservices: How to manage timestamp

We have microservices, each generating events that are stored in an event-sourcing repository. We use Cassandra to store the event data.
As you may know, the order of the events is important.
When we generate these events from different services running on different machines, how do we manage the clocks (timestamps) going out of sync across them, which results in an event order mismatch?
As you may know, the order of the events is important.
In some cases - but you'll want to be careful not to confuse time, order, and correlation.
When we generate these events from different services running on different machines, how do we manage the clocks (timestamps) going out of sync across them, which results in an event order mismatch?
Give up the idea that there is an "order" to events that are happening in different places. There is no now.
Udi Dahan on race conditions in the business world:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If your microservice boundaries are correct, then events happening in two different services at about the same time are coincident -- there isn't one correct ordering of them, because (to stretch an analogy) they are in different light cones. The only ordering that is inherently real is that within a single aggregate event history.
What can make real sense is tracking causation; these changes in this book of record are a reaction to those changes in that book of record.
One simple form of this is to track happens-before, which is where ideas like vector clocks begin to appear.
In most discussions that I have seen, this information would be passed along as meta data of the recorded events.
This is typically done via vector clocks:
A vector clock is an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations.
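For concreteness, a minimal vector clock might look like this (a sketch, not production code):
import java.util.HashMap;
import java.util.Map;

// Minimal vector clock: one counter per node; merge and compare
// give a partial "happens-before" order, not a total order.
final class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    void tick(String nodeId) { // local event on nodeId
        counters.merge(nodeId, 1L, Long::sum);
    }

    void mergeFrom(VectorClock other) { // on receiving a message stamped with `other`
        other.counters.forEach((node, c) -> counters.merge(node, c, Math::max));
    }

    // true if every counter here is <= the matching counter in `other`
    // and at least one is strictly smaller: this clock happened-before `other`.
    boolean happenedBefore(VectorClock other) {
        boolean strictlySmaller = false;
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            long theirs = other.counters.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > theirs) return false;
            if (e.getValue() < theirs) strictlySmaller = true;
        }
        return strictlySmaller || counters.size() < other.counters.size();
    }
}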
If I understand your problem correctly, you're trying to guard writes, i.e. to make sure that a microservice instance is up to date with all the relevant events before making another write.
In that case, have a look at lightweight transactions, which can be used to implement optimistic locking in Cassandra.
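Sketched with the DataStax Java driver, a guarded append might look roughly like this (the table and column names are invented):
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

// Optimistic locking with a lightweight transaction: the write only
// applies if the version we previously read is still current.
final class EventStreamWriter {
    private final CqlSession session;

    EventStreamWriter(CqlSession session) { this.session = session; }

    boolean tryAppend(String streamId, long expectedVersion, String eventJson) {
        ResultSet rs = session.execute(SimpleStatement.newInstance(
            "UPDATE event_streams SET version = ?, last_event = ? " +
            "WHERE stream_id = ? IF version = ?",
            expectedVersion + 1, eventJson, streamId, expectedVersion));
        return rs.wasApplied(); // false: someone else won the race; reload and retry
    }
}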
This talk by Christopher Batey is a very good start.

Do we really need a separate event store with Event Sourcing and CQRS patterns?

Suppose we have a situation where we need to implement some domain rule that requires examining an object's history (event store). For example, we have an Order object with a CurrentStatus property, and we need to examine the history of Order.CurrentStatus changes.
Most likely you will answer that I need to move this knowledge into the domain and introduce an Order.StatusHistory property that contains a collection of status records, and that I should not query the event store. And I will agree with you.
What I question is the need for the event store.
We write to the event store events that have business meaning (domain value); we do not record UserMovedMouse events (in most cases). And as with the OrderStatusChanged event, there is a high chance that most of the events in the event store will be needed at some point for domain logic, so we end up with a domain object that has an EventHistory property containing the collection of events.
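A sketch of that outcome (names invented): the aggregate applies events and keeps exactly the history its rules need, so the domain logic reads its own property rather than querying the event store.
import java.util.ArrayList;
import java.util.List;

record OrderStatusChanged(String orderId, String newStatus) {}

final class Order {
    private String currentStatus;
    private final List<String> statusHistory = new ArrayList<>();

    void apply(OrderStatusChanged event) {
        currentStatus = event.newStatus();
        statusHistory.add(event.newStatus());
    }

    // Example rule that needs history: cancelling an order that already
    // shipped requires a return, not a plain cancellation.
    boolean requiresReturn() {
        return "CANCELLED".equals(currentStatus) && statusHistory.contains("SHIPPED");
    }
}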
I can see value in a separate event store for patterns such as CQRS, when you have a single write-only event store and multiple read-only query stores, which gives you some scalability. However, the need to introduce such a thing in code is also in question for me. All decent databases support single-write-server, multiple-read-server scalability (master-slave replication). Why should I introduce such a thing at the source-code level? Why not forget about web services and message buses and just write your own wrappers around sockets?
I have great respect for "old school" DDD as it was described by Eric Evans, and I see some fresh and good ideas in the new wave of DDD+CQRS+Event Sourcing patterns. However, the main idea of CQRS is under big question for me. Am I missing something?
In short: if event sourcing is not needed (for its added benefits or as workarounds for some quirks), then you definitely shouldn't bring it into your system just for the sake of it.
ES is just one of many ways to augment the CQRS architectural style within a bounded context. It is not a requirement.

Inter-Aggregate Communication in CQRS + DDD + Event Sourcing

How should separate aggregate roots (AR) communicate with one another in an environment built on DDD principles using an event-sourced aggregate back-end?
For instance, I have a Facility aggregate root (AR) which has a factory method responsible for creating a Booking AR. The Booking is a time-sensitive combination of a Person AR and a Facility AR. A Person can only be booked in a single Facility.
In DDD, I would have held references to the Booking in Person, and to Person in Facility. However, when generating events for use in event sourcing, I think that trying to handle the event deserialization from the back-end would become prohibitive. Therefore, I've taken to only holding references to the value-object-based unique IDs. This brings up a new problem, however, when a method on an AR needs to call another method on another AR -- how do you handle that situation? Hit the event source repository from the domain AR?
What is the general use case in this scenario? Am I approaching this all wrong?
Aggregate Root boundaries define a consistency boundary.
Inside the aggregate, consistency is guaranteed.
Outside... it's not.
So you should not have operations that span several aggregates and have to be consistent.
If you need a transaction that spans two aggregates, you should review your aggregate boundaries.
For things that happen outside the aggregate you should have an event handler that will send a command to other aggregates.
If the logic of actions between aggregates is more complicated, you can define a process, a state machine that will listen to events and send commands to aggregates.
Processes can be used to define long-running transactions (with compensation instead of rollback), or to take business decisions based on what's happening in the system at a large scale (even between bounded contexts).
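Sketched in code (all names invented), the simple case is an event handler that turns one aggregate's event into a command for another:
// Invented types; an event from one aggregate becomes a command to another.
record BookingCreated(String bookingId, String personId, String facilityId) {}
record AssignPersonToFacility(String personId, String facilityId) {}
interface CommandBus { void send(Object command); }

final class BookingCreatedHandler {
    private final CommandBus commands;

    BookingCreatedHandler(CommandBus commands) { this.commands = commands; }

    // Runs after the Booking aggregate's own transaction has committed.
    void on(BookingCreated event) {
        commands.send(new AssignPersonToFacility(event.personId(), event.facilityId()));
    }
}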
When using Event Sourcing and CQRS, the most elegant (at least in my opinion) way of inter-AR communication is messaging. You can look at the Ncqrs project (it will be easier if you are a .NET guy), particularly the 'Messaging' branch. The idea is that ARs implement the IMessageHandler interface for every message type they handle, and the AR base class exposes a Send method for sending these messages. By means of this API, clients can invoke model behavior and the model itself can communicate internally (between ARs).

When should one use the Actor model?

When should the Actor Model be used?
It certainly doesn't guarantee a deadlock-free environment.
Actor A can wait for a message from B while B waits for A.
Also, if an actor has to make sure its message was processed before moving on to its next task, it will have to send a message and wait for a "your message was processed" reply instead of simply blocking.
What's the power of the model?
Given some concurrency problem, what would you look for to decide whether to use actors or not?
First I would look to define the problem... is the primary motivation a speedup of a nested for loop or recursion? If so, a simple task-based approach or parallel-loop approach will likely work well for you (rather than actors).
However, if you have a more complex system that involves dependencies and coordinating shared state, then an actor approach can help. Specifically, through the use of actors and message-passing semantics, you can often avoid explicit locks to protect shared state, by instead making copies of that state (messages) and reacting to them.
You can do this quite easily with the classic synchronization problems like the dining philosophers and the sleeping barber problem. But you can also use actors to help with more modern patterns, e.g. your facade could be an actor, and your model, view, and controller could also be actors that communicate with each other.
Another thing that I've observed is that actor semantics are learnable by most developers and 'safer' than their locked counterparts. This is because they raise the abstraction level and allow you to focus on coordinating access to the data rather than protecting all accesses to the data with locks. As an example, imagine that you have a simple class with a data member. If you choose to place a lock in that class to protect access to that data member, then any methods on that class will need to ensure that they access that data member under the lock. This becomes particularly problematic when others (or you) modify the class at a later date: they have to remember to use that lock.
On the other hand, if that class becomes an actor and the data member becomes a buffer or port you communicate with via messages, you don't have to remember to take the lock, because the semantics are built in, and you will know very explicitly whether you are going to block, based on the type of the buffer.
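To make the contrast concrete, here is a locked class next to a minimal actor-style equivalent (plain Java, no framework; a sketch only):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Locked version: every current and future method must remember the lock.
final class LockedCounter {
    private long count;
    synchronized void increment() { count++; }
    synchronized long get() { return count; }
}

// Actor-style version: the state is touched by exactly one thread, which
// reacts to messages from its mailbox, so there is no lock to forget.
final class CounterActor {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private long count; // only ever read/written by the actor thread

    CounterActor() {
        Thread actor = new Thread(() -> {
            try {
                while (true) mailbox.take().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        actor.setDaemon(true);
        actor.start();
    }

    void increment() { mailbox.add(() -> count++); } // enqueue a message
}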
-Rick
The usage of actors is "natural" in at least two cases:
When you can decompose your problem into a set of independent tasks.
When you can decompose your problem into a set of tasks linked by a clear workflow (i.e. dataflow programming).
For instance, if you process complex data using a series of filters, it is easy to use a pipeline of actors where each actor receives data from an upstream actor and sends data to a downstream actor (see the sketch below).
Of course, this dataflow need not be linear, and if a step in your pipeline is slow, you can instead use a pool of actors doing the same job. Another way of solving load-balancing problems would be to use a demand-driven approach organized with a kind of virtual Kanban system.
Of course, you will need synchronization between actors in almost all interesting cases, but contrary to the classic multi-threaded approach, this synchronization is really "concrete". You can imagine guys in a factory and imagine the possible problems (workers run out of work to do, upstream operations are too fast and intermediate products need a huge storage place, etc.). By analogy, you can then find solutions more easily.
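A bare-bones version of the filter pipeline described above, using threads and queues as the "actors" (no framework; all names invented):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Each stage is an "actor": it owns no shared state; it just reads from its
// inbox, applies its filter, and forwards the result downstream.
final class Stage<I, O> {
    Stage(BlockingQueue<I> inbox, Function<I, O> filter, BlockingQueue<O> downstream) {
        Thread t = new Thread(() -> {
            try {
                while (true) downstream.put(filter.apply(inbox.take()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }
}

// Usage sketch: parse -> format, wired together with queues.
// BlockingQueue<String> raw = new LinkedBlockingQueue<>();
// BlockingQueue<Integer> parsed = new LinkedBlockingQueue<>();
// BlockingQueue<String> out = new LinkedBlockingQueue<>();
// new Stage<>(raw, Integer::parseInt, parsed);
// new Stage<>(parsed, n -> "value=" + n, out);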
I am not an actor expert, but here are my two cents on when to use the actor model:
The actor model is not suited to every concurrent application; for instance, if you are creating an application that is multi-threaded and merely works under high concurrency, the actor model is not by itself what solves the concurrency issue.
Where actors really come into play is when you are creating an event-driven application. For instance, you have an application and you are tracking in real time what users click on in your application. You can use actors to process these activities in real time, segregated by user, device, or whatever your business requires, since actors are stateful. So, for example, you can send a coupon notification to the users whose actors recorded clicks on shirts.
Some other applications where actors come in handy are finance (pricing, fraud detection) and multiplayer gaming.
Actors are asynchronous and concurrent, but do not guarantee message order or a time limit on when a message will be acted upon. Hence atomic transactions cannot be split across actors.
If the application/task involves no mutable state, then actors are overkill, as actor frameworks go to great lengths to avoid race conditions.
