DDD: relating aggregates in a long-running process

I am working on a project in which we define two aggregates: "Project" and "Task". The Project, in addition to other attributes, has the points attribute. These points are distributed to the tasks as they are defined by users. In a use case, the user assigns points for some task, but the project must have these points available.
We currently model this as follows:
1. "task.RequestPoints(points)": this method creates a PointsAssignment aggregate with attributes points and taskId, which issues a PointsAssignmentRequested domain event in its constructor.
2. The handler of that event fetches the project related to the task and the PointsAssignment aggregate, and calls "project.assignPoints(pointsAssignment, service)"; that is, it passes the PointsAssignment aggregate as a parameter, together with a service that calculates the difference between the task's current points and the desired points.
3. If points are available, the project modifies its points attribute and issues a "ProjectPointsAssigned" domain event that contains the pointsAssignmentId attribute (in addition to others).
4. The handler of this event fetches the PointsAssignment and confirms it with "pointsAssignment.Confirm()"; this aggregate then issues a PointsAssignmentConfirmed domain event.
5. The handler of this last event loads the associated task and calls "task.AssignPoints(pointsAssignment.points)".
My question is: is it correct to pass in step 2 the aggregate PointsAssignment in the project method? That was the only way I found to be able to relate the aggregates.
Note: We have created the PointsAssignment aggregate so that in case of failure I could save the error “pointsAssignment.Reject(reasonText)” and display it to the user, since I am using eventual consistency (1 aggregate per transaction).
We thought about using a Process Manager (PointsAssignmentProcess), but we would still need the third aggregate, PointsAssignment, to correlate the process.

I would do it a little bit differently (which doesn't mean more correctly).
Your project doesn't need to know anything about the PointsAssignment.
If your project is the one that has the available points for use, it can have simple methods of removing or adding points.
RemovePointsCommand -> project->removePoints(points)
AddPointsCommand -> project->addPoints(points)
Then you would have an event handler that reacts to PointsAssignmentRequested (I imagine this event carries the id of the project and the number of points, and maybe a status field, from what you said).
This eventHandler would only do:
on(PointsAssignmentRequested) -> dispatch command (RemovePointsCommand)
// Note: it would be wise for the client to send an ID for this operation, so that it can be tracked asynchronously.
That command can either succeed or fail, and either outcome can dispatch an event (RemovePointsSucceeded or RemovePointsFailed).
// Remember that you persisted a correlation id earlier
Then, you would have a final eventHandler that would do:
on(RemovePointsSucceeded) -> PointsAssignment.succeed() // Dispatches PointsAssignmentSucceeded
on(PointsAssignmentSucceeded) -> task.AssignPoints
On the fail side
on(RemovePointsFailed) -> PointsAssignment.fail() // Dispatches PointsAssignmentFailed
This way you don't have to mix aggregates together: all they know is each other's ids, and they can work without knowing anything about the schema of other aggregates, which avoids undesired coupling.
I see the semantics of this problem as exactly those of a bank transfer.
You have the bank account (project)
You have money in this bank account(points)
You are transferring money through a transfer process (pointsAssignment)
You are transferring money to an account (task)
The bank account should only have minimal operations, withdrawing and depositing; it does not need to know anything about the transfer process.
The transfer process needs to know which bank account it is withdrawing from and which account it is depositing to.
I imagine your PointsAssignment being like:
{
  "points" : 10,
  "status" : ["issued", "succeeded", "failed"]
}

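The flow described in this answer can be sketched roughly as follows. This is a minimal Python illustration, not the answerer's code: the class layout, the `run_assignment` function and the `InsufficientPoints` exception are assumptions made for the example, and the event handlers are run inline here, whereas a real system would run each step in its own transaction.

```python
from dataclasses import dataclass


class InsufficientPoints(Exception):
    """Hypothetical failure raised when the project lacks points."""


@dataclass
class Project:
    id: str
    points: int

    def remove_points(self, points: int) -> None:
        # The project only guards its own invariant: never go below zero.
        if points > self.points:
            raise InsufficientPoints(f"only {self.points} points available")
        self.points -= points


@dataclass
class Task:
    id: str
    points: int = 0

    def assign_points(self, points: int) -> None:
        self.points += points


@dataclass
class PointsAssignment:
    id: str            # correlation id for the whole process
    project_id: str
    task_id: str
    points: int
    status: str = "issued"
    reason: str = ""

    def succeed(self) -> None:
        self.status = "succeeded"

    def fail(self, reason: str) -> None:
        self.status = "failed"
        self.reason = reason


def run_assignment(assignment, projects, tasks):
    """Plays the role of the event handlers, executed inline for brevity."""
    project = projects[assignment.project_id]
    try:
        project.remove_points(assignment.points)       # RemovePointsCommand
    except InsufficientPoints as exc:                   # RemovePointsFailed
        assignment.fail(str(exc))
        return assignment
    assignment.succeed()                                # RemovePointsSucceeded
    tasks[assignment.task_id].assign_points(assignment.points)
    return assignment
```

Note that Project and Task never see a PointsAssignment object; only the process-level code knows all three ids.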

CQRS Aggregate and Projection consistency

An aggregate can use a view; this is described in Vaughn Vernon's book:
Such Read Model Projections are frequently used to expose information to various clients (such as desktop and Web user interfaces), but they are also quite useful for sharing information between Bounded Contexts and their Aggregates. Consider the scenario where an Invoice Aggregate needs some Customer information (for example, name, billing address, and tax ID) in order to calculate and prepare a proper Invoice. We can capture this information in an easy-to-consume form via CustomerBillingProjection, which will create and maintain an exclusive instance of CustomerBillingView. This Read Model is available to the Invoice Aggregate through the Domain Service named IProvideCustomerBillingInformation. Under the covers this Domain Service just queries the document store for the appropriate instance of the CustomerBillingView.
Let's imagine our application should allow creating many users, but with unique names. The command/event flow:
CreateUser{Alice} command sent
UserAggregate checks UsersListView, since there are no users with name Alice, aggregate decides to create user and publish event.
UserCreated{Alice} event published // By UserAggregate
UsersListProjection processed UserCreated{Alice} // for simplicity let's think UsersListProjection just accumulates users names if receives UserCreated event.
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UserCreated{Bob} event published // By UserAggregate
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UsersListProjection processed UserCreated{Bob} .
UsersListProjection processed UserCreated{Bob} .
The problem is that UsersListProjection did not have time to process the event and contained stale data, and the aggregate used this stale data. As a result, two users with the same name were created.
how to avoid such situations?
how to make aggregates and projections consistent?
how to make aggregates and projections consistent?
In the common case, we don't. Projections are consistent with the aggregate at some time in the past, but do not necessarily have all of the latest updates. That's part of the point: we give up "immediate consistency" in exchange for other (higher leverage) benefits.
The duplication that you refer to is usually solved a different way: by using conditional writes to the book of record.
In your example, we would normally design the system so that the second attempt to write Bob to our data store would fail because of a conflict. Also, we prevent duplicates from propagating by ensuring that the write to the data store happens-before any events are made visible.
What this gives us, in effect, is a "first writer wins" write strategy. The writer that loses the data race has to retry/fail/etc.
(As a rule, this depends on the idea that both attempts to create Bob write that information to the same place, using the same locks.)
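A conditional write of this kind might look like the sketch below. It is an in-memory illustration under stated assumptions: `InMemoryEventStore`, `ConflictError` and the expected-version check are hypothetical names chosen for the example, and the stream is keyed by user name so that both attempts to create Bob target the same place. A real store would have to make the check and the append atomic.

```python
class ConflictError(Exception):
    """Raised when another writer already advanced the stream."""


class InMemoryEventStore:
    def __init__(self):
        self._streams = {}  # stream id -> list of events

    def append(self, stream_id, expected_version, events):
        stream = self._streams.setdefault(stream_id, [])
        # Conditional write: succeed only if nobody wrote since we read.
        if len(stream) != expected_version:
            raise ConflictError(
                f"{stream_id}: expected version {expected_version}, "
                f"found {len(stream)}"
            )
        stream.extend(events)
        return len(stream)

    def read(self, stream_id):
        return list(self._streams.get(stream_id, []))
```

Two writers that both read version 0 of the stream "user-Bob" will both call `append("user-Bob", 0, ...)`; the first call succeeds, and the second raises `ConflictError` and has to retry or fail.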
A common design to reduce the probability of conflict is to NOT use the "read model" of the aggregate itself, but to instead use its own data in the data store. That doesn't necessarily eliminate all data races, but you reduce the width of the window.
Finally, we fall back on Memories, Guesses and Apologies.
It's important to remember in CQRS that every write model is also a read model for the reads that are required to validate a command. Those reads are:
checking for the existence of an aggregate with a particular ID
loading the latest version of an entire aggregate
In general a CQRS/ES implementation will provide that read model for you. The particulars of how that's implemented will depend on the implementation.
Those are the only reads a command-handler ever needs to perform, and if a query can be answered with no more than those reads, the query can be expressed as a command (e.g. GetUserByName{Alice}) which when handled does not emit events. The benefit of such read-only commands is that they can be strongly consistent because they are limited to a single aggregate. Not all queries, of course, can be expressed this way, and if the query can tolerate eventual consistency, it may not be worth paying the coordination tax for strong consistency that you typically pay by making it a read-only command. (Command handling limited to a single aggregate is generally strongly consistent, but there are cases, e.g. when the events form a CRDT and an aggregate can live in multiple datacenters where even that consistency is loosened).
So with that in mind:
CreateUser{Alice} received
user Alice does not exist
persist UserCreated{Alice}
CreateUser{Alice} acknowledged (e.g. HTTP 200, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Alice}
CreateUser{Bob} received
user Bob does not exist
persist UserCreated{Bob}
CreateUser{Bob} acknowledged
CreateUser{Bob} received
user Bob already exists
command-handler for an existing user rejects the command and persists no events (it may log that an attempt to create a duplicate user was made)
CreateUser{Bob} ack'd with failure (e.g. HTTP 409, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Bob}
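The sequence above can be sketched as follows. This is a simplified single-process Python illustration; `UserCommandHandler`, `DuplicateUser` and the in-memory `journal` are assumptions made for the example, not part of any particular CQRS/ES framework.

```python
class DuplicateUser(Exception):
    """Hypothetical rejection for a CreateUser on an existing user."""


class UserCommandHandler:
    def __init__(self):
        self.journal = {}       # write side: user name -> persisted events
        self.unprojected = []   # events not yet seen by the projection
        self.user_list = set()  # read side: UserListProjection

    def create_user(self, name):
        # Validate against the write side, never against the projection.
        if name in self.journal:
            raise DuplicateUser(name)
        event = {"type": "UserCreated", "name": name}
        self.journal[name] = [event]   # persist before acknowledging
        self.unprojected.append(event)

    def run_projection(self):
        # Runs some time later; the read side is eventually consistent.
        while self.unprojected:
            self.user_list.add(self.unprojected.pop(0)["name"])
```

The second CreateUser{Bob} is rejected even though the projection has not yet seen the first UserCreated{Bob}, because the existence check reads the write side.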
Note that while the UserListProjection can answer the question "does this user exist?", the fact that the write-side can also (and more consistently) answer that question does not in and of itself make that projection superfluous. UserListProjection can also answer questions like "who are all of the users?" or "which users have two consecutive vowels in their name?" which the write-side cannot answer.

What is the best way to rehydrate aggregate roots and their associated entities in an event sourced environment

I have seen information on rehydrating aggregate roots on SO, but I am posting this question because I did not find any information on SO about doing so within the context of an event-sourced framework.
Has a best practice been discovered or developed for how to rehydrate aggregate roots when operating on the command side of an application using the event sourcing and CQRS patterns,
or is this still more of a "preference" among architects?
I have read through a number of blogs and watched a number of conference presentations on YouTube, and I seem to get different guidance depending on who I am listening to.
On the one hand, I have found information stating fairly clearly that developers should create aggregates to hydrate themselves using "apply" methods on events obtained directly from the event store.
On the other hand, I have also seen in several places where presenters and bloggers have recommended rehydrating aggregate roots by submitting a query to the read side of the application. Some have suggested creating specific validation "buckets" / projections on the read side to facilitate this.
Can anyone help point me in the right direction on discovering if there is a single best practice or if the answer primarily depends upon performance issues or some other issue I am not thinking about?
Hydrating Aggregates in an event sourced framework is a well-understood problem.
On the one hand, I have found information stating fairly clearly that
developers should create aggregates to hydrate themselves using
"apply" methods on events obtained directly from the event store.
This is the prescribed way of handling it. There are various ways of achieving this, but I would suggest keeping any persistence logic (reading or writing events) outside of your Aggregate. One simple way is to expose a constructor that accepts domain events and then applies those events.
On the other hand, I have also seen in several places where presenters
and bloggers have recommended rehydrating aggregate roots by
submitting a query to the read side of the application. Some have
suggested creating specific validation "buckets" / projections on the
read side to facilitate this.
You can use the concept of snapshots as a way of optimizing your reads. This will create a memoized version of your hydrated Aggregate. You can load this snapshot and then only apply events that were generated since the snapshot was created. In this case, your Aggregate can define a constructor that takes two parameters: an existing state (snapshot) and any remaining domain events that can then be applied to that snapshot.
Snapshots are just an optimization and should be considered as such. You can create a system that does not use snapshots and apply them once read performance becomes a bottleneck.
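As a rough illustration of the two constructor shapes described here, consider the Python sketch below. The `Account` aggregate, its two event kinds and the `load` function are hypothetical and exist only for the example; persistence is kept outside the aggregate, which only receives a snapshot and a list of events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int   # number of events already folded into the state
    balance: int


class Account:
    def __init__(self, events, snapshot=None):
        # Start from the snapshot if there is one, else from a blank state.
        self.version = snapshot.version if snapshot else 0
        self.balance = snapshot.balance if snapshot else 0
        for event in events:
            self._apply(event)

    def _apply(self, event):
        kind, amount = event
        if kind == "Deposited":
            self.balance += amount
        elif kind == "Withdrawn":
            self.balance -= amount
        self.version += 1

    def snapshot(self):
        return Snapshot(self.version, self.balance)


def load(history, snapshot=None):
    # Only events recorded after the snapshot still need to be applied.
    start = snapshot.version if snapshot else 0
    return Account(history[start:], snapshot)
```

Loading with or without the snapshot yields the same state; the snapshot only shortens the list of events that must be replayed.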
On the other hand, I have also seen in several places where presenters
and bloggers have recommended rehydrating aggregate roots by
submitting a query to the read side of the application
Snapshots are not really part of the read side of the application. Data on the read side exists to satisfy use cases within the application. Those can change based on requirements even if the underlying domain does not change. As such, you shouldn't use read side data in your domain at all.
Event sourcing has developed different styles over the years. I would divide all of those into two big categories:
an event stream represents one entity (an aggregate in case of DDD)
one (partitioned) event stream for a (sub)system
When you deal with one stream per (sub)system, you aren't able to rehydrate the write-side on the fly, it is physically impossible due to the number of events in that stream. Therefore, you would rely on the projected read-side to retrieve the current entity state. As a consequence, this read-side must be fully consistent.
When going with the DDD-flavoured event sourcing, there's a strong consensus in the community how it should be done. The state of the aggregate (not just the root, but the whole aggregate) is restored by the command side before calling the domain model. You always restore using events. When snapshotting is enabled, snapshots are also stored as events in the aggregate snapshot stream, so you read the last one and all events from the snapshot version.
Concerning the Apply question: you need to clearly separate the function that adds new events to the list of changes (what you're going to save) from the functions that mutate the aggregate state when events are applied.
The first function is the one called Apply and the second one is often called When. So you call the Apply function in your aggregate code to build up the changelist. The When function is called when restoring the aggregate state from events when you read the stream, and also from the Apply function.
You can find a simplistic example of an event-sourced aggregate in my book repo: https://github.com/alexeyzimarev/ddd-book/blob/master/chapter13/src/Marketplace.Ads.Domain/ClassifiedAds/ClassifiedAd.cs
For example:
public void Publish(UserId userId)
    => Apply(
        new V1.ClassifiedAdPublished
        {
            Id = Id,
            ApprovedBy = userId,
            OwnerId = OwnerId,
            PublishedAt = DateTimeOffset.Now
        }
    );
And for the When:
protected override void When(object @event)
{
    switch (@event)
    {
        // more code here
        case V1.ClassifiedAdPublished e:
            ApprovedBy = UserId.FromGuid(e.ApprovedBy);
            State = ClassifiedAdState.Active;
            break;
        // and more here
    }
}

Event Sourcing: proper way of rolling back aggregate state

I'm looking for advice on the proper way of implementing a rollback feature in a CQRS/event-sourcing application.
This application allows a group of editors to edit and update some editorial content, an editorial news item for instance. We implemented the user interface so that each field has an auto-save feature, and now we would like to let our users undo the operations they performed, so that the editorial news can be rolled back to a previous known state.
Basically, we would like to implement something like the undo command in Microsoft Word and similar text editors. In the backend, the editorial news is an instance of an aggregate defined in our domain called Story.
We have discussed some ideas for implementing the rollback and we are looking for advice based on real-world experience in similar projects. Here are our considerations about this feature.
How rollback works in real world business domains
First of all, we all know that in real world business domains what we are calling rollback is obtained via some form of compensation event.
Imagine a domain related to some sort of service for which it is possible to buy a subscription: we could have an aggregate representing a user subscription and an event describing that a charge has been associated to an instance of the aggregate (the particular subscription of one of the customers). A possible implementation of the event is as follows:
public class ChargeAssociatedToSubscriptionEvent: DomainEvent
{
    public Guid SubscriptionId {get; set;}
    public decimal Amount {get; set;}
    public string Description {get; set;}
    public DateTime DueDate {get; set;}
}
If a charge is wrongly associated with a subscription, the error can be fixed by means of an accreditation associated with the same subscription and having the same amount, so that the effect of the charge is completely balanced and the user gets their money back. In other words, we could define the following compensation event:
public class AccreditationAssociatedToSubscription: DomainEvent
{
    public Guid SubscriptionId {get; set;}
    public decimal Amount {get; set;}
    public string Description {get; set;}
    public DateTime AccreditationDate {get; set;}
}
So if a user is wrongly charged for an amount of 50 dollars, we can compensate the error by means of an accreditation of 50 dollars to the user subscription: this way the state of the aggregate has been rolled back to the previous state.
Why things are not as easy as they seem
Based on the previous discussion, the rollback seems quite easy to be implemented. If you have an instance of the story aggregate at the aggregate revision B and you want to roll it back to a previous aggregate revision, say A (with A < B), you just have to do the following steps:
check the event store and get all the events between revisions A and B
compute the compensation event for each of the occurred events
apply the compensation events to the aggregate in the reverse order
Unfortunately, the second step of the previous procedure is not always possible: given a generic domain event, it is not always possible to compute its compensation event, because the information contained inside the event may not be enough to do that. It may be possible to define all the events carefully so that they contain enough information to compute the corresponding compensation event, but at the current state of our application there are several events for which computing the compensation event is not possible, and we would prefer to avoid changing the shape of our events.
A possible solution based on state comparison
The first idea to overcome the issues with compensation events is to compute the minimum set of events needed to roll back the aggregate by comparing the current state of the aggregate with the target state. The algorithm is basically the following:
get an instance of the aggregate at the current state (call it B)
get an instance of the aggregate at the target state (call it A) by applying only the first n events persisted inside the event store (our repository allows this by specifying the aggregate id and the desired point in time at which to materialize the aggregate)
compare the two instances and compute the minimum set of events to be applied to the aggregate in the state B in order to change its state to A
apply the computed events to the aggregate
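The four steps above could be sketched like this in Python. The Story state is modelled as a plain dictionary of fields, and the two event kinds (`FieldChanged`, `FieldCleared`) are hypothetical stand-ins for the real domain events; the actual aggregate will have richer state and events.

```python
def apply(state, event):
    """Pure state transition for the two hypothetical Story events."""
    kind, field, value = event
    new_state = dict(state)
    if kind == "FieldChanged":
        new_state[field] = value
    elif kind == "FieldCleared":
        new_state.pop(field, None)
    return new_state


def materialize(events, upto=None):
    """Rebuild the state from the first `upto` events (all if None)."""
    state = {}
    for event in events[:upto]:
        state = apply(state, event)
    return state


def rollback_events(current, target):
    """Events that turn state `current` (B) into state `target` (A)."""
    events = []
    for field in sorted(current.keys() | target.keys()):
        if field not in target:
            events.append(("FieldCleared", field, None))
        elif field not in current or current[field] != target[field]:
            events.append(("FieldChanged", field, target[field]))
    return events
```

Appending the result of `rollback_events(B, A)` to the stream keeps the history append-only: the earlier events stay in place and the new ones bring the state back to A.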
A smarter approach based on event replay
Another way to solve the problem of rolling back to a previous state of the aggregate could be doing the same thing that the aggregate repository does when an aggregate is materialized at a specific point in time. In order to do that we should define an event, say StoryResettedEvent, whose effect is to reset the state of the aggregate by completely emptying it and do the following steps:
apply the StoryResettedEvent to our aggregate so that its state is emptied
get the first n events for the aggregate we are working on (all the events from the first saved event up to the target state A)
apply all the events to the aggregate instance
The main problem I see with this approach is the event to empty the state of the aggregate: it seems somewhat artificial, not a real domain event with a business meaning, but rather a trick to implement the rollback functionality.
The third way: persisting the compensation event each time an event is saved inside the event store
The third way we figured out to get what we need is based again on the concept of compensation event. The basic idea is that each event of the application could be enriched with a property containing the corresponding compensation event.
At the point in the code where an event is raised, it is possible to immediately compute its compensation event (based on the current state of the aggregate and the shape of the event), so the event can be enriched with this information, which is then saved inside the event store. By doing so, the compensation events are always available, ready to be used in case of a rollback request. The downside of this solution is that each domain event must be modified, and only a small part of the compensation events we must compute and save inside the event store will be useful for an actual rollback (most of them will never be used).
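A minimal Python sketch of this third idea follows. The `TitleChanged` event and the `rollback` method are hypothetical; the point is only that the compensation is computed from the current state at the moment the event is raised and travels with the event. In this sketch the compensation events themselves carry no compensation, so they are skipped by a later rollback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TitleChanged:
    title: str
    # Compensation computed when the event is raised, stored alongside it.
    compensation: Optional["TitleChanged"] = None


class Story:
    def __init__(self):
        self.title = ""
        self.changes = []  # events to persist, in order

    def change_title(self, new_title):
        undo = TitleChanged(self.title)  # captures the value being replaced
        self._raise(TitleChanged(new_title, compensation=undo))

    def rollback(self, count):
        """Undo the last `count` events by raising their compensations."""
        if count <= 0:
            return
        to_undo = self.changes[-count:]
        for event in reversed(to_undo):  # newest first
            if event.compensation is not None:
                self._raise(event.compensation)

    def _raise(self, event):
        self.title = event.title
        self.changes.append(event)
```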
In my opinion the best option to solve the problem is using the algorithm based on state comparison (the first proposed solution), but we are still evaluating what to do.
Has anyone already had a similar requirement? Is there any other way to implement a rollback? Are we completely missing the point and following bad approaches to the problem?
Thanks for helping, any advice will be appreciated.
How the compensation events are generated should be the concern of the Story aggregate (after all, that's the point of an aggregate in event sourcing - it's just the validator of commands and generator of events for a particular stream).
Presumably you are following something like a typical CQRS/ES flow:
The client sends an Undo command, which presumably says what version it wants to undo back to, and which story it is targeting
The Undo Command Handler loads the Story aggregate in the usual way, possibly from a snapshot and/or by applying the aggregate's events to the aggregate.
In some way, the command is passed to the aggregate (possibly a method call with args extracted from the command, or just passing the command directly to the aggregate)
The aggregate "returns" in some way the events to persist, assuming the undo command is valid. These are the compensating events.
compute the compensation event for each of the occurred events
Unfortunately, the second step of the previous procedure is not always possible
Why not? The aggregate has been passed all previous events, so what does it need that it doesn't have? The aggregate doesn't just see the events you want to roll back, it necessarily processes all events for that aggregate ever.
You have two options really - reduce the book-keeping that the aggregate needs to do by having the command handler help out in some way, or the whole process is managed internally by the aggregate.
Command handler helps out:
The command handler extracts from the command the version the user wants to roll back to, and then recreates the aggregate as-of that version (applying events in the usual way), in addition to creating the current aggregate. Then the old aggregate gets passed to the aggregate's undo method along with the command, so that the aggregate can then do state comparison more easily.
You might consider this to be a bit hacky, but it seems moderately harmless, and could significantly simplify the aggregate code.
Aggregate is on its own:
As events are applied to the aggregate, it adds to its state whatever book-keeping it needs to be able to compute the compensating events if it receives an undo command. This could be a map of compensating events, pre-computed, a list of every previous state that can potentially be reverted to (to allow state comparison), the list of events the aggregate has processed (so it can compute the previous state itself in the undo method), or whatever it needs, and it just stores it in its in-memory state (and snapshot state, if applicable).
The main concern with the aggregate doing it on its own is performance - if the size of the book-keeping state is large, the simplification of allowing the command handler to pass the previous state would be worthwhile. In any case, you should be able to switch between the approaches at any time in the future without any issues (except possibly needing to rebuild your snapshots, if you have them).
My 2 cents.
For the rollback operation, an orchestration class would be responsible for handling it. It publishes an aggregate_modify_generated event, and a projection on the other end fetches the current state of the aggregates on receiving it. When any of the aggregates fails, it should generate a failure event; on receiving it, the orchestration class generates an aggregate_modify_rollback event, which is received by that projection and sets the aggregate state back to the previously fetched state.
One common projector can do the task, because the events carry the aggregate id.

CQRS read model projection - business logic

So, I trigger a command on an aggregate root, and some 10 events happen as a result of the command. These events are internal, and since outer systems need an aggregation of these events, I decided to make a projection (a read projection, basically). In order to build this projection from 10 (internal) events into 1 (external) event, I have to apply some business rules (rules concerning the merging of events). Where should I put these rules, since they seem like part of the domain but I'm creating projections of internal events?
Basically, since the projection logic is part of the domain, should I keep it inside the aggregate and call it from the code where the projection is made?
So, inside one aggregate root, I have e.g. 3 (internal) events in response to one command (aggregate.createPaintandwashatsametime(id, red)) that is sent to the aggregate root; the events spread through all the aggregate root's entities, like CarCreated(Id), CarSeatColored(Red), CarWashed(), etc. (all 3 events happen because of a single command). The external system expects to receive one external event such as CarMaintainenceDone(Id, repainted=true, washed=true, somevalue=22);
Now, if I have some complex logic to build this CarMaintainenceDone event (like: if color == red then somevalue = 22 in the projection, otherwise 44), should this go in the projection code or be part of the domain?
Let me try to give you a new example. Just ignore how the domain is modeled, since this is just an example:
As you can see, we have an AggregateRoot that contains a Multiplier, which is there just to call things with the right name. When we do the multiplication, we first send the integer 1 to ObjectA, which has some logic to set its internal state and emit an ObjectAHasSetParam event. The same goes for ObjectB. Finally, ObjectC listens to all of these events, and on paramsHaveBeenSet it does the actual multiplication.
In the event store, in this case, I would preserve the list of events:
[ObjectAHasSetParam , ObjectBHasSetParam , ObjectCHasMultiplied ]
My point here was: if I emit all of these events one by one out of process, the state that somebody else updates may be inconsistent, since these 3 events only make sense together. That is why I wanted to make something like a projection, but I think in this case I just need to publish the list of these events together instead of event by event.
class AggregateRoot {
    Multiplier ml;
    void handle(MultiplyCommand(1,2)) {
        // delegates to the Multiplier
    }
}
class Multiplier {
    ObjectA a;
    ObjectB b;
    ObjectC res;
    void multiply(1,2) {
        // sends 1 to ObjectA and 2 to ObjectB
    }
}
class ObjectA {
    int p;
    void setParam(1) {
        p = 1 + 11;
        // emits ObjectAHasSetParam
    }
}
class ObjectB {
    int p;
    void setParam(2) {
        p = 2 + 22;
        // emits ObjectBHasSetParam
    }
}
class ObjectC {
    int p1; int p2;
    int res;
    listen(ObjectAHasSetParam e1) {
        p1 = e1.par;
    }
    listen(ObjectBHasSetParam e2) {
        p2 = e2.par;
    }
    listen(paramsHaveBeenSet e3) {
        res = p1 * p2;
        // emits ObjectCHasMultiplied
    }
}
External system expects to receive one external event as CarMaintainenceDone(Id, repainted=true, washed=true, somevalue=22);
Aha! The short answer is: a process manager.
The longer answer is that you (should) have two aggregates right now. One of them is tracking the state of the car. The other is tracking the process of maintaining the car.
The big hint that there is another aggregate hidden somewhere: you've got this CarMaintenanceDone event, with no aggregate responsible for generating it. All events have an "aggregate" somewhere that produces them. The aggregate might be the real world, or a proxy for the real world (HttpRequestReceived), or a digital thing in some other bounded context; but the event is telling you that something, somewhere, changed state.
That is to say, you have some aggregate that knows the rule of when the maintenance is done. It's an information resource, a log of work. When CarWashed is published (by the Car, or the washing machine, or whatever), an event handler subscribed to the CarWashed event sends a command to the Maintenance aggregate to inform it. The Maintenance aggregate updates its own state, runs its logic, and publishes a MaintenanceCompleted event when all of the individual steps have been accounted for.
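A Maintenance aggregate of this kind might be sketched as below. This is a Python illustration under assumptions: the set of required steps and the `record` method name are made up for the example, and the events are reduced to plain names.

```python
class Maintenance:
    """Hypothetical process aggregate tracking one car's maintenance."""

    # Assumed rule: maintenance is done once both steps have been observed.
    REQUIRED_STEPS = frozenset({"CarSeatColored", "CarWashed"})

    def __init__(self, car_id):
        self.car_id = car_id
        self.observed = set()
        self.completed = False
        self.changes = []  # events this aggregate publishes

    def record(self, event_name):
        # The "command" looks like an event handler: it reports an observation.
        if self.completed or event_name not in self.REQUIRED_STEPS:
            return
        self.observed.add(event_name)
        if self.observed == self.REQUIRED_STEPS:
            self.completed = True
            self.changes.append(("MaintenanceCompleted", self.car_id))
```

An event handler subscribed to CarWashed would load the Maintenance instance for that car and call `record("CarWashed")`; the rule about when maintenance is complete lives in this aggregate rather than in a projection.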
Most things that are process-like can be implemented as Aggregates; the weird bit is that the "commands" tend to look like event handlers. But they have their own history (based on what they have observed), which describes how the state machine changed in response to each event observed.
It might be more than two, depending on the complexity of the processes.
Rinat Abdullin wrote a good introduction to process managers, that I reference frequently.
Isn't there a clear distinction between an aggregate and a process manager though? I thought process managers would only coordinate and live in the application service world, sending appropriate commands to aggregates based on the events received.
From what I've seen -- no, there isn't. The literature doesn't make that very clear.
For example, Udi Dahan wrote
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.
Saga, here, being equivalent to a process.
There are often two event models: internal events (only visible within a BC) and external events (published to the outside world). You could decide to make everything external, but then you have to version everything.
You can read more about internal vs external events in the Patterns, Principles, and Practices of Domain-Driven Design book, p. 408.
Projections shouldn't be responsible for publishing external events. One common practice is to register an internal event handler from the application service layer which is responsible for publishing external events on a messaging infrastructure. You could leverage that process to aggregate these events together and publish a single external event from them.
How the aggregation is performed is up to you, but since internal events can be raised synchronously and handlers are usually single-threaded, you can just set up a state machine in the handler that kicks in when it receives the first event of the batch and aggregates events until it receives the last one, then publishes on the message bus.
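Such a handler could be sketched as follows in Python. The class name, the dictionary-shaped events, and the choice of CarCreated as the first and CarWashed as the last event of the batch are assumptions for illustration; the somevalue rule (22 for red, otherwise 44) is taken from the question.

```python
class MaintenanceEventTranslator:
    """Folds a batch of internal events into one external event."""

    def __init__(self, publish):
        self._publish = publish  # callable that sends to the message bus
        self._pending = None     # external event under construction

    def handle(self, event):
        kind = event["type"]
        if kind == "CarCreated":            # first event of the batch
            self._pending = {
                "type": "CarMaintainenceDone",
                "id": event["id"],
                "repainted": False,
                "washed": False,
                "somevalue": 44,
            }
        elif self._pending is None:
            return                          # not inside a batch; ignore
        elif kind == "CarSeatColored":
            self._pending["repainted"] = True
            self._pending["somevalue"] = 22 if event["color"] == "red" else 44
        elif kind == "CarWashed":           # last event of the batch
            self._pending["washed"] = True
            self._publish(self._pending)
            self._pending = None
```

Passing `out.append` for some list `out` as the `publish` argument is enough to try it out; after the three internal events are handled, the list holds a single CarMaintainenceDone entry.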
If your messaging infrastructure cannot participate in the same transaction as your event store you could just have an additional process that reads the committed events in order and does the same thing as above.
An alternative would be to let the consumer deal with the aggregation. That could be the right approach if the consumer should be able to veto what "CarMaintenanceDone" means.
Finally, you could also publish an extra event from the aggregate itself. The event may not be leveraged by the AR itself, but sometimes it's better to just do what's more practical (just like enriching events with data only consumed by the read model). This approach would also have the advantage of not having to change the logic if more events are added.
There should not be a notion of an external event. Events are generated by the Aggregates and consumed by synchronous read-models, sagas or published to the outside world where other systems and microservices use them however they want.
So, in your case, the consumer (implemented as a saga for example) should aggregate those events by its business rules and then do something (a saga can create a new command for example) and not the Aggregate.
UPDATE (in response to question being updated)
If you think that car maintenance is a responsibility of the Car Aggregate, then the Car aggregate should raise the event. It depends on how the future behavior of the Car Aggregate is influenced by that CarMaintenanceDone event. In this particular context, I would generate the event from the Car aggregate, to make the code simpler.

How are consistency violations handled in event sourcing?

First of all, let me state that I am new to Command Query Responsibility Segregation and Event Sourcing (Message-Driven Architecture), but I'm already seeing some significant design benefits. However, there are still a few issues on which I'm unclear.
Say I have a Customer class (an aggregate root) that contains a property called postalAddress (an instance of the Address class, which is a value object). I also have an Order class (another aggregate root) that contains (among OrderItem objects and other things) a property called deliveryAddress (also an instance of the Address class) and a string property called status.
The customer places an order by issuing a PlaceOrder command, which triggers the OrderReceived event. At this point in time, the status of the order is "RECEIVED". When the order is shipped, someone in the warehouse issues a ShipOrder command, which triggers the OrderShipped event. At this point in time, the status of the order is "SHIPPED".
One of the business rules is that if a Customer updates their postalAddress before an order is shipped (i.e., while the status is still "RECEIVED"), the deliveryAddress of the Order object should also be updated. If the status of the Order were already "SHIPPED", the deliveryAddress would not be updated.
Question 1. Is the best place to put this "conditionally cascading address update" in a Saga (a.k.a., Process Manager)? I assume so, given that it is translating an event ("The customer just updated their postal address...") to a command ("... so update the delivery address of order 123").
Question 2. If a Saga is the right tool for the job, how does it identify the orders that belong to the user, given that an aggregate can only be retrieved by its unique ID (in my case a UUID)?
Continuing on, given that each aggregate represents a transactional boundary, if the system were to crash after the Customer's postalAddress was updated (the CustomerAddressUpdated event being persisted to the event store) but before the OrderDeliveryAddressUpdated event could be persisted (i.e., between the two transactions), then the system is left in an inconsistent state.
Question 3. How are such "violations" of consistency rules detected and rectified?
In most instances the delivery address of an order should be independent of any other data change, as a customer may want the order sent to an arbitrary address. That being said, I'll give my 2c on how you could approach this:
Is the best place to handle this in a process manager?
Yes. You should have an OrderProcess.
How would one get hold of the correct OrderProcess instance given that it can only be retrieved by aggregate id?
There is nothing preventing one from adding any additional lookup mechanism that associates data to an aggregate id. In my experimental, going-live-soon, mechanism called shuttle-recall I have an IKeyStore mechanism that associates any arbitrary key to an AR Id. So you would be able to associate something like [order-process]:customerId=CID-123; as a key to some aggregate.
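As a rough illustration of the idea (this is not the shuttle-recall IKeyStore API; InMemoryKeyStore, order_key and the command shape are all made up for this sketch), a key lookup plus an event-to-command translation could look like this:

```python
from collections import defaultdict


class InMemoryKeyStore:
    """Hypothetical lookup from an arbitrary key to aggregate ids."""

    def __init__(self):
        self._ids_by_key = defaultdict(set)

    def add(self, key: str, aggregate_id: str) -> None:
        self._ids_by_key[key].add(aggregate_id)

    def remove(self, key: str, aggregate_id: str) -> None:
        self._ids_by_key[key].discard(aggregate_id)

    def find(self, key: str) -> list:
        return sorted(self._ids_by_key.get(key, ()))


def order_key(customer_id: str) -> str:
    return f"[order-process]:customerId={customer_id}"


def on_customer_address_updated(event: dict, keys: InMemoryKeyStore) -> list:
    # Translate one event into one command per order still associated with the customer.
    return [
        {"type": "UpdateDeliveryAddress", "order_id": order_id, "address": event["address"]}
        for order_id in keys.find(order_key(event["customer_id"]))
    ]


keys = InMemoryKeyStore()
keys.add(order_key("CID-123"), "order-1")
keys.add(order_key("CID-123"), "order-2")
keys.remove(order_key("CID-123"), "order-2")  # e.g. dropped once order-2 has shipped
commands = on_customer_address_updated(
    {"customer_id": "CID-123", "address": "1 New Street"}, keys
)
```

Dropping the key when an order ships is just one possible design; the Order aggregate should still check its own status when it handles the command, since the lookup may be slightly out of date.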
How are such "violations" of consistency rules detected and rectified?
In most cases they could be handled out-of-band, if possible. Should I order something from Amazon and attempt to change my address after the order has shipped, the order is still going to the original address. In your case of linking the customer postal address to the active order address, you could notify the customer that n orders have had their addresses updated but that a recent order (within some tolerance) has not.
As for the system going down before processing, you should have some guaranteed delivery mechanism to handle this. I do not regard these domain events in the same way I regard system events in a messaging infrastructure such as a service bus.
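One possible shape for such a mechanism, sketched under the assumption that committed events can be re-read in order and that a checkpoint survives restarts (catch_up, the dict-based checkpoint and the simulated crash are all illustrative, not part of any particular library):

```python
def catch_up(event_log: list, checkpoint: dict, handle) -> None:
    """Handle every event after the stored position, advancing the checkpoint as we go."""
    for position in range(checkpoint.get("position", 0), len(event_log)):
        handle(event_log[position])  # may raise; the checkpoint then stays put
        checkpoint["position"] = position + 1


event_log = [
    {"id": 1, "type": "CustomerAddressUpdated"},
    {"id": 2, "type": "OrderShipped"},
]
checkpoint = {}
handled = []
crash_once = {"pending": True}


def handle(event: dict) -> None:
    if event["id"] == 2 and crash_once["pending"]:
        crash_once["pending"] = False
        raise RuntimeError("simulated crash")
    handled.append(event["id"])


try:
    catch_up(event_log, checkpoint, handle)
except RuntimeError:
    pass  # the process "restarts" below

catch_up(event_log, checkpoint, handle)  # resumes from the last checkpoint
```

In a real system the side effect and the checkpoint write are rarely atomic, so an event may be handled more than once after a crash; the handler should therefore be idempotent.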
Just some thoughts :)
