Event-sourcing and sagas - compensating transactions


First question on SO (really???), so bear with me please :)
We're architecting a solution using event sourcing. Some of our business processes will be long-running, thus we're planning on using sagas to orchestrate commands to several aggregate roots.
In my understanding, if a saga-issued command fails, the saga is responsible for issuing compensating commands to all the previously invoked aggregate roots.
What should happen if the state of an aggregate root is mutated externally (i.e. by some other process/user) after it takes part in the saga, but before the saga fails and issues a compensating command to that aggregate root?
In other words, how would one try to compensate for an event that is not the last one in a certain aggregate root's event stream (speaking in EventStore lingo)?

This is a rather tricky situation: you may end up with an invalid AR after your compensating entry, which makes the compensating action itself invalid.
You will probably have to revisit the design of the process so that changes to ARs are not made until you are sure that your process manager (saga) will be able to complete, perhaps by storing the values temporarily and applying the change later.
Another approach may be to prevent certain commands on your AR while it is in a state indicating that those commands could lead to issues. The user would then not be able to issue those commands. Your process manager would take care of that state and of any state expiries/timeouts.
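A minimal sketch of that second approach, assuming a hypothetical `Stock` aggregate (all names are illustrative): while a saga holds the aggregate, it sits in a pending state and rejects ordinary user commands, so nothing can change underneath a possible compensation. Note that the compensation is a new event appended to the stream, not a removal of the earlier one. Timeouts are left to the process manager, as described above.

```python
class CommandRejected(Exception):
    """Raised when a command is not allowed in the aggregate's current state."""


class Stock:
    """Hypothetical aggregate illustrating a 'pending saga' state."""

    def __init__(self, available: int) -> None:
        self.available = available
        self.pending_saga = None  # id of the saga currently holding the aggregate
        self.pending_qty = 0
        self.events = []          # the aggregate's event stream (in memory)

    def reserve_for_saga(self, saga_id: str, qty: int) -> None:
        if self.pending_saga is not None:
            raise CommandRejected("aggregate is pending another saga")
        if qty > self.available:
            raise CommandRejected("not enough stock")
        self.available -= qty
        self.pending_saga, self.pending_qty = saga_id, qty
        self.events.append(("ReservedForSaga", saga_id, qty))

    def adjust(self, delta: int) -> None:
        # Ordinary user command: refused while a saga holds the aggregate.
        if self.pending_saga is not None:
            raise CommandRejected("aggregate is pending saga " + self.pending_saga)
        self.available += delta
        self.events.append(("Adjusted", delta))

    def confirm(self, saga_id: str) -> None:
        self._check_owner(saga_id)
        self.events.append(("SagaConfirmed", saga_id))
        self.pending_saga, self.pending_qty = None, 0

    def compensate(self, saga_id: str) -> None:
        # Compensation appends a new event; the original event stays in the stream.
        self._check_owner(saga_id)
        self.available += self.pending_qty
        self.events.append(("SagaCompensated", saga_id, self.pending_qty))
        self.pending_saga, self.pending_qty = None, 0

    def _check_owner(self, saga_id: str) -> None:
        if self.pending_saga != saga_id:
            raise CommandRejected("aggregate is not held by saga " + saga_id)


stock = Stock(available=10)
stock.reserve_for_saga("saga-1", 4)
rejected_while_pending = False
try:
    stock.adjust(-3)  # another user tries to change the aggregate mid-saga
except CommandRejected:
    rejected_while_pending = True
stock.compensate("saga-1")  # the saga failed elsewhere: undo the reservation
stock.adjust(-3)            # now accepted
```

The trade-off is availability: users are blocked from that aggregate for as long as the saga is in flight.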

Related

Problem with DDD and changing only one Aggregate in one transaction

I have a problem with my personal project.
In it I have a Project, which has Stages, and Stages have Tasks. At first, I tried to make Project an AggregateRoot, with Stage and Task as Entities inside that Aggregate. As there are many other Entities in there as well, such as Costs, Installments, FinancialData and many more, Project started to grow into a god class, so I reconsidered it all and made Project, Stage, and Task separate AggregateRoots.
So I started refactoring, and all was fine, but I have a problem with one piece of functionality: the status system. Sometimes a status change on a Task can start a chain of status changes on the Stage and then on the Project as well (for example, adding a new Task to a finished Stage should put that Stage into the in-progress status, and then, if the Project was finished, it should be moved to in progress as well). Here is my question: how should I approach that?
Until now, I loaded the Project from the repository as one of the first actions in an application service method marked with @Transactional, and saved it at the end of that method after all actions.
After refactoring, I sometimes need to change three AggregateRoots in one transaction. If they should then be one Aggregate, Project goes back to the state where it has tons of methods to handle all the changes on Stages and Tasks. I'm a bit lost here.
Should I load all three at the very beginning of the action, pass them along the chain of actions, and call save on each repository at the end of the method?

Operations that touch multiple aggregates are often best modeled as sagas (there is an alternative if the system is event-driven, but nothing in your question indicates that the rest of the system is event-driven). The saga operates on the various aggregates and, importantly, must be able to handle the failure or rejection of an operation. Depending on requirements, it can retry (which implies a potentially arbitrarily long period of visible inconsistency), undo the changes to the other aggregates, or tear down the system (sacrificing availability for consistency).
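A minimal sketch of the "undo changes to other aggregates" option, using a hypothetical `run_saga` helper: each step pairs an action with its compensation, and if a step fails, the completed steps are compensated in reverse order. In a real system each action would be a command handled by a separate aggregate in its own transaction, and the saga's progress would itself be persisted; here everything is in memory.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); each action stands in for one command
# handled by one aggregate in its own transaction.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; if one fails, compensate the completed ones in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for done_name, _, compensation in reversed(completed):
                compensation()
                log.append(f"compensated {done_name}")
            return False
        log.append(f"{name} ok")
        completed.append(step)
    return True


# Usage: reopening a task, its stage and its project; the project rejects the change.
state = {"task": "done", "stage": "finished", "project": "finished"}


def set_status(key: str, value: str) -> Callable[[], None]:
    def apply() -> None:
        state[key] = value
    return apply


def reject() -> None:
    raise RuntimeError("project rejected the change")


log: List[str] = []
ok = run_saga(
    [
        ("task", set_status("task", "in progress"), set_status("task", "done")),
        ("stage", set_status("stage", "in progress"), set_status("stage", "finished")),
        ("project", reject, lambda: None),
    ],
    log,
)
```

Between the first step and the compensation, other readers can observe the intermediate state; that is the visible inconsistency mentioned above.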

You should find out whether you really need transactional consistency between your statuses; maybe eventual consistency is a solution and you can update the statuses via events. If it does require transactional consistency, then it must be one aggregate, because protecting true invariants is the main purpose of an aggregate. In a real project, you need to ask the business to answer this question. What matters are the true invariants: in your example, I think, you have entity-oriented aggregates, but you need policy- or process-oriented aggregates whose only capability is to protect true invariants rather than being data containers. Maybe this video will be helpful: Mauro Servienti - Talk Session: All Our Aggregates Are Wrong
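A sketch of the eventual-consistency route, assuming a tiny hypothetical in-memory `Bus` and plain dicts standing in for the Stage and Project aggregates: each handler changes one aggregate and publishes an event that the next handler reacts to, so Task, Stage and Project are updated in separate steps rather than in one transaction.

```python
from collections import defaultdict, deque


class Bus:
    """Tiny in-memory event bus; events are dispatched in FIFO order."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def drain(self):
        # Each handler call stands in for a separate transaction on one aggregate.
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.handlers[event_type]:
                handler(payload)


stages = {"s1": {"project": "p1", "status": "finished"}}
projects = {"p1": {"status": "finished"}}
bus = Bus()


def on_task_added(event):
    stage = stages[event["stage"]]
    if stage["status"] == "finished":
        stage["status"] = "in progress"
        bus.publish("StageReopened", {"stage": event["stage"], "project": stage["project"]})


def on_stage_reopened(event):
    project = projects[event["project"]]
    if project["status"] == "finished":
        project["status"] = "in progress"
        bus.publish("ProjectReopened", {"project": event["project"]})


bus.subscribe("TaskAdded", on_task_added)
bus.subscribe("StageReopened", on_stage_reopened)

bus.publish("TaskAdded", {"task": "t9", "stage": "s1"})
bus.drain()
```

If the business says the Project status must never be observed out of step with its Stages, this design is not acceptable and the rule belongs inside one aggregate.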

How to process Read Model in CQRS

We want to implement CQRS in our new design. We have some doubts about processing in the command handler and the read model. We understand that while processing commands we should take an optimistic lock on the aggregateId. But what approach should be used while processing read models? Should we take a lock on the entire read model, on the aggregateId, or never take a lock while processing the read model?
Case 1: take a lock on the entire read model. This is the safest, but it is not good in terms of speed.
Case 2: take a lock per aggregateId. Here issues may arise: if we lock per aggregateId, what happens if the read model server restarts? It does not know where to start again.
Case 3: never take a lock. In this approach, I think the data may end up in a corrupted state. For example, say an order inserted event is generated and, through some workflow/saga, an order updated event occurs as well. What if the order updated event arrives first and the order inserted event has not been processed yet?
I hope I have been able to explain my issue.

If you do not process events concurrently in the Readmodel then there is no need for a lock. This is the case when you have a single instance of the Readmodel, possibly in a Microservice, that polls for events and processes them sequentially.
If you have a synchronous Readmodel (i.e. in the same process as the Writemodel/Aggregate) then most probably you will need locking.
An important thing to keep in mind is that a Readmodel most probably differs from the Writemodel. There could be many Writemodel types whose events are projected into the same Readmodel. For example, in an e-commerce shop you could have a ListOfProducts that projects events from Vendor and Product Aggregates. This means that, when we speak about a Readmodel, we cannot simply refer to "the Aggregate", because there is not a single Aggregate involved. In the e-commerce case, "the Aggregate" might refer to the Product Aggregate or the Vendor Aggregate.
But what to lock? That depends on the database technology. You should lock the smallest affected read entity or collection that can be locked. In a Readmodel that consists of a list of products (read entities, not aggregates!), when an event affects only one product (e.g. ProductTitleRenamed), you should lock only that product.
If an event affects more products then you should lock the entire collection. For example, VendorWasBlocked affects all the products (it should remove all the products from that vendor).
You need locking for events that have non-idempotent side effects, in case the Readmodel's updater fails while processing an event and you want to retry/resume from where it left off. If the event has idempotent side effects, it can be retried safely.
In order to know where to resume after a Readmodel failure, you could store the sequence of the last processed event inside the Readmodel, updating it in the same transaction as the read entity. If the entity update succeeds, the last processed event's sequence is also saved; if it fails, you know the event was not processed.
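A minimal sketch of that checkpoint idea using Python's built-in `sqlite3`, assuming events arrive as `(sequence, event)` pairs from a single ordered source; the table names and the `apply_event`/`catch_up` helpers are hypothetical. The read entity and the checkpoint are written in one transaction, so after a crash the projector resumes right after the last event that was fully applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE checkpoint (id INTEGER PRIMARY KEY CHECK (id = 1), last_seq INTEGER);
    INSERT INTO checkpoint VALUES (1, 0);
    """
)


def last_processed(conn):
    return conn.execute("SELECT last_seq FROM checkpoint WHERE id = 1").fetchone()[0]


def apply_event(conn, seq, event):
    """Apply one event and advance the checkpoint in the same transaction."""
    with conn:  # commits on success, rolls back if anything raises
        if event["type"] == "ProductAdded":
            conn.execute("INSERT INTO products VALUES (?, ?)", (event["id"], event["title"]))
        elif event["type"] == "ProductTitleRenamed":
            conn.execute("UPDATE products SET title = ? WHERE id = ?", (event["title"], event["id"]))
        conn.execute("UPDATE checkpoint SET last_seq = ? WHERE id = 1", (seq,))


def catch_up(conn, events):
    """Process events sequentially, skipping those at or before the checkpoint."""
    start = last_processed(conn)
    for seq, event in events:
        if seq > start:
            apply_event(conn, seq, event)


events = [
    (1, {"type": "ProductAdded", "id": "p1", "title": "Mug"}),
    (2, {"type": "ProductTitleRenamed", "id": "p1", "title": "Big mug"}),
]
catch_up(conn, events)
catch_up(conn, events)  # e.g. after a restart: already-processed events are skipped
```

Because the `INSERT` for `ProductAdded` is not idempotent, replaying it would fail; the checkpoint is what makes resuming safe here.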

For example, say an order inserted event is generated and, through some workflow/saga, an order updated event takes place as well. What if the order updated event comes first and the order inserted event is not yet processed?
Read models are usually easier to reason about if you think about them polling for ordered sequences of events, rather than reacting to unordered notifications.
A single read model might depend on events from more than one aggregate, so aggregate locking is unlikely to be your most general answer.
That also means, if we are polling, that we need to keep track of the positions of multiple streams of data. In other words, our read model probably includes metadata that tells us which version of each source was used.
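A sketch of that metadata, assuming a hypothetical `ProductListView` fed by a "products" stream and a "vendors" stream, each polled in order: the view remembers the last version applied per stream, so an event that is polled again (for example after a restart) is recognised and skipped.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProductListView:
    """Read model fed by two streams; it records the last version applied per stream."""

    products: Dict[str, dict] = field(default_factory=dict)
    positions: Dict[str, int] = field(default_factory=dict)  # stream -> last applied version

    def handle(self, stream: str, version: int, event: dict) -> bool:
        if version <= self.positions.get(stream, 0):
            return False  # already applied (e.g. re-polled after a restart)
        if event["type"] == "ProductAdded":
            self.products[event["id"]] = {"vendor": event["vendor"]}
        elif event["type"] == "VendorWasBlocked":
            # Affects many read entities at once: drop every product of that vendor.
            self.products = {
                pid: p for pid, p in self.products.items() if p["vendor"] != event["vendor"]
            }
        self.positions[stream] = version
        return True


view = ProductListView()
view.handle("products", 1, {"type": "ProductAdded", "id": "p1", "vendor": "v1"})
view.handle("products", 2, {"type": "ProductAdded", "id": "p2", "vendor": "v2"})
view.handle("vendors", 1, {"type": "VendorWasBlocked", "vendor": "v1"})
replayed = view.handle("products", 2, {"type": "ProductAdded", "id": "p2", "vendor": "v2"})
```

The "version <= last seen" check is only valid because each stream is read in order; with unordered notifications it would silently drop events.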
The locking is likely to depend on the nature of your backing store/cache. But an optimistic approach (read the current representation, compute the new representation, then compare and swap) is, again, usually easy to reason about.
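A sketch of read / compute / compare-and-swap against a hypothetical in-memory `VersionedStore`. The store's internal lock only makes the swap itself atomic; readers take no lock while computing, and a writer that lost the race simply re-reads and tries again.

```python
import threading


class VersionedStore:
    """In-memory store supporting compare-and-swap on a version number."""

    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # makes the swap atomic, nothing more

    def read(self, key):
        return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # someone else wrote since we read
            self._data[key] = (current_version + 1, new_value)
            return True


def update(store, key, compute, max_attempts=5):
    """Read the current representation, compute the new one, compare and swap."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        if store.compare_and_swap(key, version, compute(value)):
            return True
    return False


store = VersionedStore()
update(store, "titles", lambda v: (v or []) + ["Mug"])
stale_version, stale_value = store.read("titles")
update(store, "titles", lambda v: v + ["Cup"])
# A writer still holding the stale version is refused instead of overwriting "Cup".
lost_write_accepted = store.compare_and_swap("titles", stale_version, stale_value + ["Lost"])
```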

DDD - How to modify several AR (from different bounded contexts) throughout single request?

I would like to present a small scenario, still at the paper stage, which seems a bit tedious to accomplish with regard to DDD principles.
Let's say I have an application for hosting account management. Basically, the application is composed of several bounded contexts, such as Web account management, FTP account management, Mail account management..., each of them represented by its own AR (they can live standalone).
Now, let's imagine I want to provide a UI with an HTML form composed of one fieldset per bounded context, for instance to update limits and/or features. How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle? Can I create a kind of "outer" AR, say a ClientHostingProperties AR, which would hold references to the other ARs and update them as part of a single transaction, using its own repository? Or would it be better to create an AR that emits messages that listeners provided by the bounded contexts can react to, in which case I should probably think about ES?
Thanks.

How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle?
You are probably looking for a process manager.
Basic sketch: persisting the details from the submitted form is a transaction unto itself (you are offered an opportunity to accrue business value; step 1 is to capture that opportunity).
That gives you a way to keep track of whether or not this task is "done": you compare the changes in the task to the state of the system, and fire off commands (to run in isolated transactions) to make changes.
Processes, in my mind, end up looking a lot like state machines. These commands are done, these commands are not done, these commands have failed: now what? Eventually the process reaches a state where there are no additional changes to be made, and this instance of the process is "done".
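A sketch of such a process as a small state machine, assuming a hypothetical `HostingChangeProcess` created when the form is saved: it tracks, per bounded context, whether the command is pending, done or failed. Retrying failed commands is just one possible answer to "now what?".

```python
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


class HostingChangeProcess:
    """Hypothetical process manager for one submitted hosting form.

    Saving the form (and this object) is the first transaction; every
    bounded context then gets its own command in its own transaction.
    """

    def __init__(self, changes: Dict[str, dict]) -> None:
        self.changes = dict(changes)  # context name -> requested settings
        self.status = {context: Status.PENDING for context in changes}

    def commands_to_send(self) -> List[Tuple[str, dict]]:
        return [(c, self.changes[c]) for c, s in self.status.items() if s is Status.PENDING]

    def on_succeeded(self, context: str) -> None:
        self.status[context] = Status.DONE

    def on_failed(self, context: str) -> None:
        self.status[context] = Status.FAILED

    def retry_failed(self) -> None:
        # One possible policy: send the failed commands again.
        for context, s in self.status.items():
            if s is Status.FAILED:
                self.status[context] = Status.PENDING

    @property
    def is_done(self) -> bool:
        return all(s is Status.DONE for s in self.status.values())


process = HostingChangeProcess(
    {"web": {"quota_gb": 20}, "ftp": {"accounts": 3}, "mail": {"mailboxes": 10}}
)
first_batch = [context for context, _ in process.commands_to_send()]
process.on_succeeded("web")
process.on_failed("ftp")
process.on_succeeded("mail")
process.retry_failed()
to_resend = [context for context, _ in process.commands_to_send()]
process.on_succeeded("ftp")
```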

Short answer: You don't.
An aggregate is a transactional boundary, which means that if you update multiple aggregates in one "action", you have to use multiple transactions. The reason an aggregate corresponds to one transaction is that this allows you to guarantee consistency.
This means that you have two options:
You can make your aggregate larger. Then you can actually guarantee consistency, but your ability to handle concurrent requests gets worse. So this is usually what you want to avoid.
You can live with the fact that there are two transactions, which means you are eventually consistent. If so, you usually use something such as a process manager or a flow to handle updating multiple aggregates. In its simplest form, a flow is nothing but a simple "if this event happens, run that command" rule. In its more complex form, it has its own state.
Hope this helps 😊

How to avoid concurrency on aggregates status using Rebus in a server cluster

I have a web service that uses Rebus as a Service Bus.
Rebus is configured as explained in this post.
The web service is load balanced across a two-server cluster.
These services are for a production environment and each production machine sends commands to save the produced quantities and/or to update its state.
In the BL I've modelled an Aggregate Root for each machine, and it executes the commands emitted by the real machine. To preserve the correct status, the Aggregate needs to receive the commands in the same sequence as they were emitted, and, since there is no concurrency for that machine, that is the same order in which they are saved on the bus.
E.g.: machine XX sends the command 'add new piece done' and then the command 'Set stop for maintenance'. Executing these commands in sequence, you should end up with Aggregate XX in state 'Stop', but with multiple servers/worker roles both commands could be executed at the same time on the same version of the Aggregate. This means that, depending on which one saves the aggregate first, I can end up with Aggregate XX in state 'Stop' or 'Producing pieces'... which is not the same thing.
I've introduced a Service Bus to add scale-out as the number of machines grows, and resilience (if a server fails, I only get a slowdown in processing commands).
Currently I'm using the name of the aggregate like a "topic" or "destinationAddress" with the IAdvancedApi, so the name of the aggregate is saved as the recipient in the transport. Then I've created a custom Transport class that:
1. does not remove messages in progress but sets them to the state InProgress;
2. when retrieving messages, selects only those whose recipient has no message InProgress.
I'm wondering: is this the best way to guarantee that the bus executes the commands for an aggregate in the same sequence as they arrived?

The solution would be to have some kind of locking of your aggregate root, which needs to happen at the data store level.
E.g. by using optimistic locking (probably implemented with some kind of revision number or something like that), you would be sure that you would never accidentally overwrite another node's edits.
This would allow for your aggregate to either
a) accept the changes in either order (which is generally preferable – makes your system more tolerant), or
b) reject an invalid change
If the aggregate rejects the change, this could be implemented by throwing an exception. And then, in the Rebus handler that catches this exception, you can e.g. await bus.Defer(TimeSpan.FromSeconds(5), theMessage) which will cause it to be delivered again in five seconds.
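A language-neutral sketch of that answer in Python (the question uses Rebus/.NET). `MachineRepository` is a hypothetical in-memory store with a revision number per aggregate, and `defer` is a hypothetical callback standing in for `bus.Defer`. A save with a stale revision raises, an invalid transition raises, and in both cases the handler asks for the message to be redelivered later.

```python
from typing import Callable, Dict, Tuple


class ConcurrencyError(Exception):
    """Another node saved the aggregate since we loaded it."""


class InvalidTransition(Exception):
    """The aggregate rejects the change in its current state."""


# Hypothetical machine states and the transitions the aggregate accepts.
ALLOWED = {("Idle", "Producing"), ("Producing", "Stop"), ("Stop", "Producing")}


class MachineRepository:
    """In-memory stand-in for a store with a revision number per aggregate."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[int, str]] = {}

    def load(self, machine_id: str) -> Tuple[int, str]:
        return self._rows.get(machine_id, (0, "Idle"))

    def save(self, machine_id: str, expected_revision: int, state: str) -> None:
        revision, _ = self.load(machine_id)
        if revision != expected_revision:
            raise ConcurrencyError(f"{machine_id}: expected r{expected_revision}, found r{revision}")
        self._rows[machine_id] = (revision + 1, state)


def handle(repo: MachineRepository, message: dict, defer: Callable[[float, dict], None]) -> None:
    """Apply the message, or ask for redelivery if it conflicts or is rejected."""
    revision, state = repo.load(message["machine"])
    try:
        if (state, message["to"]) not in ALLOWED:
            raise InvalidTransition(f"{state} -> {message['to']}")
        repo.save(message["machine"], revision, message["to"])
    except (ConcurrencyError, InvalidTransition):
        # Plays the role of Rebus's bus.Defer(TimeSpan.FromSeconds(5), message).
        defer(5.0, message)


repo = MachineRepository()
deferred = []
for msg in [{"machine": "XX", "to": "Stop"}, {"machine": "XX", "to": "Producing"}]:
    handle(repo, msg, lambda delay, m: deferred.append(m))
# "Stop" arrived first; Idle -> Stop is invalid, so it was deferred.
for msg in deferred:
    handle(repo, msg, lambda delay, m: None)  # redelivery after the delay
```

Two nodes that load the same revision can both compute a change, but only the first save wins; the second one gets a `ConcurrencyError` instead of silently overwriting.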

You should never rely on message order in a service bus / queuing / messaging environment.
When you do find yourself in this position you may need to re-think your design. Firstly, a service bus is most certainly not an event store and attempting to use it like one is going to lead to pain and suffering :) --- not that you are attempting this but I thought I'd throw it in there.
As for your design, in order to manage this kind of state you may want to look at a process manager. If you are not generating those commands then even this will not help.
However, given your scenario it seems as though the calls are sequential but perhaps it is just your example. In any event, as mookid8000 said, you either want to:
discard invalid changes (with the appropriate feedback),
allow any order of messages as long as they are valid,
ignore out-of-sequence messages till later.
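A sketch of the third option, assuming the sender stamps each message with a per-aggregate sequence number (the question does not say the machines do this; it is an assumption). A hypothetical `Resequencer` parks messages that arrive early, applies them once the gap is filled, and discards duplicates. In a cluster its state would have to live in shared storage, not in process memory.

```python
from typing import Callable, Dict


class Resequencer:
    """Applies messages for one aggregate in sequence-number order."""

    def __init__(self, apply: Callable[[str], None]) -> None:
        self.apply = apply
        self.next_seq = 1
        self.parked: Dict[int, str] = {}  # messages that arrived too early

    def receive(self, seq: int, message: str) -> None:
        if seq < self.next_seq or seq in self.parked:
            return  # duplicate or already applied: discard
        self.parked[seq] = message
        # Apply every message whose predecessors have all been applied.
        while self.next_seq in self.parked:
            self.apply(self.parked.pop(self.next_seq))
            self.next_seq += 1


applied = []
resequencer = Resequencer(applied.append)
resequencer.receive(2, "Set stop for maintenance")  # early: parked
resequencer.receive(1, "add new piece done")        # fills the gap
resequencer.receive(1, "add new piece done")        # redelivered duplicate
```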
Hope that helps...

"exactly the same sequence as they were saved on the bus"
Just... why?
Would you rely on your HTTP server logs to know which command actually reached an aggregate first? No, because that is totally unreliable, just like it is with at-least-once delivery guarantees, and it's also irrelevant.
It is your event store and/or normal persistence state that should be the source of truth when it comes to knowing the sequence of events. The order of commands shouldn't really matter.
Assuming optimistic concurrency, if the aggregate is not allowed to transition from A to C, then it should guard this invariant, and when a TransitionToStateC command hits it in state A, it will simply be rejected.
If, on the other hand, the A->C->B transitions are valid and that is the order received by your aggregate, well, that is what happened from the domain perspective. It really shouldn't matter which command was published first on the bus, just like it doesn't matter which user executed the command first from the UI.
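A sketch of an aggregate guarding its own transitions, assuming a hypothetical `StateMachineAggregate` configured with a table of valid transitions; the two tables reproduce the two cases above.

```python
from typing import Dict, Set


class TransitionRejected(Exception):
    """Raised when a command would break the aggregate's transition rules."""


class StateMachineAggregate:
    """Hypothetical aggregate that guards which state transitions are valid."""

    def __init__(self, transitions: Dict[str, Set[str]], initial: str) -> None:
        self.transitions = transitions
        self.state = initial
        self.history = [initial]

    def transition_to(self, target: str) -> None:
        if target not in self.transitions.get(self.state, set()):
            raise TransitionRejected(f"{self.state} -> {target} is not allowed")
        self.state = target
        self.history.append(target)


# Case 1: A -> C is not allowed, so TransitionToStateC is rejected in state A.
strict = StateMachineAggregate({"A": {"B"}, "B": {"C"}, "C": {"B"}}, "A")
try:
    strict.transition_to("C")
except TransitionRejected as exc:
    print(exc)  # A -> C is not allowed

# Case 2: A -> C -> B is valid, so the order that reached the aggregate is what happened.
relaxed = StateMachineAggregate({"A": {"B", "C"}, "B": {"C"}, "C": {"B"}}, "A")
relaxed.transition_to("C")
relaxed.transition_to("B")
```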
"In my scenario the calls for a specific aggregate are absolutely sequential and I must guarantee that they are executed in the same order"
Why are you executing them asynchronously and potentially concurrently by publishing on a bus then? What you are basically saying is that calls are sequential and cannot be processed concurrently. That means everything should be synchronous because there is no potential benefit from parallelism.
Why:
executeAsync(command1)
executeAsync(command2)
executeAsync(command3)
When you want:
execute(command1)
execute(command2)
execute(command3)
You should have a single command message and the handler of this message executes multiple commands against the aggregate. Then again, in this case I'd just create a single operation on the aggregate that performs all the transitions.
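A sketch of that suggestion, assuming a hypothetical `FinishPieceAndStop` command and `Machine` aggregate modelled on the 'add new piece done' / 'Set stop for maintenance' example from the question: one message, one aggregate operation, one transaction, so there is no ordering between two messages left to worry about.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FinishPieceAndStop:
    """Hypothetical single command replacing two separate messages."""

    machine_id: str


class Machine:
    def __init__(self, machine_id: str) -> None:
        self.machine_id = machine_id
        self.state = "Producing"
        self.pieces = 0
        self.events = []

    def finish_piece_and_stop(self) -> None:
        # Validate first, then apply both transitions together, so the
        # aggregate is never left between "piece done" and "stopped".
        if self.state != "Producing":
            raise ValueError(f"{self.machine_id} is not producing")
        self.pieces += 1
        self.state = "Stop"
        self.events += ["PieceDone", "StoppedForMaintenance"]


def handle(command: FinishPieceAndStop, machines: Dict[str, Machine]) -> None:
    machines[command.machine_id].finish_piece_and_stop()


machines = {"XX": Machine("XX")}
handle(FinishPieceAndStop("XX"), machines)
```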

Why limit commands and events to one aggregate? CQRS + ES + DDD

Please explain why modifying many aggregates at the same time is a bad idea when doing CQRS, ES and DDD. Are there any situations where it could still be OK?
Take for example a command such as PurgeAllCompletedTodos. I want this command to lead to one event that updates the state of each completed Todo aggregate by setting IsActive to false.
Why is this not good?
One reason I could think of:
When updating the domain state, it's probably good to limit the transaction to a well-defined part of the entire state, so that only this part needs to be write-locked during the update. Doing so would allow many writes on different aggregates in parallel, which could boost performance in some extremely heavy scenarios.

The answer to the question lies in the meaning of "aggregate".
First of all, I would say that you are not modifying 'n' aggregates, you are modifying 'n' entities.
An aggregate can contain more than one entity and it is essentially a transactional concept: the aggregate (pattern) is used when you need to modify the state of more than one entity in your application transactionally (all are modified or none).
Now, why would you modify more than one aggregate with one command?
If you feel this need, before doing anything else check your aggregate boundaries to see whether you can change them to remove the need for 1 command -> 'n' aggregates.
An aggregate can contain a lot of entities of the same type, so for your command PurgeAllCompletedTodos, you could also think about expanding the transaction boundary from a single Todo to a UserTodosAggregate that contains all of the user's todos, and let it handle all the commands for the todos of a single user.
In this way you can modify all the todos of a user in a single transaction.
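A sketch of that boundary, assuming a hypothetical `UserTodos` aggregate: all of one user's todos sit inside it, so purging the completed ones is a single command that produces a single event in a single transaction, as the question wanted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Todo:
    title: str
    completed: bool = False
    is_active: bool = True


@dataclass
class UserTodos:
    """Hypothetical aggregate: all todos of one user form one transactional boundary."""

    user_id: str
    todos: Dict[str, Todo] = field(default_factory=dict)
    events: List[Tuple] = field(default_factory=list)

    def add(self, todo_id: str, title: str) -> None:
        self.todos[todo_id] = Todo(title)
        self.events.append(("TodoAdded", todo_id))

    def complete(self, todo_id: str) -> None:
        self.todos[todo_id].completed = True
        self.events.append(("TodoCompleted", todo_id))

    def purge_completed(self) -> List[str]:
        """One command, one event, one transaction for all of this user's todos."""
        purged = [tid for tid, t in self.todos.items() if t.completed and t.is_active]
        if purged:
            for tid in purged:
                self.todos[tid].is_active = False
            self.events.append(("CompletedTodosPurged", self.user_id, tuple(purged)))
        return purged


user = UserTodos("u1")
for tid, title in [("t1", "Buy milk"), ("t2", "Write report"), ("t3", "Call Bob")]:
    user.add(tid, title)
user.complete("t1")
user.complete("t3")
purged = user.purge_completed()
```

The cost is that two concurrent commands for the same user now contend for the same aggregate, which is usually acceptable at per-user scale.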
If this still doesn't solve your problem because, let's say, you need to purge the completed todos of every user in the application, you would still need to send a command to 'n' aggregates and the aggregate boundary doesn't help; so we can think of having an AllApplicationTodosAggregate that handles the command.
This probably isn't the best solution because, as you said, that command would block ALL the todos of the application, but always check whether it can be a good trade-off (this blocking aspect is explained very well in both the Blue Book and the Red Book of DDD).
What if I need to modify some entities and can't have them in a single aggregate?
With that said, a command that modifies more than one aggregate is bad because of transactions. What if you modify 3 aggregates, the first succeeds, and then the server shuts down?
In this case, what you have is a lot of single modifications that need to be managed to prevent inconsistency in the system.
It can be done using a process manager, whose responsibilities are to modify all the aggregates by sending them the right commands and to manage failures if they happen.
Each aggregate still receives its own command, but the process manager is in charge of sending them in whatever way it chooses (one at a time, all in parallel, 5 at a time, whatever you want).
So you can have a strategy to manage failures between two transactions and make decisions like: "if something fails, roll back all the modifications done until now" (sending a rollback command to each aggregate), "if an operation fails, repeat it 3 times every 30 minutes, and if it still doesn't work, roll back", or "if something fails, create a notification for the system admin".
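A sketch of the second strategy, assuming a hypothetical `ProcessManager` and a `send(aggregate_id, command)` callback that raises when an aggregate rejects or fails a command: each command is retried up to `max_attempts` times, and if it still fails, a compensating command is sent to every aggregate already changed, in reverse order. The waiting between retries (30 minutes in the example above) is left out so the sketch stays synchronous.

```python
from typing import Callable, List, Tuple

# (aggregate id, command, compensating command)
Command = Tuple[str, str, str]


class ProcessManager:
    """Hypothetical process manager: one command per aggregate, retry then roll back."""

    def __init__(self, send: Callable[[str, str], None], max_attempts: int = 3) -> None:
        self.send = send
        self.max_attempts = max_attempts
        self.completed: List[Tuple[str, str]] = []  # (aggregate id, undo command)

    def run(self, commands: List[Command]) -> bool:
        for aggregate_id, command, undo in commands:
            if not self._send_with_retry(aggregate_id, command):
                # Roll back everything done so far, most recent first.
                for done_id, done_undo in reversed(self.completed):
                    self.send(done_id, done_undo)
                return False
            self.completed.append((aggregate_id, undo))
        return True

    def _send_with_retry(self, aggregate_id: str, command: str) -> bool:
        for _ in range(self.max_attempts):
            try:
                self.send(aggregate_id, command)
                return True
            except Exception:
                continue  # a real system would wait before retrying
        return False


todos = {"t1": "completed", "t2": "completed", "t3": "completed"}
attempts = {"t2": 0}
sent = []


def send(aggregate_id: str, command: str) -> None:
    sent.append((aggregate_id, command))
    if aggregate_id == "t2" and command == "Purge":
        attempts["t2"] += 1
        if attempts["t2"] < 2:
            raise RuntimeError("transient failure")  # succeeds on the second try
    if aggregate_id == "t3" and command == "Purge":
        raise RuntimeError("permanently rejected")
    todos[aggregate_id] = "purged" if command == "Purge" else "completed"


manager = ProcessManager(send, max_attempts=3)
ok = manager.run([(tid, "Purge", "Restore") for tid in ["t1", "t2", "t3"]])
```

A real implementation also has to decide what happens when a compensating command itself fails, which is where the "notify the system admin" option comes in.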
(sorry for the long post, at least hope it helps)
