Why limit commands and events to one aggregate? CQRS + ES + DDD - domain-driven-design

Please explain why modifying many aggregates at the same time is a bad idea when doing CQRS, ES and DDD. Are there any situations where it could still be OK?
Take for example a command such as PurgeAllCompletedTodos. I want this command to lead to one event that updates the state of each completed Todo aggregate by setting IsActive to false.
Why is this not good?
One reason I could think of:
When updating the domain state it's probably good to limit the transaction to a well-defined part of the entire state, so that only this part needs to be write-locked during the update. Doing so would allow many writes on different aggregates in parallel, which could boost performance in some extremely heavy scenarios.

The answer to the question lies in the meaning of "aggregate".
First of all, I would say that you are not modifying 'n' aggregates; you are modifying 'n' entities.
An aggregate can contain more than one entity and is essentially a transaction concept: the aggregate pattern is used when you need to modify the state of more than one entity in your application transactionally (all are modified or none).
Now, why would you modify more than one aggregate with one command?
If you feel this need, first check your aggregate boundaries to see whether you can change them to remove the need for 1 command -> 'n' aggregates.
An aggregate can contain many entities of the same type, so for your command PurgeAllCompletedTodos you could consider expanding the transaction boundary from a single Todo to a UserTodosAggregate that contains all of a user's todos and handles all the commands for that user's todos.
In this way you can modify all the todos of a user in a single transaction.
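To make the idea concrete, here is a minimal sketch of such a user-level aggregate. Everything except the names PurgeAllCompletedTodos, UserTodosAggregate and IsActive is an assumption made for illustration (the event name CompletedTodosPurged, the field names, the in-memory dict); a real implementation would also persist the event.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CompletedTodosPurged:
    """Hypothetical single event listing every todo that was deactivated."""
    user_id: str
    todo_ids: tuple


@dataclass
class Todo:
    todo_id: str
    completed: bool = False
    is_active: bool = True


@dataclass
class UserTodosAggregate:
    """One aggregate (one transaction boundary) for all todos of a user."""
    user_id: str
    todos: dict = field(default_factory=dict)  # todo_id -> Todo

    def purge_all_completed(self):
        """Handle PurgeAllCompletedTodos: decide, then apply a single event."""
        ids = tuple(t.todo_id for t in self.todos.values()
                    if t.completed and t.is_active)
        if not ids:
            return []  # nothing to purge, so no event is produced
        event = CompletedTodosPurged(self.user_id, ids)
        self.apply(event)
        return [event]

    def apply(self, event):
        # State change driven only by the event, as in event sourcing.
        for todo_id in event.todo_ids:
            self.todos[todo_id].is_active = False
```

The whole purge is one decision on one aggregate, so it either produces the event or it doesn't; the price is that every command for that user's todos now contends on the same aggregate.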
If this still doesn't solve your problem because, say, you need to purge the completed todos of every user in the application, you would still need to send a command to 'n' aggregates. The aggregate boundary doesn't help here, so we could think of an AllApplicationTodosAggregate that handles the command.
This probably isn't the best solution because, as you said, that command would lock ALL the todos of the application, but always check whether it is an acceptable trade-off (this locking concern is explained very well in both the Blue Book and the Red Book of DDD).
What if I need to modify some entities and can't have them in a single aggregate?
As said above, a command that modifies more than one aggregate is bad because of transactions. What if you modify 3 aggregates, the first succeeds, and then the server shuts down?
In this case you have many individual modifications that need to be managed to keep the system from becoming inconsistent.
This can be done using a process manager, whose responsibilities are to modify all the aggregates by sending each one the right command, and to manage failures if they happen.
Each aggregate still receives its own command, but the process manager is in charge of sending them however it sees fit (one at a time, all in parallel, 5 at a time, whatever you want).
So you can have a strategy for handling a failure between two transactions, and make decisions like: "if something fails, roll back all the modifications made so far" (sending a rollback command to each aggregate), or "if an operation fails, retry it 3 times, 30 minutes apart, and if it still doesn't work, roll back", or "if something fails, create a notification for the system admin".
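A rough sketch of one such strategy (retry a few times, then roll back what was already done) might look like the following. The class name, the two callables and the use of RuntimeError as "the command failed" are all assumptions for illustration; a real process manager would also persist its own state so it can resume after a crash, and would wait between retries.

```python
class PurgeProcessManager:
    """Sends one command per aggregate and compensates on failure."""

    def __init__(self, send_purge, send_rollback, max_attempts=3):
        self.send_purge = send_purge        # callable(aggregate_id); may raise
        self.send_rollback = send_rollback  # callable(aggregate_id)
        self.max_attempts = max_attempts
        self.done = []                      # aggregates already modified

    def run(self, aggregate_ids):
        for agg_id in aggregate_ids:
            if not self._try(agg_id):
                # Strategy: undo everything done so far, newest first.
                for done_id in reversed(self.done):
                    self.send_rollback(done_id)
                return "rolled_back"
            self.done.append(agg_id)
        return "completed"

    def _try(self, agg_id):
        # Each attempt is its own transaction on a single aggregate.
        for _ in range(self.max_attempts):
            try:
                self.send_purge(agg_id)
                return True
            except RuntimeError:
                continue  # a real system would back off before retrying
        return False
```

Swapping the rollback loop for "notify the admin" or "park the process and retry later" is just a different branch in `run`; the aggregates themselves don't change.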
(sorry for the long post, at least hope it helps)

Related

How to process Read Model in CQRS

We want to implement CQRS in our new design. We have some doubts about processing in the command handler and the read model. We understand that while processing commands we should take an optimistic lock on aggregateId. But what approach should be considered while processing read models? Should we take a lock on the entire read model, on aggregateId, or never take a lock while processing the read model?
Case 1: take a lock on the entire read model. This is the safest, but not good in terms of speed.
Case 2: take a lock on aggregateId. Here issues may arise. If we lock per aggregateId, what happens if the read model server restarts? It does not know where to resume.
Case 3: never take a lock. With this approach, I think data may end up in a corrupted state. For example, say an order inserted event is generated and, through some workflow/saga, an order updated event takes place as well. What if the order updated event comes first and the order inserted event is not yet processed?
I hope I have explained my issue clearly.
If you do not process events concurrently in the Readmodel then there is no need for a lock. This is the case when you have a single instance of the Readmodel, possibly in a microservice, that polls for events and processes them sequentially.
If you have a synchronous Readmodel (i.e. in the same process as the Writemodel/Aggregate) then most probably you will need locking.
An important thing to keep in mind is that a Readmodel most probably differs from the Writemodel. There could be a lot of Writemodel types whose events are projected into the same Readmodel. For example, in an ecommerce shop you could have a ListOfProducts that projects events from Vendor and from Product Aggregates. This means that when we speak about a Readmodel we cannot simply refer to "the Aggregate", because there is no single Aggregate involved. In the ecommerce case, "the Aggregate" might refer to the Product Aggregate or the Vendor Aggregate.
But what to lock? That depends on the database technology. You should lock the smallest affected read entity or collection that can be locked. In a Readmodel that consists of a list of products (read entities, not aggregates!), when an event affects only one product (e.g. ProductTitleRenamed) you should lock only that product.
If an event affects several products then you should lock the entire collection. For example, VendorWasBlocked affects all the products of that vendor (it should remove all of them).
You need locking for events that have non-idempotent side effects, to cover the case where the Readmodel's updater fails while processing an event and you want to retry/resume from where it left off. If the event has idempotent side effects then it can be retried safely.
In order to know from where to resume in case of a failed Readmodel, you could store inside the Readmodel the sequence of the last processed event. In this case, if the entity update succeeds then the last processed event's sequence is also saved. If it fails then you know that the event was not processed.
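As an illustration of storing the last processed sequence alongside the data, here is a small in-memory sketch. The class and field names are made up for the example, and the dict stands in for whatever database you use; the key assumption is that the entity update and the sequence are saved together (in one transaction, in a real store).

```python
class ProductListReadModel:
    """Read model that remembers how far it has processed the event log."""

    def __init__(self):
        self.titles = {}        # product_id -> title (the read entities)
        self.last_sequence = 0  # checkpoint, saved together with the data

    def handle(self, sequence, event):
        if sequence <= self.last_sequence:
            return False  # already processed: skip it on replay/retry
        if event["type"] == "ProductTitleRenamed":
            self.titles[event["product_id"]] = event["title"]
        # In a real store the update above and this checkpoint would be
        # committed atomically, so a crash leaves both or neither.
        self.last_sequence = sequence
        return True
```

After a restart the updater simply asks for events with a sequence greater than `last_sequence`; anything redelivered below that is ignored.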
For example, say an order inserted event is generated and, through some workflow/saga, an order updated event takes place as well. What if the order updated event comes first and the order inserted event is not yet processed?
Read models are usually easier to reason about if you think about them polling for ordered sequences of events, rather than reacting to unordered notifications.
A single read model might depend on events from more than one aggregate, so aggregate locking is unlikely to be your most general answer.
That also means, if we are polling, that we need to keep track of the position of multiple streams of data. In other words, our read model probably includes meta data that tells us what version of each source was used.
The locking is likely to depend on the nature of your backing store / cache. But an optimistic approach (read the current representation, compute the new representation, compare and swap) is, again, usually easy to reason about.
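The read / compute / compare-and-swap loop can be sketched like this. VersionedStore is a hypothetical in-memory stand-in for the backing store (most real stores expose something similar through a version column or a conditional write), and the event shape is invented for the example.

```python
import threading


class VersionedStore:
    """Hypothetical store holding one representation plus its version."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value, self._version = {}, 0

    def read(self):
        with self._lock:
            return dict(self._value), self._version

    def compare_and_swap(self, expected_version, new_value):
        # The write succeeds only if nobody else wrote since we read.
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = dict(new_value)
            self._version += 1
            return True


def project(store, event, max_retries=10):
    for _ in range(max_retries):
        current, version = store.read()                      # 1. read
        updated = {**current, event["id"]: event["status"]}  # 2. compute
        if store.compare_and_swap(version, updated):         # 3. swap
            return True
    return False  # gave up; the caller decides what to do next
```

No lock is held while computing; a conflicting writer just makes the swap fail, and the loop starts again from a fresh read.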

DDD - How to modify several AR (from different bounded contexts) throughout single request?

I would like to describe a small scenario which is still on paper and which, with regard to DDD principles, seems a bit tedious to accomplish.
Let's say I have an application for managing hosting accounts. Basically, the application comprises several bounded contexts such as Web account management, FTP account management, Mail account management... each of them represented by its own AR (they can live standalone).
Now, let's imagine I want to provide a UI with an HTML form composed of one fieldset per bounded context, for instance to update limits and/or features. How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle? Can I create a kind of "outer" AR, say a ClientHostingProperties AR, which would hold references to the other ARs and update them as part of a single transaction, using its own repository? Or would it be better to create an AR that emits messages for listeners provided by the bounded contexts to react to, in which case I should probably think about ES?
Thanks.
How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle?
You are probably looking for a process manager.
Basic sketch: persisting the details from the submitted form is a transaction unto itself (you are offered an opportunity to accrue business value; step 1 is to capture that opportunity).
That gives you a way to keep track of whether or not this task is "done": you compare the changes in the task to the state of the system, and fire off commands (to run in isolated transactions) to make changes.
Processes, in my mind, end up looking a lot like state machines. These commands are done, these commands are not done, these commands have failed: now what? Eventually the process reaches a state where there are no additional changes to be made, and this instance of the process is "done".
Short answer: You don't.
An aggregate is a transactional boundary, which means that if you would update multiple aggregates in one "action", you'd have to use multiple transactions. The reason for an aggregate to be equivalent to one transaction is that this allows you to guarantee consistency.
This means that you have two options:
You can make your aggregate larger. Then you can actually guarantee consistency, but your ability to handle concurrent requests gets worse. So this is usually what you want to avoid.
You can live with the fact that it's two transactions, which means you are eventually consistent. If so, you usually use something such as a process manager or a flow to handle updating multiple aggregates. In its simplest form, a flow is nothing but a simple "if this event happens, run that command" rule. In its more complex form, it has its own state.
Hope this helps 😊

DDD handling Aggregate updates over time

Using Event Sourcing, I have a domain in which aggregates should be updated from time to time. When I create an aggregate, I have an expiry time (this can be arbitrary) on it, and after that time I have to update some properties of the entity. (This can be forced using an UpdateCommand too.) I have a few processes in mind:
After the aggregate creation, I store the aggregate ID and the expiry time in an RDBMS.
In a cron job I query the database for expired aggregates, and submit an UpdateCommand
Others include emitting UpdateCommands (or events?) from the read side.
Using a saga to coordinate updates, this is similar to the first. But either way, I have to store the expiry times.
So, I have to store the events and write into a database on the write side transactionally. However, I am not sure if creating a read-side for the write-side (?) is the correct solution in the DDD world, or is it applicable? What are the recommended solutions?
I also need to run some commands after some time expires.
For example, I need to emit a ContractExpiredEvent after 1 year (the ContractAggregate decides when, but usually it is 1 year). The problem is that the Aggregate must be the one that decides when and what command to execute, so this is a Domain concern more than an Infrastructure one.
How did I do that? I was inspired by Udi Dahan's video in which he introduces the term Timeout. Long story short, the Aggregate requests that a command be sent to itself after a period of time has passed. It does that by yielding it from a command handler. The underlying CQRS framework takes that scheduled command and persists it in a special repository. Then a cron job processes all scheduled commands when their time comes.
ES and DDD are quite compatible.
However, I am not sure if creating a read-side for the write-side (?) is the correct solution in the DDD world, or is it applicable?
Yes, it's a part of domain aggregate in your case (if you talk about storing expiry times on write-side).
So, I have to store the events and write into a database on the write side transactionally.
I suggest you use a saga for writing into the DB.
John Carmack, 1998:
If you don't consider time an input value, think about it until you do -- it is an important concept
The pattern you should be looking for is that the real world (where time is) tells the aggregate the current time, and the aggregate decides whether or not to expire itself.
With that pattern in place, you can use any strategy you like for scheduling when the real world tells the aggregate what time it is.
You don't need immediately consistent scheduling in the aggregate, you just need some idempotent message handling and an "at least once" delivery process.
the aggregate has a method which can cause an update if it is necessary based on the current time, not blindly. At some time I have to fetch the right aggregate from the store, call that method and store the changes back (if any), or retry later, right?
Yes, that's the right idea.
Notice that if you call that method twice after the expiration time, the first call will load the history, append the expiration events, and store the updated history. The second call loads the history, can see that the aggregate is already expired, and retires without making any change to the history.
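Here is a small sketch of that behaviour, with time passed in as an input. The event shapes, the Contract class and the helper check_expiry are assumptions for illustration, and the list stands in for the stored history.

```python
from datetime import datetime, timedelta


class Contract:
    """Aggregate rebuilt from its history; time is an input, not a lookup."""

    def __init__(self, history):
        self.expires_at = None
        self.expired = False
        for event in history:
            self._apply(event)

    def _apply(self, event):
        if event["type"] == "ContractCreated":
            self.expires_at = event["expires_at"]
        elif event["type"] == "ContractExpired":
            self.expired = True

    def tell_time(self, now):
        """Return new events; an empty list means nothing has to change."""
        if self.expired or self.expires_at is None or now < self.expires_at:
            return []
        return [{"type": "ContractExpired", "at": now}]


def check_expiry(history, now):
    """Load, decide, append. Safe to call repeatedly (at-least-once)."""
    new_events = Contract(history).tell_time(now)
    history.extend(new_events)
    return new_events
```

Because the second call sees the ContractExpired event in the history, whatever scheduler delivers the "current time" message can be as sloppy as a cron job that sometimes fires twice.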
You can also use bi-temporal event sourcing. When events are stored, there are two dates:
the date when the event is added to the database (createdAt)
the date when the event has to be applied (validFrom)
The events are then applied in the order defined by validFrom property.
Using this, you can:
"fix the past" by adding a new event (createdAt = now and validFrom = now - x)
schedule events in the future by adding a new event (createdAt = now and validFrom = now + y)
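A minimal sketch of how the two dates could be used when replaying. The Event class, the function name and the tie-break on createdAt are assumptions for illustration; the only things taken from the description above are the two dates and the ordering by validFrom.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    name: str
    created_at: datetime  # when the event was added to the database
    valid_from: datetime  # when the event has to be applied


def events_in_effect(events, as_of):
    """Events already valid at `as_of`, ordered by validFrom.

    Ties are broken by createdAt (an assumption made for this sketch).
    """
    due = [e for e in events if e.valid_from <= as_of]
    return sorted(due, key=lambda e: (e.valid_from, e.created_at))
```

A correction written today with an earlier validFrom slots into the past, and an event written today with a later validFrom stays invisible until `as_of` reaches it.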
I suggest watching this great talk by Thomas Pierrain at DDD Europe 2018: https://www.youtube.com/watch?v=xzekp1RuZbM

EventSourcing race condition

Here is a nice article which describes what ES is and how to deal with it.
Everything is fine there, but one image is bothering me. Here it is
I understand that in distributed event-based systems we are able to achieve eventual consistency only. Anyway ... How do we ensure that we don't book more seats than available? This is especially a problem if there are many concurrent requests.
It may happen that n aggregates are populated with the same amount of reserved seats, and all of these aggregate instances allow reservations.
I understand that in distributed event-based systems we are able to achieve eventual consistency only. Anyway... how do we avoid booking more seats than we have, especially with many concurrent requests?
All events are private to the command running them until the book of record acknowledges a successful write. So we don't share the events at all, and we don't report back to the caller, without knowing that our version of "what happened next" was accepted by the book of record.
The write of events is analogous to a compare-and-swap of the tail pointer in the aggregate history. If another command has changed the tail pointer while we were running, our swap fails, and we have to mitigate/retry/fail.
In practice, this is usually implemented by having the write command to the book of record include an expected position for the write. (Example: ES-ExpectedVersion in GES).
The book of record is expected to reject the write if the expected position is in the wrong place. Think of the position as a unique key in a table in a RDBMS, and you have the right idea.
This means, effectively, that the writes to the event stream are actually consistent -- the book of record only permits the write if the position you write to is correct, which means that the position hasn't changed since the copy of the history you loaded was written.
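A toy version of the expected-position check, for the seat example. EventStream and ConcurrencyError are invented names and the list is only a stand-in for the book of record; a real store performs the check and the append atomically.

```python
class ConcurrencyError(Exception):
    """Raised when the stream moved on since the history was loaded."""


class EventStream:
    """Hypothetical in-memory book of record for one aggregate."""

    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, expected_version, new_events):
        # Reject the write if someone else appended in the meantime.
        # A real store does this check-and-append as one atomic step.
        if expected_version != self.version:
            raise ConcurrencyError(
                f"expected {expected_version}, stream is at {self.version}")
        self.events.extend(new_events)


def reserve_seat(stream, capacity):
    history = list(stream.events)  # load the history
    loaded_version = len(history)  # remember where the tail was
    reserved = sum(1 for e in history if e == "SeatReserved")
    if reserved >= capacity:
        raise ValueError("sold out")
    stream.append(loaded_version, ["SeatReserved"])
```

Two commands that both loaded the stream at version N can both decide "there is a seat left", but only the first append succeeds; the second gets a ConcurrencyError and has to reload and decide again.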
It's typical for commands to read event streams directly from the book of record, rather than the eventually consistent read models.
It may happen that n AggregateRoots will be populated with the same number of reserved seats, which means having validation in the reserve method won't help. Then n AggregateRoots will emit the event of a successful reservation.
Every bit of state needs to be supervised by a single aggregate root. You can have n different copies of that root running, all competing to write to the same history, but the compare and swap operation will only permit one winner, which ensures that "the" aggregate has a single internally consistent history.
There are going to be a couple of ways to deal with such a scenario.
First off, an event stream would have as its current version the version of the last event added. This means that you would not, or should not, be able to persist the event stream if it is no longer at the version it had when loaded. Since the very first write would cause the version of the event stream to be increased, the second write would not be permitted. Since events are not emitted, per se, but are rather a result of the event sourcing, we would not have the type of race condition in your example.
Well, if your commands are processed behind a queue any failures should be retried. Should it not be possible to process the request you would enter the normal "I'm sorry, Dave. I'm afraid I can't do that" scenario by letting the user know that they should try something else.
Another option is to start the processing by issuing an update against some table row to serialize any calls to the aggregate. Probably not the most elegant but it does cause a system-wide block on the processing.
I guess, to a large extent, one cannot really trust the read store when it comes to transactional processing.
Hope that helps :)

How to deal with Command which is depend on existing records in application using CQRS and Event sourcing

We are using CQRS with EventSourcing.
In our application we can add resources (a business term for a single item) from the UI, and we send a command accordingly to add each resource.
So we have x resources present in the application which were added previously.
Now, we have one special type of resource (I am calling it SpecialResource).
When we add this SpecialResource, it needs to be linked with all existing resources in the application.
Linked means this SpecialResource should have a list of the ids (GUIDs) of the existing resources.
The solution we tried is to get all resource ids in the application before adding the special resource (i.e. before firing the AddSpecialResource command).
We assign this list to SpecialResource, then send the AddSpecialResource command.
But we are not supposed to do so, because as per CQRS a command should not query.
I.e. a command can't depend on a query, as the query can return stale records.
How can we achieve this business scenario without querying existing records in the application?
But we are not supposed to do so, because as per CQRS a command should not query. I.e. a command can't depend on a query, as the query can return stale records.
This isn't quite right.
"Commands" run queries all the time. If you are using event sourcing, in most cases your commands are queries -- "if this command were permitted, what events would be generated?"
The difference between this, and the situation you described, is the aggregate boundary, which in an event sourced domain is a fancy name for the event stream. An aggregate is allowed to run a query against its own event stream (which is to say, its own state) when processing a command. It's the other aggregates (event streams) that are out of bounds.
In practical terms, this means that if SpecialResource really does need to be transactionally consistent with the other resource ids, then all of that data needs to be part of the same aggregate, and therefore part of the same event stream, and everything from that point is pretty straightforward.
So if you have been modeling the resources with separate streams up to this point, and now you need SpecialResource to work as you have described, then you have a fairly significant change to your domain model to do.
The good news: that's probably not your real requirement. Consider what you have described so far - if resourceId:99652 is created one millisecond before SpecialResource, then it should be included in the state of SpecialResource, but if it is created one millisecond after, then it shouldn't. So what's the cost to the business if the resource created one millisecond before the SpecialResource is missed?
Because, a priori, that doesn't sound like something that should be too expensive.
More commonly, the real requirement looks something more like "SpecialResource needs to include all of the resource ids created prior to close of business", but you don't actually need SpecialResource until 5 minutes after close of business. In other words, you've got an SLA here, and you can use that SLA to better inform your command.
How can we achieve this business scenario without querying existing records in application?
Turn it around; run the query, copy the results of the query (the resource ids) into the command that creates SpecialResource, then dispatch the command to be passed to your domain model. The CreateSpecialResource command includes within it the correct list of resource ids, so the aggregate doesn't need to worry about how to discover that information.
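Roughly, in code (the function build_command, the event tuple and the in-memory aggregate are assumptions for illustration; only the CreateSpecialResource name comes from the text above):

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CreateSpecialResource:
    """The command carries the ids, so the aggregate never has to query."""
    special_id: UUID
    linked_resource_ids: tuple


def build_command(query_resource_ids):
    # Application layer: run the (possibly slightly stale) query first...
    ids = tuple(query_resource_ids())
    # ...then copy the result into the command before dispatching it.
    return CreateSpecialResource(uuid4(), ids)


class SpecialResource:
    """Aggregate that only looks at the command and its own stream."""

    def __init__(self):
        self.events = []

    def handle(self, command):
        self.events.append(("SpecialResourceCreated",
                            command.special_id,
                            command.linked_resource_ids))
```

The staleness is still there, but it is now explicit: the command records exactly which ids the caller saw, and a resource that turns up a millisecond later can be linked by a follow-up command if the business cares.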
It is hard to tell what your database is capable of, but the most consistent way of adding a "snapshot" is at the database layer, because there is no other common place in pure CQRS for that. (There are some articles on doing CQRS+ES snapshots, if that is what you actually try to achieve with SpecialResource).
One way may be to materialize list of ids using some kind of stored procedure with the arrival of AddSpecialResource command (at the database).
Another way is to capture "all existing resources (up to the moment)" with some marker (timestamp), never delete old resources, and add "SpecialResource" condition in the queries, which will use the SpecialResource data.
Ok, one more option (depends on your case at hand) is to always have the list of ids handy with the same query, which served the UI. This way the definition of "all resources" changes to "all resources as seen by the user (at some moment)".
I do not think any computer system is ever going to be 100% consistent simply because life does not, and can not, work like this. Apparently we are all also living in the past since it takes time for your brain to process input.
The point is that you do the best you can with the information at hand but ensure that your system is able to smooth out any edges. So if you need to associate one or two resources with your SpecialResource then you should be able to do so.
So even if you could associate your SpecialResource with all existing entries in your data store, what is to say that there isn't another resource, not yet entered into the system, that also needs to be associated?
It all, as usual, will depend on your specific use-case. This is why process managers, along with their state, enable one to massage that state until the process can complete.
I hope I didn't misinterpret your question :)
You can do two things in order to solve that problem:
make a distinction between write and read model. You know what read model is, right? So "write model" of data in contrast is a combination of data structures and behaviors that is just enough to enforce all invariants and generate consistent event(s) as a result of every executed command.
don't take a rule which states "Event Store is a single source of truth" too literally. Consider the following interpretation: ES is a single source of ALL truth for your application, however, for each specific command you can create "write models" which will provide just enough "truth" in order to make this command consistent.
