How to process Read Model in CQRS - domain-driven-design

We want to implement cqrs in our new design. We have some doubts in processing command handler and read model. We got understand that while processing commands we should take optimistic lock on aggregateId. But what approach should be considered while processing readModels. Should we take lock on entire readModel or on aggregateId or never take lock while processing read model.
case 1. when take lock on entire readmodel -> it is safest but is not good in term of speed.
case 2 - take lock on aggregateId. Here two issues may arise. if we take lock aggregateId wise -> then what if read model server restarts. It does not know from where it starts again.
case 3 - Never take lock. in ths approach, I think data may be in corrputed state. For eg say an order inserted event is generated and thorugh some workflow/saga, order updated event took place as well. what if order updated event comes first and order inserted event is not yet processed ?
Hope I am able to address my issue.

If you do not process events concurrently in the Readmodel then there is no need for a lock. This is the case when you have a single instance of the Readmodel, possible in a Microservice, that poll for events and process them sequentially.
If you have a synchronous Readmodel (i.e. in the same process as the Writemodel/Aggregate) then most probably you will need locking.
An important thing to keep in mind is that a Readmodel most probably differs from the Writemodel. There could be a lot of Writemodel types whos events are projected in the same Readmodel. For example, in an ecommerce shop you could have a ListOfProducts that projects event from Vendor and from Product Aggregates. This means that, when we speak about a Readmodel we cannot simply refer to the "Aggregate" because there is not single Aggregate involved. In the case of ecommerce, when we say "the Aggregate" we might refer to the Product Aggregate or Vendor Aggregate.
But what to lock? Here depends on the database technology. You should lock the smallest affected read entity or collection that can be locked. In a Readmodel that consist of a list of products (read entities, not aggregates!), when an event that affects only one product you should lock only that product (i.e. ProductTitleRenamed).
If an event affects more products then you should lock the entire collection. For example, VendorWasBlocked affects all the products (it should remove all the products from that vendor).
You need the locking for the events that have non-idempotent side effects, for the case where the Readmodel's updater fails during the processing of an event, if you want to retry/resume from where it left. If the event has idempotent side effects then it can be retried safely.
In order to know from where to resume in case of a failed Readmodel, you could store inside the Readmodel the sequence of the last processed event. In this case, if the entity update succeeds then the last processed event's sequence is also saved. If it fails then you know that the event was not processed.

For eg say an order inserted event is generated and thorugh some workflow/saga, order updated event took place as well. what if order updated event comes first and order inserted event is not yet processed ?
Read models are usually easier to reason about if you think about them polling for ordered sequences of events, rather than reacting to unordered notifications.
A single read model might depend on events from more than one aggregate, so aggregate locking is unlikely to be your most general answer.
That also means, if we are polling, that we need to keep track of the position of multiple streams of data. In other words, our read model probably includes meta data that tells us what version of each source was used.
The locking is likely to depend on the nature of your backing store / cache. But an optimistic approach
read the current representation
compute the new representation
compare and swap
is, again, usually easy to reason about.

Related

DDD\CQRS\ES Aggregate rehydration

When command emits more than one event, how to ensure correct rehydration. what is correct way mark many events as atomic change
The rehydration is complete, when all available events are applied. Though, when your command emits multiple events, you just have to ensure that those events get persisted together in an atomic operation.
Example:
Current Event Stream:
1. MyEvent1
Then execute a command that emits multiple events:
MyCommand emits ->
MyEvent1
MyEvent2
These Events will get appended to the Event Stream as an atomic operation.
New Event Stream:
1. MyEvent1
2. MyEvent1
3. MyEvent2
Now, when rehydrating the Aggregate, you just read the entire Event Stream til the end, and you're done.
When command emits more than one event, how to ensure correct rehydration. what is correct way mark many events as atomic change
Typically
Your domain logic returns a sequence of events.
You atomically write the entire sequence into the event history
When reconstituting, you use the event history
Note that the second step implies that your durable event storage will support the atomic write of an event sequence.
Note: we don't usually rely on the broadcast mechanism when reconstituting the state of our domain model, but instead use the history.
In the CQRS world, that would include creating the read model by loading the history, and then playing the events in the fixed order in which they were written.
It depends on your underlying storage. A common way to implement atomic writes for persistent storage that does not support transactions is to create a batch of events and write that as a single operation.
When rehydrating the events you can flatten the batches to a flat sequence/stream of events to build your current state. During rehydration you really care about all events, since you will always apply the next command on fully hydrated current state. Thus, there is no point in keeping the batches at that point.
There are a number of solutions out there that support writing/rehydration of state like this if you don't feel like building your own.
Event store (https://eventstore.com)
Axon server (https://axoniq.io)
Serialized (https://serialized.io) for full disclosure - this is our product
Good luck

How are the missing events replayed?

I am trying to learn more about CQRS and Event Sourcing (Event Store).
My understanding is that a message queue/bus is not normally used in this scenario - a message bus can be used to facilitate communication between Microservices, however it is not typically used specifically for CQRS. However, the way I see it at the moment - a message bus would be very useful guaranteeing that the read model is eventually in sync hence eventual consistency e.g. when the server hosting the read model database is brought back online.
I understand that eventual consistency is often acceptable with CQRS. My question is; how does the read side know it is out of sync with the write side? For example, lets say there are 2,000,000 events created in Event Store on a typical day and 1,999,050 are also written to the read store. The remaining 950 events are not written because of a software bug somewhere or because the server hosting the read model is offline for a few secondsetc. How does eventual consistency work here? How does the application know to replay the 950 events that are missing at the end of the day or the x events that were missed because of the downtime ten minutes ago?
I have read questions on here over the last week or so, which talk about messages being replayed from event store e.g. this one: CQRS - Event replay for read side, however none talk about how this is done. Do I need to setup a scheduled task that runs once per day and replays all events that were created since the date the scheduled task last succeeded? Is there a more elegant approach?
I've used two approaches in my projects, depending on the requirements:
Synchronous, in-process Readmodels. After the events are persisted, in the same request lifetime, in the same process, the Readmodels are fed with those events. In case of a Readmodel's failure (bug or catchable error/exception) the error is logged and that Readmodel is just skipped and the next Readmodel is fed with the events and so on. Then follow the Sagas, that may generate commands that generate more events and the cycle is repeated.
I use this approach when the impact of a Readmodel's failure is acceptable by the business, when the readiness of a Readmodel's data is more important than the risk of failure. For example, they wanted the data immediately available in the UI.
The error log should be easily accessible on some admin panel so someone would look at it in case a client reports inconsistency between write/commands and read/query.
This also works if you have your Readmodels coupled to each other, i.e. one Readmodel needs data from another canonical Readmodel. Although this seems bad, it's not, it always depends. There are cases when you trade updater code/logic duplication with resilience.
Asynchronous, in-another-process readmodel updater. This is used when I use total separation of the Readmodel from the other Readmodels, when a Readmodel's failure would not bring the whole read-side down; or when a Readmodel needs another language, different from the monolith. Basically this is a microservice. When something bad happens inside a Readmodel it necessary that some authoritative higher level component is notified, i.e. an Admin is notified by email or SMS or whatever.
The Readmodel should also have a status panel, with all kinds of metrics about the events that it has processed, if there are gaps, if there are errors or warnings; it also should have a command panel where an Admin could rebuild it at any time, preferable without a system downtime.
In any approach, the Readmodels should be easily rebuildable.
How would you choose between a pull approach and a push approach? Would you use a message queue with a push (events)
I prefer the pull based approach because:
it does not use another stateful component like a message queue, another thing that must be managed, that consume resources and that can (so it will) fail
every Readmodel consumes the events at the rate it wants
every Readmodel can easily change at any moment what event types it consumes
every Readmodel can easily at any time be rebuild by requesting all the events from the beginning
there order of events is exactly the same as the source of truth because you pull from the source of truth
There are cases when I would choose a message queue:
you need the events to be available even if the Event store is not
you need competitive/paralel consumers
you don't want to track what messages you consume; as they are consumed they are removed automatically from the queue
This talk from Greg Young may help.
How does the application know to replay the 950 events that are missing at the end of the day or the x events that were missed because of the downtime ten minutes ago?
So there are two different approaches here.
One is perhaps simpler than you expect - each time you need to rebuild a read model, just start from event 0 in the stream.
Yeah, the scale on that will eventually suck, so you won't want that to be your first strategy. But notice that it does work.
For updates with not-so-embarassing scaling properties, the usual idea is that the read model tracks meta data about stream position used to construct the previous model. Thus, the query from the read model becomes "What has happened since event #1,999,050"?
In the case of event store, the call might look something like
EventStore.ReadStreamEventsForwardAsync(stream, 1999050, 100, false)
Application doesn't know it hasn't processed some events due to a bug.
First of all, I don't understand why you assume that the number of events written on the write side must equal number of events processed by read side. Some projections may subscribe to the same event and some events may have no subscriptions on the read side.
In case of a bug in projection / infrastructure that resulted in a certain projection being invalid you might need to rebuild this projection. In most cases this would be a manual intervention that would reset the checkpoint of projection to 0 (begining of time) so the projection will pick up all events from event store from scratch and reprocess all of them again.
The event store should have a global sequence number across all events starting, say, at 1.
Each projection has a position tracking where it is along the sequence number. The projections are like logical queues.
You can clear a projection's data and reset the position back to 0 and it should be rebuilt.
In your case the projection fails for some reason, like the server going offline, at position 1,999,050 but when the server starts up again it will continue from this point.

Should latest event version be queried in event sourcing?

I am developing a simple DDD + Event sourcing based app for educational purposes.
In order to set event version before storing to event store I should query event store but my gut tells that this is wrong because it causes concurrency issues.
Am I missing something?
There are different answers to that, depending on what use case you are considering.
Generally, the event store is a dumb, domain agnostic appliance. It's superficially similar to a List abstraction -- it stores what you put in it, but it doesn't actually do any work to satisfy your domain constraints.
In use cases where your event stream is just a durable record of things that have happened (meaning: your domain model does not get a veto; recording the event doesn't depend on previously recorded events), then append semantics are fine, and depending on the kind of appliance you are using, you may not need to know what position in the stream you are writing to.
For instance: the API for GetEventStore understands ExpectedVersion.ANY to mean append these events to the end of the stream wherever it happens to be.
In cases where you do care about previous events (the domain model is expected to ensure an invariant based on its previous state), then you need to do something to ensure that you are appending the event to the same history that you have checked. The most common implementations of this communicate the expected position of the write cursor in the stream, so that the appliance can reject attempts to write to the wrong place (which protects you from concurrent modification).
This doesn't necessarily mean that you need to be query the event store to get the position. You are allowed to count the number of events in the stream when you load it, and to remember how many more events you've added, and therefore where the stream "should" be if you are still synchronized with it.
What we're doing here is analogous to a compare-and-swap operation: we get a representation of the original state of the stream, create a new representation, and then compare and swap the reference to the original to point instead to our changes
oldState = stream.get()
newState = domainLogic(oldState)
stream.compareAndSwap(oldState, newState)
But because a stream is a persistent data structure with append only semantics, we can use a simplified API that doesn't require duplicating the existing state.
events = stream.get()
changes = domainLogic(events)
stream.appendAt(count(events), changes)
If the API of your appliance doesn't allow you to specify a write position, then yes - there's the danger of a data race when some other writer changes the position of the stream between your query for the position and your attempt to write. Data obtained in a query is always stale; unless you hold a lock you can't be sure that the data hasn't changed at the source while you are reading your local copy.
I guess you shouldn't to think about event version.
If you talk about the place in the event stream, in general, there's no guaranteed way to determine it at the creation moment, only in processing time or in event-storage.
If it is exactly about event version (see http://cqrs.nu/Faq, How do I version/upgrade my events?), you have it hardcoded in your application. So, I mean next use case:
First, you have an app generating some events. Next, you update app and events are changed (you add some fields or change payload structure) but kept logical meaning. So, now you have old events in your ES, and new events, that differ significantly from old. And to distinguish one from another you use event version, eg 0 and 1.

Event sourcing: in-memory read model, does it make sense to apply events during read?

I'm working on an application that uses event sourcing for storing data. For many reasons a persisted read model is not used, therefore we need to implement an in-memory read approach.
The idea is that some data from an aggregate should be read with the current date, but also with past dates. Therefore the approach I'm thinking is to process all the events of that aggregate and Apply the event in the aggregate in order to have the state at the specified date.
But, I'm reading in various articles that the Apply method should only be used when the command is emittend and not during the reading phase.
Does it make sense to use the Apply method during the reading phase? Or maybe the same logic should be replicated in a different method (some sort of processor)?
I'm not posting any code here because what I'm looking for is to understand what the right approach should be and to understand better how to user Event Sourcing.
Reading the internal and private state of an aggregate should be avoided as it break it's encapsulation. Also, an Aggregate stores whatever state it needs in order to check for its invariants and querying that state would force the Aggregate to maintain additional state that it is not needed.
In conclusion, avoid reading the write side.
I don't think that it is necessarily the case that the Apply method may only be called when emitting events. If the purpose of the Apply method is to load the event into the aggregate then it will be invoked when restoring the aggregate to its current state.
It seems as though you need processing for your aggregate that would represent it for some window period (start/end date). It may be more appropriate to create a read model that represents the bits that you are interested in and store them as time-sensitive data and then rather query that.
If the processing is too intensive you may even want to issue a command that requests a report or view of sorts but that would still work through the time-sensitive read model and then produces the required output. Once done the user can be notified that the results are ready for perusal.
As #Constantin Galbenu stated: you should avoid your aggregate's internal state and this ties in with not querying your domain model.

EventSourcing race condition

Here is the nice article which describes what is ES and how to deal with it.
Everything is fine there, but one image is bothering me. Here it is
I understand that in distributed event-based systems we are able to achieve eventual consistency only. Anyway ... How do we ensure that we don't book more seats than available? This is especially a problem if there are many concurrent requests.
It may happen that n aggregates are populated with the same amount of reserved seats, and all of these aggregate instances allow reservations.
I understand that in distributes event-based systems we are able to achieve eventual consistency only, anyway ... How to do not allow to book more seats than we have? Especially in terms of many concurrent requests?
All events are private to the command running them until the book of record acknowledges a successful write. So we don't share the events at all, and we don't report back to the caller, without knowing that our version of "what happened next" was accepted by the book of record.
The write of events is analogous to a compare-and-swap of the tail pointer in the aggregate history. If another command has changed the tail pointer while we were running, our swap fails, and we have to mitigate/retry/fail.
In practice, this is usually implemented by having the write command to the book of record include an expected position for the write. (Example: ES-ExpectedVersion in GES).
The book of record is expected to reject the write if the expected position is in the wrong place. Think of the position as a unique key in a table in a RDBMS, and you have the right idea.
This means, effectively, that the writes to the event stream are actually consistent -- the book of record only permits the write if the position you write to is correct, which means that the position hasn't changed since the copy of the history you loaded was written.
It's typical for commands to read event streams directly from the book of record, rather than the eventually consistent read models.
It may happen that n-AggregateRoots will be populated with the same amount of reserved seats, it means having validation in the reserve method won't help, though. Then n-AggregateRoots will emit the event of successful reservation.
Every bit of state needs to be supervised by a single aggregate root. You can have n different copies of that root running, all competing to write to the same history, but the compare and swap operation will only permit one winner, which ensures that "the" aggregate has a single internally consistent history.
There are going to be a couple of ways to deal with such a scenario.
First off, an event stream would have the current version as the version of the last event added. This means that when you would not, or should not, be able to persist the event stream if the event stream is not at the version when loaded. Since the very first write would cause the version of the event stream to be increased, the second write would not be permitted. Since events are not emitted, per se, but rather a result of the event sourcing we would not have the type of race condition in your example.
Well, if your commands are processed behind a queue any failures should be retried. Should it not be possible to process the request you would enter the normal "I'm sorry, Dave. I'm afraid I can't do that" scenario by letting the user know that they should try something else.
Another option is to start the processing by issuing an update against some table row to serialize any calls to the aggregate. Probably not the most elegant but it does cause a system-wide block on the processing.
I guess, to a large extent, one cannot really trust the read store when it comes to transactional processing.
Hope that helps :)

Resources