Event Sourcing and rebuilding logic - domain-driven-design

I am new to event sourcing and I am a bit confused about rebuilding objects from the event stream.
I believe we need to load and apply all the events that happened, in chronological order, to rebuild the object state. So, for example, if I have an object called Customer:
public class Customer
{
    public void CorrectName(string firstName, string lastName)
    {
        // Record the fact that the name changed as an event.
        CustomerNameChanged(new NameChangedEvent(firstName, lastName));
    }
}
If the customer changed the name twice, we will store the event twice in the event log, and when I rebuild the object from the events I will apply the event twice. Do I need to take only the latest event, or archive the earlier events, so that we don't rerun the earlier event again?

You would re-apply both events to the Customer object. Because you apply them in chronological order, the Customer object will end up in the correct current state. If you are concerned about the number of events being applied that no longer represent the current state, you should look at Snapshots.

When rebuilding an object you process the entire event stream for that object. Performance-wise this is usually not an issue, even for large numbers of events. You can mitigate it by using Rolling Snapshots.
With snapshots you store the state of your object at a particular point of the event stream. Rebuilding is simply loading that snapshot + events that happened after the snapshot was taken.
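As a rough sketch of that rebuild-from-snapshot idea (the snapshotStore/eventStore objects and their methods here are assumptions for illustration, not a particular library's API):

public Customer LoadCustomer(Guid customerId)
{
    var customer = new Customer();

    // Start from the latest snapshot, if one exists.
    var snapshot = snapshotStore.GetLatest(customerId);   // may be null
    long fromVersion = 0;
    if (snapshot != null)
    {
        customer.RestoreFrom(snapshot);        // copy the snapshotted state
        fromVersion = snapshot.Version + 1;    // replay only the newer events
    }

    // Apply the remaining events in chronological order.
    foreach (var domainEvent in eventStore.GetEvents(customerId, fromVersion))
    {
        customer.Apply(domainEvent);
    }

    return customer;
}

Both name-change events would still be applied when no snapshot exists; the snapshot only shortens how far back the replay has to start.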

Related

Event Sourcing - Replaying Events

Imagine an event sourced system where there exists a consuming service that is subscribed to a certain Event A. Once this consumer detects Event A has been emitted in the network, it handles it somehow and dispatches its own Event B.
How would someone replay such a system? Before the replay, both Event A and Event B exist in the event store/database. If we replay Event A and Event B, would this not double count the dispatch of Event B (once deduced from A and once replayed from our event store)? How do you go about replaying events in general when one event may cause a cascading chain of other dispatched events?
It is not really a form of replaying the events in the system so that each event is published again and triggers actions. It is more like rehydrating (reconstituting) aggregates from events which are stored in the event store.
The implementation could for instance involve a specific constructor (or factory method) of an aggregate that takes a list of the stored domain events related to the specific aggregate. The aggregate then simply applies those events to mutate its own state until the current state of the aggregate is reached.
You can take a look at such an implementation in Vaughn Vernon's sample Event Sourcing and CQRS project iddd_collaboration. I directly referenced the implementation of a Forum Aggregate, which is derived from Vaughn Vernon's implementation of an EventSourcedRootEntity.
You can look into the Forum constructor
public Forum(List<DomainEvent> anEventStream, int aStreamVersion) {
    super(anEventStream, aStreamVersion);
}
and the related implementations of the different when() methods and the base class functionalities of EventSourcedRootEntity.
Note: If there is a huge number of events and performance during aggregate rehydration is a concern, looking into the concept of snapshots might also be of interest.
Events "replay" can easily be handled within the aggregate pattern because applying events does not cause new transactions, but rather the state is rehydrated.
It's important to have only event appliers in the aggregate constructor when it's instantiated out of a list of ordered events.
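A rough C# analogue of such a rehydration-only constructor (the event records and When appliers are illustrative, not taken from the sample project):

using System.Collections.Generic;

public record ForumStarted(string Subject);
public record ForumClosed();

public class ForumAggregate
{
    private string subject;
    private bool closed;

    // Rehydration constructor: only event appliers run here -
    // no validation, no side effects, no new events are emitted.
    public ForumAggregate(IEnumerable<object> eventStream)
    {
        foreach (var domainEvent in eventStream)
        {
            When((dynamic)domainEvent);   // dispatch to the matching applier
        }
    }

    private void When(ForumStarted e) => subject = e.Subject;
    private void When(ForumClosed e) => closed = true;
}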
That's pretty much event sourcing. But there are potential problems when expanding this into event driven architecture (EDA) where an entity/aggregate/microservice/module reacts to an event by initiating another transaction.
In your example, an entity A produces an event A. The entity B reacts to event A by sending a new command, or starting a new transaction that ends up producing an event B.
So right now the event store has event A and event B.
How do you ensure that a replay, or a new read of that stream (or of all streams), doesn't cause write amplification? As soon as the event A handler reads the event, it won't know whether this is the first time it has handled it (and it has to initiate the next transaction, command B --> event B) or whether it's a replay and it doesn't have to do anything, because that already happened and there is an event B already in the stream.
I am assuming this is your concern, and it's a big one if the event reaction implies making a payment, for example. We wouldn't want to make a new payment each time the event A is handled.
There are a few options:
Never replay events in systems that react to events by creating new transactions. Only replay events for aggregate instantiation (event sourcing), which uses events just to re-hydrate state, or for projection/read models that are idempotent, or when the projections are being recreated (because the read DB was dropped, for example).
Another option is to react to an event A by appending a command B to a "command stream" (i.e. a queue) and have the command handler receive it asynchronously and create the transaction that produces event B. This way, you can rely on the event store's duplicate check and prevent the append of a command if it already exists. The scenario would be similar to this:
A. Transaction A produces event A which is appended to an event store stream
B. Event Handler A reacts to event A and adds a command B to a command stream
C. Command handler B receives the command B and executes transaction that produces an event B appended to the stream.
Now the first time this would work as expected.
If the projections that use event A and event B to write in DB a read model replay events, all is good. It reads event A and then event B.
If the "reactive" event handlers receive event A again, it attempts to append a command B to the command stream. The event/command store detects that command B is a duplicate (optimistic concurrency control using some versioning) and doesn't add it. Command handler B never gets the old command again.
It's important to notice that the processed commands should result in a checkpoint that is never deleted, so that commands are never ever replayed. That's the key.
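A rough sketch of steps A-C with the duplicate check and the checkpoint (commandStream, checkpointStore and their methods are illustrative names, not a specific event store's API):

// B. The event handler reacts to event A by appending command B.
public void OnEventA(EventA eventA)
{
    // Derive the command id deterministically from the event, so a replayed
    // event A always maps to the same command B.
    var commandB = new CommandB(id: eventA.Id, orderId: eventA.OrderId);

    // The store rejects (or silently ignores) the append if a command with
    // this id already exists in the stream - that is the duplicate check.
    commandStream.AppendIfAbsent(commandB.Id, commandB);
}

// C. The command handler executes the transaction that produces event B.
public void OnCommandB(CommandB commandB)
{
    if (checkpointStore.AlreadyProcessed(commandB.Id))
        return;                                   // never re-execute a processed command

    var eventB = ExecuteTransaction(commandB);
    eventStore.Append(eventB);
    checkpointStore.MarkProcessed(commandB.Id);   // the checkpoint is never deleted
}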
Probably there are also other mechanisms out there.
You're referring to what's called a "Saga Pattern" and in order to resolve it you need to make your commands explicit. This example helps to illustrate the difference between Commands and Events.
Events are the record of what happened. They are immutable, connected with an entity, and describe the original intention of the user.
Commands are a request to do something, which may cause an event to be recorded. They can also cause 'real world' state changes outside of the event-sourced system, but in doing so they should cause an event that records the external change happened.
A few rules will resolve your conundrum:
You cannot record an event without a corresponding command having executed. Every event was caused by a command.
You cannot process commands until the event stream has 'caught up' to the present. Otherwise you are taking action on a partial replay of history.
Back to the Saga Pattern:
In the Saga Pattern, events can lead to more commands. In this way, the system can progress based on a cascade of events and commands and execute a distributed workflow, choreographed by the relations between system state, commands generated, and further events generated.
In your case, as long as you wait for the full event stream to be replayed before issuing the next command, you can then prevent the duplicate cascading event by checking that the action has not already been done.
If event B already exists, there's no need to issue another command to generate event B again.
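A small sketch of that check, assuming (my assumption, not from the question) that each event carries a causation id linking event B back to the event A that triggered it:

public void OnEventA(EventA eventA)
{
    // Only act once the event stream has caught up to the present.
    var history = eventStore.GetEvents(eventA.AggregateId);

    bool alreadyHandled = history.OfType<EventB>()
                                 .Any(e => e.CausationId == eventA.Id);   // needs System.Linq
    if (!alreadyHandled)
    {
        commandBus.Send(new CommandB(eventA.AggregateId, causationId: eventA.Id));
    }
}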

Who and how should handle replaying events?

I am learning about DDD, CQRS and Event Sourcing, and there is something I cannot figure out. Commands trigger changes in the aggregates, and once the change is performed an event is fired. The event is subsequently handled by other parts of the system and preserved in the event store. However, I do not understand how replaying events would recreate the aggregate, if changes are triggered by commands.
Example: If we have a online shop.
AddItemToCartCommand -> Cart Aggregate adds the item to its cart -> ItemAddedToCartEvent -> The event is handled by whoever.
However, if the event is replayed, the aggregate would not add the item to its cart.
To sum up, my question is: how should I recreate aggregates based on the events in the event store? Also, any general advice on how to replay events the right way would be appreciated.
For simplicity, let's assume a stateless process - our service doesn't try to keep copies of things in memory, but instead reloads aggregates as needed.
The service receives AddItemToCartCommand:{cart:123, ...}. We don't have the current state of cart:123 in memory, so we need to create it. We do that by loading the state of cart:123 from our durable store. Because we chose to use event sourced storage, the "state" we read from the durable store is a representation of the history of events previously written by the service.
Event histories have within them all of the information you need to remember, but not necessarily in a convenient "shape" - append only lists are a great data structure for writes, but not necessarily good for reads.
What this often means is that we will "replay" the events to create an in memory object which we can then use to answer questions about the events we will write next.
The same pattern is used when answering simple queries: we load the history of events from the store, transform the event history into a more convenient shape, and then use that shape to compute the answer.
In circumstances where query latency is more important than timeliness, we might design our query handler to read the convenient shapes from a cache rather than trying to compute them fresh every time; a concurrently running background thread would be responsible for waking up periodically to compute new contents for the cache.
Using an async process to pull updates from an event stream is a common pattern; Greg Young discusses some of the advantages of that approach in his Polyglot Data talk.
In an ideal event sourcing scenario, you would not have an already constructed aggregate structure available in your database. You repeatedly arrive at the final data structure by running through all events stored so far.
Let me illustrate with some pseudocode of adding items to cart, and then fetching the cart data.
# Create a new cart
POST /cart/new
# Store a series of events related to the cart (in database as records, similar to array items)
POST /cart/add -> CartService.AddItem(item_data) -> ItemAddedToCart
A series of events would look like:
* ItemAddedToCart
* ItemAddedToCart
* ItemAddedToCart
* ItemRemovedFromCart
* ItemAddedToCart
When it's time to fetch cart data from the DB, you construct a new cart instance (or retrieve a cart instance if persisted) and replay the events on it.
cart = Cart(id=ID1)

# Fetch contents of Cart with id ID1
for each event in ID1 cart's events:
    if event is ItemAddedToCart:
        cart.add_item(event.data)
    else if event is ItemRemovedFromCart:
        cart.remove_item(event.data)

return cart
Occasionally, when there are too many events related to the cart, you may want to generate the aggregate structure at that point and save it in the DB. Next time, you can start from that saved structure (the savepoint) and apply only the newer events. This optimization saves time and improves performance when there are many events to process.
What may help is to not think of the command as changing the state but rather the event as changing the state. In fact, I don't quite see how else one would go about doing so. The command handler in your aggregate would apply the invariants and, if all is OK, would immediately create the event and call some method that applies it ([Apply|On|Do]MyEvent). The fact that an event exists does not necessarily mean other parts of your system will handle it; it is, however, required for event sourcing. Once you have an event you can most certainly pass it on to other parts of your system by, say, publishing it on a service bus.
When you replay your events you are calling the same methods that the commands were calling to actually mutate the state of your aggregate:
public MyEvent MyCommand(string data)
{
    if (string.IsNullOrWhiteSpace(data))
    {
        throw new ArgumentException($"Argument '{nameof(data)}' may not be empty.");
    }

    return On(new MyEvent
    {
        Data = data
    });
}

private MyEvent On(MyEvent myEvent)
{
    // change the relevant state
    someState = myEvent.Data;
    return myEvent;
}
Your event sourcing infrastructure would call On(MyEvent) for MyEvent when replaying. Since you have an event it means that it was a valid state transition and can simply be applied; else something went wrong in your initial command processing and you probably have a bug.
All events in an event store would be in chronological order for an aggregate. In addition to this the events should have a global sequence number to facilitate projection processing.
You could have a generic projection that accepts any/all events and then publishes the event on a service bus for system integration. You could also place that burden on a client of the event store to have it keep track of the position itself and then read events off the store itself. You could combine these and have the client subscribe to service bus events but ensure that it executes them in the same order by keeping track of the position (global sequence number) itself and update it as the events are processed.
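For the client-tracks-its-own-position variant, a sketch might look like this (checkpointStore, eventStore and bus are illustrative names):

public void CatchUp()
{
    // The last global sequence number this subscriber has processed.
    long position = checkpointStore.LoadPosition("integration-subscriber");

    foreach (var storedEvent in eventStore.ReadAllFrom(position + 1))
    {
        bus.Publish(storedEvent.Event);   // hand the event to system integration
        checkpointStore.SavePosition("integration-subscriber",
                                     storedEvent.GlobalSequenceNumber);
    }
}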

Event Sourcing Refactoring

I've been studying DDD for a while, and stumbled upon design patterns like CQRS and Event Sourcing (ES). These patterns can be used to help achieve some concepts of DDD with less effort.
In the architecture exemplified below, the aggregates know how to handle the commands and events related to itself. In other words, the Event Handlers and Command Handlers are the Aggregates.
Then I started modeling one sample domain just to understand how the implementation would follow the business logic. For this question, here is my domain (it's based on this):
I know this is a bad modeled example, but I’m using it just as an example.
So, using ES, at the end of the operation, we would save all the events (Green arrows) into the event store (if there were no Exceptions), each event into its given Event Stream (Aggregate Type + Aggregate Id):
Everything seems right until now. So If we want to Rebuild the internal state of an instance of any of this Aggregate, we only have to new it up (new()) and apply all the events saved in its respective Event Stream in the correct order.
My question is related to changes in the model. Because, software development is a process where we never stop learning about our domain, and we always come with new ideas. So, let’s analyze some change scenarios:
Change Scenario 1:
Let's pretend that now, if the Reservation Aggregate checks that the seat is not available, it should send an event (Seat Not Reserved), and this event should be handled by a new Aggregate that will store all the people whose seats were not reserved:
In the hypothesis where the old system already handled the initial command (Place order) correctly, and saved all the events to its respective event streams:
When we want to Rebuild the internal state of an instance of any of this Aggregate, we only have to new it up (new()) and apply all the events saved in its respective Event Stream in the correct order. (Nothing changed). The only thing, is that the new Use case didn’t exist back in the old model.
Change Scenario 2:
Let's pretend that now, when the payment is accepted, we handle this event (Payment Accepted) in a new Aggregate (Finance Aggregate) and not in the Order Aggregate anymore. And it sends a new Event (Payment Received) to the Order Aggregate. I know this scenario is not well structured, but something like this could happen.
In the hypothesis where the old system already handled the initial command (Place order) correctly, and saved all the events to its respective event streams:
When we want to Rebuild the internal state of an instance of any of this Aggregate, we have a problem when applying the events from the Aggregate Event Stream to itself:
Now, the order doesn’t know anymore how to handle Payment Accepted Event.
Problems
So as the examples showed, whenever a model change results in an event being handled by a different event handler (Aggregate), there are some major problems, because we cannot rebuild the internal state anymore.
So, this problem can have some solutions:
Possible Solution
When an event is no longer handled by the aggregate in whose Event Stream it is stored, we can find the new handler, create a new instance, and send the event to it. But to keep the internal state correct, we need the last event (Payment Received) to be handled by the Order Aggregate. So we let it dispatch the event (and possibly commands):
This solution can have some problems. Let’s imagine that a new command (Place Order) arrives and it has to create this order instance and save the new state. Now we would have:
In gray are the events that were already saved in the previous call, when the system hadn't yet gone through the model changes.
We can see that a new Event Stream is created for the new aggregate (Finance W). And we can see that Event Streams are append-only, so the Payment Accepted event in the Order Y Event Stream is still there.
The first Payment Accepted event in Finance W Event Stream is the one that was supposed to be handled by the Order but had to find a new handler.
The yellow Payment Received event in the Order's Event Stream is the event that was generated by the new handler of Payment Accepted, when the Payment Accepted event from the Order's Event Stream was handled by the Finance Aggregate.
All the other Green Events are new events that were generated by handling the Place Order Command in the new model.
Problem With the Solution
The next time the aggregate needs to be rebuilt, there will be a Payment Accepted event in the stream (because it is append-only), and it will again call the new handler, but this has already been done and the Payment Received event has already been saved to the stream. So it is not necessary to go through this again; we could ignore this event and continue.
Question
So, my question is: how can we handle model changes that impact who handles each event? How can we rebuild the internal state of an Aggregate after a change like this?
Will we need to build some event stream migration that moves the events from one stream into the new schema (one or more streams), just like we would in a relational database?
Will we never be allowed to remove a handler, so we can only add new handlers? This would lead to an unmanageable system…
You got almost everything right, except one thing: Aggregates should not handle events from other Aggregates. It's like a non-event-sourced Aggregate sharing a table with another Aggregate: they should not.
In event-driven DDD, Aggregates are the system's building blocks that receive Commands (things that express intent) and return Events (things that have happened). For every Command type there must exist one and only one Aggregate type that handles it. Before executing a Command, the Aggregate is fed with all its own previously emitted Events; that is, every Event that was emitted in the past by this Aggregate instance is applied to this Aggregate instance, in chronological order.
So, if you want to correctly model your system, you are not allowed to send events from one Aggregate as events to another Aggregate (a different type or instance).
If you need to model business processes that involve multiple Aggregates, the correct way of doing it is by using a Saga/Process manager. This is a different component. It is the opposite of an Aggregate.
It receives Events emitted by Aggregates and sends Commands to other Aggregates.
In simplest cases, a Saga manager simply takes properties from one Event and creates+populates a Command with those properties. Then it sends the Command to the destination Aggregate.
In more complicated cases, the Saga waits for multiple Events, and only when all are received does it create and send a Command.
The Saga may also deduplicate or reorder events.
In your case, a Saga could be Sale, whose purpose would be to coordinate the entire sales process, from ordering to product dispatching.
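A minimal sketch of such a Sale saga (the event, the MarkOrderAsPaid command and the bus interface are illustrative names):

public class SaleSaga
{
    private readonly ICommandBus commandBus;

    public SaleSaga(ICommandBus commandBus) => this.commandBus = commandBus;

    // React to an Event emitted by the Payment Aggregate by sending a
    // Command to the Order Aggregate - never an Event to another Aggregate.
    public void When(PaymentAccepted paymentAccepted)
    {
        commandBus.Send(new MarkOrderAsPaid(paymentAccepted.OrderId,
                                            paymentAccepted.Amount));
    }
}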
In conclusion, you have that problem because you have not modeled your system correctly. If your Aggregates handled only their specific Commands (and not somebody else's Events), then even if you must create a new Saga when a new business process emerges, it would still send the same Command to the same Aggregate.
Answering briefly
my question is how can we handle model changes that impact who handles each event?
Handling events is generally an easy thing to change, because the handling part is ephemeral. Events have a single writer, but they can have many readers. You just need to arrange for the plumbing to notify each subscriber of the event.
So in scenario #1, it's the PaymentAggregate that writes down the PaymentAccepted event (in its own stream), and then your plumbing notifies the OrderAggregate that the PaymentAccepted event happened, and it does the next thing in its own logic.
To change to scenario #2, we'd leave the Payment Aggregate unchanged, but we'd arrange the plumbing so that it tells the FinanceAggregate about PaymentAccepted, and that it tells the OrderAggregate about PaymentReceived.
Your pictures make it hard to see this; I think you aren't being careful to track that each change of state is stored in the stream of the aggregate that changed. Not your fault - the Microsoft picture is really awful.
In other words, your arrow #3 "Seats Reserved" isn't a SeatsReserved event, it's a Handle(SeatsReserved) command.

Event Sourcing: proper way of rolling back aggregate state

I'm looking for advice on the proper way of implementing a rollback feature in a CQRS/event-sourcing application.
This application allows a group of editors to edit and update some editorial content, an editorial news item for instance. We implemented the user interface so that each field has an auto-save feature, and now we would like to give our users the ability to undo the operations they performed, so that it is possible to roll back the editorial news item to a previous known state.
Basically we would like to implement something like the undo command that you have in Microsoft Word and similar text editors. In the backend, the editorial news item is an instance of an aggregate defined in our domain and called Story.
We have discussed some ideas to implement the rollback and we are looking for advice based on real-world experience in similar projects. Here are our considerations about this feature.
How rollback works in real world business domains
First of all, we all know that in real world business domains what we are calling rollback is obtained via some form of compensation event.
Imagine a domain related to some sort of service for which it is possible to buy a subscription: we could have an aggregate representing a user subscription and an event describing that a charge has been associated to an instance of the aggregate (the particular subscription of one of the customers). A possible implementation of the event is as follows:
public class ChargeAssociatedToSubscriptionEvent : DomainEvent
{
    public Guid SubscriptionId { get; set; }
    public decimal Amount { get; set; }
    public string Description { get; set; }
    public DateTime DueDate { get; set; }
}
If a charge is wrongly associated to a subscription, it is possible to fix the error by means of an accreditation associated to the same subscription and having the same amount, so that the effect of the charge is completely balanced and the user gets their money back. In other words, we could define the following compensation event:
public class AccreditationAssociatedToSubscription : DomainEvent
{
    public Guid SubscriptionId { get; set; }
    public decimal Amount { get; set; }
    public string Description { get; set; }
    public DateTime AccreditationDate { get; set; }
}
So if a user is wrongly charged for an amount of 50 dollars, we can compensate the error by means of an accreditation of 50 dollars to the user subscription: this way the state of the aggregate has been rolled back to the previous state.
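Using the two event classes above, deriving the compensation from the original charge could be sketched like this (the method and its placement are illustrative):

public AccreditationAssociatedToSubscription CompensationFor(
    ChargeAssociatedToSubscriptionEvent charge)
{
    return new AccreditationAssociatedToSubscription
    {
        SubscriptionId    = charge.SubscriptionId,
        Amount            = charge.Amount,   // the same amount balances the charge
        Description       = "Compensation for: " + charge.Description,
        AccreditationDate = DateTime.UtcNow
    };
}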
Why things are not as easy as they seem
Based on the previous discussion, the rollback seems quite easy to implement. If you have an instance of the Story aggregate at aggregate revision B and you want to roll it back to a previous aggregate revision, say A (with A < B), you just have to do the following steps:
check the event store and get all the events between revisions A and B
compute the compensation event for each of the occurred events
apply the compensation events to the aggregate in the reverse order
Unfortunately, the second step of the previous procedure is not always possible: given a generic domain event, it is not always possible to compute its compensation event, because the amount of information contained inside the event may not be enough to do that. Maybe it is possible to wisely define all the events so that they contain enough information to compute the corresponding compensation event, but at the current state of our application there are several events for which computing the compensation event is not possible, and we would prefer to avoid changing the shape of our events.
A possible solution based on state comparison
The first idea to overcome the issues with compensation events is computing the minimum set of events needed to roll back the aggregate by comparing the current state of the aggregate with the target state. The algorithm is basically the following (a sketch follows the list):
get an instance of the aggregate at the current state (call it B)
get an instance of the aggregate at the target state (call it A) by applying only the first n events persisted inside the event store (our repository allows us to do that by specifying the aggregate id and the desired point in time at which to materialize the aggregate)
compare the two instances and compute the minimum set of events to be applied to the aggregate in the state B in order to change its state to A
apply the computed events to the aggregate
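A rough sketch of those steps (the repository overload that materializes at a revision and the DiffAgainst method are assumptions about our own code, not an existing API):

public void RollBackTo(Guid storyId, int targetRevision)
{
    Story current = repository.GetById(storyId);                   // state B
    Story target  = repository.GetById(storyId, targetRevision);   // state A

    // Compute the minimum set of events that turns B back into A,
    // e.g. one "field changed" event per field that differs.
    IEnumerable<DomainEvent> compensatingEvents = current.DiffAgainst(target);

    foreach (var domainEvent in compensatingEvents)
    {
        current.Apply(domainEvent);
    }

    repository.Save(current);
}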
A smarter approach based on event replay
Another way to solve the problem of rolling back to a previous state of the aggregate could be to do the same thing that the aggregate repository does when an aggregate is materialized at a specific point in time. In order to do that we should define an event, say StoryResettedEvent, whose effect is to reset the state of the aggregate by completely emptying it, and then do the following steps:
apply the StoryResettedEvent to our aggregate so that its state is emptied
get the first n events for the aggregate we are working on (all the events from the first saved event up to the target state A)
apply all the events to the aggregate instance
The main problem I see with this approach is the event to empty the state of the aggregate: it seems somewhat artificial, not a real domain event with a business meaning, but rather a trick to implement the rollback functionality.
The third way: persisting the compensation event each time an event is saved inside the event store
The third way we figured out to get what we need is based again on the concept of compensation event. The basic idea is that each event of the application could be enriched with a property containing the corresponding compensation event.
At the point in the code where an event is raised, it is possible to immediately compute the compensation event for the event to be raised (based on the current state of the aggregate and the shape of the event), so that the event can be enriched with this information, which will then be saved inside the event store. By doing so the compensation events are always available, ready to be used in case of a rollback request. The downside of this solution is that each domain event must be modified, and only a small part of the compensation events we compute and save inside the event store will ever be used for an actual rollback (most of them will never be used).
Conclusions
In my opinion the best option to solve the problem is using the algorithm based on state comparison (the first proposed solution), but we are still evaluating what to do.
Has anyone already had a similar requirement? Is there any other way to implement a rollback? Are we completely missing the point and following bad approaches to the problem?
Thanks for helping, any advice will be appreciated.
How the compensation events are generated should be the concern of the Story aggregate (after all, that's the point of an aggregate in event sourcing - it's just the validator of commands and generator of events for a particular stream).
Presumably you are following something like a typical CQRS/ES flow:
client sends an Undo command, which presumably says what version it wants to undo back to, and what story it is targeting
The Undo Command Handler loads the Story aggregate in the usual way, possibly from a snapshot and/or by applying the aggregate's events to the aggregate.
In some way, the command is passed to the aggregate (possibly a method call with args extracted from the command, or just passing the command directly to the aggregate)
The aggregate "returns" in some way the events to persist, assuming the undo command is valid. These are the compensating events.
compute the compensation event for each of the occurred events
...
Unfortunately, the second step of the previous procedure is not always possible
Why not? The aggregate has been passed all previous events, so what does it need that it doesn't have? The aggregate doesn't just see the events you want to roll back, it necessarily processes all events for that aggregate ever.
You have two options really - reduce the book-keeping that the aggregate needs to do by having the command handler help out in some way, or the whole process is managed internally by the aggregate.
Command handler helps out:
The command handler extracts from the command the version the user wants to roll back to, and then recreates the aggregate as-of that version (applying events in the usual way), in addition to creating the current aggregate. Then the old aggregate gets passed to the aggregate's undo method along with the command, so that the aggregate can then do state comparison more easily.
You might consider this to be a bit hacky, but it seems moderately harmless, and could significantly simplify the aggregate code.
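In code, that option might look roughly like this (the repository overload and the Undo method are assumed names, not from a framework):

public void Handle(UndoCommand command)
{
    // The current aggregate, rehydrated from its full event stream.
    Story current = repository.GetById(command.StoryId);

    // The aggregate as it was at the version the user wants to go back to.
    Story asOfTarget = repository.GetById(command.StoryId, command.TargetVersion);

    // The aggregate compares the two states and returns the compensating
    // events to persist; the command handler only supplies the old state.
    IEnumerable<DomainEvent> compensatingEvents = current.Undo(asOfTarget);

    repository.Save(current, compensatingEvents);
}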
Aggregate is on its own:
As events are applied to the aggregate, it adds to its state whatever book-keeping it needs to be able to compute the compensating events if it receives an undo command. This could be a map of compensating events, pre-computed, a list of every previous state that can potentially be reverted to (to allow state comparison), the list of events the aggregate has processed (so it can compute the previous state itself in the undo method), or whatever it needs, and it just stores it in its in-memory state (and snapshot state, if applicable).
The main concern with the aggregate doing it on its own is performance - if the size of the book-keeping state is large, the simplification of allowing the command handler to pass the previous state would be worthwhile. In any case, you should be able to switch between the approaches at any time in the future without any issues (except possibly needing to rebuild your snapshots, if you have them).
My 2 cents.
For the rollback operation, an orchestration class will be responsible for handling it. It will publish an aggregate_modify_generated event, and a projection on the other end, subscribed to this event, will fetch the current state of the aggregates after receiving it. Now, when any of the aggregates fails, it should generate a failure event; upon receiving it, the orchestration class will generate an aggregate_modify_rollback event that will be received by that projection, which will set the aggregate state to the previously fetched state.
One common projector can do the task, because the events will carry the aggregate id.

Data Repository using MemoryCache

I built a homebrew data entity repository with a factory that defines retention policy by type (e.g. absolute or sliding expiration). The policy also specifies the cache type as HttpContext request, session, or application. A MemoryCache is maintained by a caching proxy in all 3 cache types. Anyhow, I have a data entity service tied to the repository which does the load and save for our primary data entity. The idea is that you use the entity repository and don't need to care whether the entity is cached or retrieved from its data source (db in this case).
An obvious assumption would be that you would need to synchronise the load/save events, as you would need to save the cached entity before loading the entity from its data source.
So I was investigating a data integrity issue in production today... :)
Today I read that there can be a good long gap between the entity being removed from the MemoryCache and the CacheItemRemovedCallback event firing (default 20 seconds). The simple lock I had around the load and save data ops was insufficient. Furthermore, the CacheItemRemovedCallback runs in its own context outside of HttpContext, making things interesting. It meant I needed to make the callback function static, as I was potentially assigning a disposed instance to the event.
So I realised that the possibility of a gap, whereby my data entity no longer existed in cache but might not yet have been saved to its data source, might explain the 3 corrupt orders out of 5000. While filling out a long form it would be easy to perform work beyond the policy's 20-minute sliding expiration on the primary data entity. That means if they happen to submit at the same moment of expiration, an interesting race condition emerges between the load (via request context) and the save (via cache-expired callback).
With a simple lock it was a roll of the dice: would save or load win? Clearly we need a save before the next load from the data source (db). Ideally, when an item expires from the cache it would be atomically written to its data source. With the entity gone from the cache but the expired callback not yet fired, a load operation can slip in. In this case the entity will not be found in the cache, so it will default to loading from the data source. However, the save operation may not have commenced, resulting in data integrity corruption, and the stale load will likely clobber your now-saved cached data.
To accomplish synchronisation I needed a named signalling lock, so I settled on EventWaitHandle. A named lock is created per user, of which there are fewer than 5000. This allows the Load to wait on a signal from the expired-event handler, which Saves the entity (and whose thread runs in its own context outside HttpContext). In the Save it is easy to grab the existing named handle and signal the Load to continue once the Save is complete.
I also have a redundancy where the wait times out, logging each 10-second block caused by the save operation. As I said, the default is apparently up to 20 seconds between an entity being removed from the MemoryCache and the cache becoming aware of it and firing the event which in turn saves the entity.
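Very roughly, the signalling looked like this (a sketch only; the helper methods are illustrative, and the real code also has to cope with handle lifetime and with loads where no save is pending):

// Save path - runs on the cache-expiry callback thread, outside HttpContext.
static void SaveOnExpiry(string userId, object entity)
{
    SaveToDatabase(entity);   // persist the expired entity first

    using (var handle = new EventWaitHandle(
        false, EventResetMode.ManualReset, "entity-save-" + userId))
    {
        handle.Set();   // let a waiting Load continue
    }
}

// Load path - the entity just expired, so wait for the pending save to finish.
static object Load(string userId)
{
    using (var handle = new EventWaitHandle(
        false, EventResetMode.ManualReset, "entity-save-" + userId))
    {
        if (!handle.WaitOne(TimeSpan.FromSeconds(10)))
            Log("Still blocked waiting for the save of " + userId);   // the 10-second logging redundancy
    }
    return LoadFromDatabase(userId);
}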
Thank you to anyone who followed my ramblings through all that. Given the nature of the sync requirements, was the EventWaitHandle lock the best solution?
For completeness I wanted to post what I did to address the issue. I made multiple changes to the design to create a tidier solution which did not require a named sync object and allowed me to use a simple lock instead.
First, the data entity repository is a singleton which was stored in the request cache. This front end of the repository is detached from the caches themselves. I changed it to reside in the session cache instead, which becomes important below.
Second, I changed the event for the expired entity to route through the data entity repository above.
Third, I changed the MemoryCache event from RemovedCallback to UpdateCallback**.
Last, we tie it all together with a regular lock in the data entity repository, which lives in the user's session, and the gap-less expiry event routing through the same repository, allowing the lock to cover both the load and the save (expire) operations.
** These events are funny in that A) you can't subscribe to both, and B) UpdateCallback is called before the item is removed from the cache, but it is not called when you explicitly remove the item (i.e. myCache.Remove(entity) won't fire UpdateCallback, though RemovedCallback would). We made the decision that if the item was being forcefully removed from the cache, we didn't care. This happens when the user changes company or clears their shopping list. These scenarios won't fire the event, so the entity may never be saved to the DB's cache tables. While it might have been nice for debugging purposes, it wasn't worth dealing with the limbo state of an entity's existence to use the RemovedCallback, which had 100% coverage.
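For reference, wiring the UpdateCallback with System.Runtime.Caching looks roughly like this (the key scheme and the SaveToDatabase call are illustrative):

using System;
using System.Runtime.Caching;

static class EntityCache
{
    public static void Put(string userId, object entity)
    {
        var policy = new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMinutes(20),
            // A policy may set UpdateCallback or RemovedCallback, but not both.
            UpdateCallback = OnEntityExpiring
        };
        MemoryCache.Default.Set("entity:" + userId, entity, policy);
    }

    private static void OnEntityExpiring(CacheEntryUpdateArguments args)
    {
        // Fires before the item is removed (but not on explicit Remove calls),
        // so the entry should still be readable here; persist it now.
        var entity = args.Source.Get(args.Key);
        SaveToDatabase(entity);
    }

    private static void SaveToDatabase(object entity)
    {
        // Illustrative: hand the entity to the data entity service for persistence.
    }
}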
