How to model chat messages in an event-sourced system? - domain-driven-design

Context: I'm exploring building an event-sourced system / PoC using EventStoreDB (separate event stream per aggregate) with Node.js/TypeScript. One part of the system is a 1:1 customer support chat. When a chat message is created, a push notification is sent to the user, including an update to the app's badge number (total unread message count). I'm wondering what's the best way to model the aggregates / bounded contexts.
Question 1: where to put the chat messages?
Question 2: how to handle a customer's unread message badge counter?
Since chat messages are by themselves already timed events, they seem like they could easily fit in an event-sourced system. Still, I'm looking for advice on how to best model the aggregates:
Option A: Since each chat message has its own lifecycle (they can be edited, have a read status that gets updated, etc.), ChatMessage could be an aggregate on its own. This would explode the number of aggregates (and thus streams), but that might not really be such an issue for EventStoreDB. However, to send the notification for a message, we'll need to know the total number of unread messages (i.e. information from other aggregates). But how should the push notification sending "saga" / "process manager" (which is the correct term?) know what badge counter to send with the notification? Should it keep its own state / read model with the current counter for each customer, based on all the events it has seen?
Option B: Another way might be to have a list of messages under the Customer aggregate root. That way, Customer could have a counter for the number of unread messages, and a fold of all the events would give me that number. However, here I'm afraid the large number of chat message events for the Customer aggregate root gets in the way of "simple" Customer behavior. E.g. when processing a Customer command, we'd first get the current state by folding all events (assume no snapshotting is used), which means applying all those chat events even just to do something with the current name of the customer.
Option C: Or should these be in different bounded contexts? So have the Customer with its contact details in one bounded context, and have a separate bounded context for chat (or communications in general), where both have a Customer aggregate root sharing only the UUID of the customer? Would that be the best of both worlds, or would that bring other challenges?
Is any of the options the way to go? Or is there another, better option? Or am I just missing the point entirely ;) (don't wanna rule that out)
Any advice is much appreciated!

Event Sourcing describes a way to (re)create state, by storing every change as an event. This does not include how those events get persisted or snapshotted, or how they are read and distributed.
I always start from the User Interface, because that's where you know which information you want to display and which actions can be executed.
For example there could be the following Commands (or actions executed by the User Interface):
SendMessage(receiverId, content)
MarkMessageAsRead(messageId)
Your server will then check if the provided data is valid and create the related Events:
class SupportChatMessageAggregate {
    MessageId messageId;
    UserId senderId;
    UserId receiverId;
    String content;
    boolean readByReceiver;

    // depending on framework and personal preference, this could
    // also be a method: handle(SendMessage command, CurrentUser currentUser)
    constructor(SendMessage command, CurrentUser currentUser) {
        validate(command); // throws Exception if invalid
        // for example if content is empty,
        // or if currentUser is not allowed to send messages to receiverId
        publishEvent(new MessageSentEvent(
            command.getMessageId(),
            currentUser.getUserId(),
            command.getReceiverId(),
            command.getContent()
        ));
    }

    handle(MarkMessageAsRead command, CurrentUser currentUser) {
        validate(command); // throws Exception if invalid
        // for example check if currentUser == receiver
        publishEvent(new MessageMarkedAsReadEvent(
            command.getMessageId(),
            currentUser.getUserId()
        ));
    }

    ...
}
Now when you want to know the badge counter for a User, you simply count all the MessageSentEvents where receiver = currentUser, and subtract the count of all MessageMarkedAsReadEvents for the currentUser.
This could be done, for example, within an UnreadSupportChatMessageCountAggregate that is responsible for providing the current unreadMessages value based on the MessageSentEvents and MessageMarkedAsReadEvents for a given User. A pretty boring Aggregate, but it does the job.
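In the question's Node.js/TypeScript setting, that fold could look roughly like this (a minimal sketch; the event shapes and field names are assumptions, not part of the answer above):

type ChatEvent =
    | { type: "MessageSent"; messageId: string; senderId: string; receiverId: string }
    | { type: "MessageMarkedAsRead"; messageId: string; readerId: string };

// Fold a user's events into the unread-message badge counter.
function unreadCount(events: ChatEvent[], userId: string): number {
    return events.reduce((count, event) => {
        if (event.type === "MessageSent" && event.receiverId === userId) return count + 1;
        if (event.type === "MessageMarkedAsRead" && event.readerId === userId) return count - 1;
        return count;
    }, 0);
}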
That's Event Sourcing: you simply have a bunch of events, and if you want to query some data, you just fetch all related events, process them, and get your result. Whether you use separate event streams per aggregate or a single stream for all events is an implementation detail (or depends on the event store you use).
Depending on the number of events this can be extremely fast, or very slow. That's where snapshots and/or read models (from CQRS) come in handy. But for plain Event Sourcing this is not required.

Related

Concurrency issue when processing webhooks

Our application creates/updates database entries based on an external service's webhooks. The webhook sends the external ID of the object so that we can fetch more data for processing. Processing a webhook, including the round trips to get more data, takes 400-1200 ms.
Sometimes, multiple hooks for the same object ID are sent within microseconds of each other. Here are timestamps of the most recent occurrence:
2020-11-21 12:42:45.812317+00:00
2020-11-21 20:03:36.881120+00:00 <-
2020-11-21 20:03:36.881119+00:00 <-
There can also be other objects sent for processing around this time as well. The issue is that concurrent processing of the two hooks highlighted above will create two new database entries for the same single object.
Q: What would be the best way to prevent concurrent processing of the two highlighted entries?
What I've Tried:
Currently, at the start of an incoming hook, I create a database entry in a Changes table which stores the object ID. Right before processing, the Changes table is checked for entries that were created for this ID within the last 10 seconds; if one is found, it quits to let the other process do the work.
In the case above, there were two database entries created, and because they were SO close in time, they both hit the detection spot at the same time, found each other, and quit, resulting in nothing being done.
I've thought of adding some jittered timeout before the check (increases processing time), or locking the table (again, increases processing time), but it all feels like I'm fighting the wrong battle.
Any suggestions?
Our API is Django 3.1 with a Postgres db
Okay, this might not be a very satisfactory answer, but it sounds to me like the root of your problem isn't necessarily with your own app, but with the webhook service you are receiving from.
Due to the inherent possibility of error in network communication, webhooks which guarantee delivery always use at-least-once semantics. A sender that encounters a failure that leaves receipt uncertain needs to try sending the webhook again, even if the webhook may have been received the first time, thus opening the possibility of a duplicate event.
By extension, all webhook sending services should offer some way of deduplicating an individual event. I help run our webhooks at Stripe, and if you're using those, every webhook sent will come with an event ID like evt_1CiPtv2eZvKYlo2CcUZsDcO6, which a receiver can use for deduplication.
So the right answer for your problem is to ask your sender for some kind of deduplication/idempotency key, because without one, their API is incomplete.
Once you have that, everything gets really easy: you'd create a unique index on that key in the database, and then use upsert to guarantee only a single entry. That would look something like:
CREATE UNIQUE INDEX index_my_table_idempotency_key ON my_table (idempotency_key);
INSERT INTO object_changes (idempotency_key, ...) VALUES ('received-key', ...)
ON CONFLICT (idempotency_key) DO NOTHING;
Second best
Absent an idempotency ID for deduping, all your solutions are going to be hacky, but you could still get something workable together. What you've already suggested of trying to round off the receipt time should mostly work, although it'll still have the possibility of losing two events that were different, but generated close together in time.
Alternatively, you could also try using the entire payload of a received webhook, or better yet, a hash of it, as an idempotency ID:
CREATE UNIQUE INDEX index_my_table_payload_hash ON my_table (payload_hash);
INSERT INTO object_changes (payload_hash, ...) VALUES ('<hash_of_webhook_payload>', ...)
ON CONFLICT (payload_hash) DO NOTHING;
This should keep the field relatively small in the database, while still maintaining accurate deduplication, even for unique events sent close together.
You could also do a combination of the two: a rounded timestamp plus a hashed payload, just in case you were to receive a webhook with an identical payload somewhere down the line. The only thing this wouldn't protect against is two different events sending identical payloads close together in time, which should be a very unlikely case.
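As a rough illustration of that combination (a TypeScript sketch only; the function name and the one-second rounding window are assumptions):

import { createHash } from "crypto";

// Best-effort idempotency key: payload hash plus the receipt time
// rounded down to the nearest second.
function idempotencyKey(rawBody: string, receivedAt: Date): string {
    const payloadHash = createHash("sha256").update(rawBody).digest("hex");
    const roundedSecond = Math.floor(receivedAt.getTime() / 1000);
    return `${roundedSecond}:${payloadHash}`;
}

Duplicates delivered within the same second collapse to the same key, while identical payloads sent much later still produce distinct keys.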
If you look at the acquity webhook docs, they supply a field called action, which is key to making your webhook idempotent. Here are the quotes I could salvage:
action: either scheduled, rescheduled, canceled, changed or order.completed, depending on the action that initiated the webhook call
The different actions:
scheduled is called once when an appointment is initially booked
rescheduled is called when the appointment is rescheduled to a new time
canceled is called whenever an appointment is canceled
changed is called when the appointment is changed in any way. This includes when it is initially scheduled, rescheduled, or canceled, as well as when appointment details such as e-mail address or intake forms are updated.
order.completed is called when an order is completed
Based on the wording, I assume that scheduled, canceled, and order.completed are all unique per object_id, which means you can use a unique together constraint for those messages:
class AcquityAction(models.Model):
    id = models.CharField(max_length=17, primary_key=True)


class AcquityTransaction(models.Model):
    action = models.ForeignKey(AcquityAction, on_delete=models.PROTECT)
    object_id = models.IntegerField()

    class Meta:
        unique_together = [['object_id', 'action']]
You can substitute the AcquityAction model for an Enumeration Field if you'd like, but I prefer having them in the DB.
I would ignore the change event entirely, since it appears to trigger on every event, according to their vague definition. For the rescheduled event, I would create a model that allows you to use a unique constraint on the new date, so something like this:
class Reschedule(models.Model):
    schedule = models.ForeignKey(MyScheduleModel, on_delete=models.CASCADE)
    schedule_date = models.DateTimeField()

    class Meta:
        unique_together = [['schedule', 'schedule_date']]
Alternatively, you could have a task specifically for updating your schedule model with a rescheduled date, that way it remains idempotent.
Now in your view, you will do something like this:
from django.db import IntegrityError

ACQUITY_ACTIONS = {'scheduled', 'canceled', 'order.completed'}

def webhook_view(request):
    validate(request)
    action = get_action(request)
    if action in ACQUITY_ACTIONS:
        try:
            insert_transaction()
        except IntegrityError:
            return HttpResponse(status=200)
        webhook_task.delay()
    elif action == 'rescheduled':
        other_webhook_task.delay()
    ...

Who and how should handle replaying events?

I am learning about DDD, CQRS and Event Sourcing, and there is something I cannot figure out. Commands trigger changes in the aggregates, and once the change is performed an event is fired. The event is subsequently handled by other parts of the system and preserved in the event store. However, I do not understand how replaying events would recreate the aggregate, if changes are triggered by commands.
Example: say we have an online shop.
AddItemToCardCommand -> Card Aggregate adds the item to its card -> ItemAddedToCardEvent -> The event is handled by whoever.
However, if the event is replayed, the aggregate would not add the item to its card.
To sum up, my question is how should I recreate aggregates based on the events in the event store? Also, any general advice on how to replay events the right way would be appreciated.
For simplicity, let's assume a stateless process - our service doesn't try to keep copies of things in memory, but instead reloads aggregates as needed.
The service receives AddItemToCardCommand:{card:123, ...}. We don't have the current state of card:123 in memory, so we need to create it. We do that by loading the state of card:123 from our durable store. Because we chose to use event sourced storage, the "state" we read from the durable store is a representation of the history of events previously written by the service.
Event histories have within them all of the information you need to remember, but not necessarily in a convenient "shape" - append only lists are a great data structure for writes, but not necessarily good for reads.
What this often means is that we will "replay" the events to create an in memory object which we can then use to answer questions about the events we will write next.
The same pattern is used when answering simple queries: we load the history of events from the store, transform the event history into a more convenient shape, and then use that shape to compute the answer.
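As a rough illustration of that replay step (a TypeScript sketch; the event and state shapes are assumptions, not taken from the question):

type CardEvent =
    | { type: "ItemAddedToCard"; itemId: string }
    | { type: "ItemRemovedFromCard"; itemId: string };

interface CardState {
    items: string[];
}

// Fold the history into an in-memory object we can use to answer questions
// (e.g. to validate the next command or serve a simple query).
function replay(history: CardEvent[]): CardState {
    let state: CardState = { items: [] };
    for (const event of history) {
        if (event.type === "ItemAddedToCard") {
            state = { items: [...state.items, event.itemId] };
        } else if (event.type === "ItemRemovedFromCard") {
            state = { items: state.items.filter((id) => id !== event.itemId) };
        }
    }
    return state;
}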
In circumstances where query latency is more important than timeliness, we might design our query handler to read the convenient shapes from a cache, rather than trying to compute them fresh every time; a concurrently running background thread would be responsible for waking up periodically to compute new contents for the cache.
Using an async process to pull updates from an event stream is a common pattern; Greg Young discusses some of the advantages of that approach in his Polyglot Data talk.
In an ideal event-sourcing scenario, you would not have an already constructed aggregate structure available in your database. You repeatedly arrive at the final data structure by running through all events stored so far.
Let me illustrate with some pseudocode of adding items to cart, and then fetching the cart data.
# Create a new cart
POST /cart/new
# Store a series of events related to the cart (in database as records, similar to array items)
POST /cart/add -> CartService.AddItem(item_data) -> ItemAddedToCart
A series of events would look like:
* ItemAddedToCart
* ItemAddedToCart
* ItemAddedToCart
* ItemRemovedFromCart
* ItemAddedToCart
When it's time to fetch cart data from the DB, you construct a new cart instance (or retrieve a cart instance if persisted) and replay the events on it.
cart = Cart(id=ID1)

# Fetch contents of Cart with id ID1
for each event in ID1 cart's events:
    if event is ItemAddedToCart:
        cart.add_item(event.data)
    else if event is ItemRemovedFromCart:
        cart.remove_item(event.data)

return cart
Occasionally, when there are too many events related to the cart, you may want to generate the aggregate structure at that point and save it in the DB as a snapshot. Next time, you can start from that savepoint and continue applying only the newer events. This optimization saves time and improves performance when there are many events to process.
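A minimal sketch of that snapshot-then-replay load path (TypeScript; the store interfaces and shapes are assumptions, not a specific product's API):

interface CartEvent { type: string; data: unknown }
interface CartState { items: unknown[] }

interface SnapshotStore {
    tryLoad(cartId: string): Promise<{ version: number; state: CartState } | undefined>;
}
interface EventStore {
    readEvents(cartId: string, fromVersion: number): Promise<CartEvent[]>;
}

async function loadCart(
    cartId: string,
    snapshots: SnapshotStore,
    events: EventStore,
    apply: (state: CartState, event: CartEvent) => CartState
): Promise<CartState> {
    // Start from the latest snapshot (if any) instead of an empty cart...
    const snapshot = await snapshots.tryLoad(cartId);
    let state: CartState = snapshot ? snapshot.state : { items: [] };
    // ...and fold in only the events recorded after that snapshot.
    for (const event of await events.readEvents(cartId, snapshot ? snapshot.version : 0)) {
        state = apply(state, event);
    }
    return state;
}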
What may help is to not think of the command as changing the state but rather the event as changing the state. In fact, I don't quite see how else one would go about doing so. The command handler in your aggregate would apply the invariants and, if all is OK, would immediately create the event and call some method that would apply it ([Apply|On|Do]MyEvent). The fact that you have an event after the fact does not necessarily mean other parts of your system would handle it. It is however required for event sourcing. Once you have an event you can most certainly pass that on to other parts of your system via, say, publishing on a service bus.
When you replay your events you are calling the same methods that the commands were calling to actually mutate the state of your aggregate:
public MyEvent MyCommand(string data)
{
    if (string.IsNullOrWhiteSpace(data))
    {
        throw new ArgumentException($"Argument '{nameof(data)}' may not be empty.");
    }

    return On(new MyEvent
    {
        Data = data
    });
}

private MyEvent On(MyEvent myEvent)
{
    // change the relevant state
    someState = myEvent.Data;

    return myEvent;
}
Your event sourcing infrastructure would call On(MyEvent) for MyEvent when replaying. Since you have an event it means that it was a valid state transition and can simply be applied; else something went wrong in your initial command processing and you probably have a bug.
All events in an event store would be in chronological order for an aggregate. In addition to this the events should have a global sequence number to facilitate projection processing.
You could have a generic projection that accepts any/all events and then publishes them on a service bus for system integration. Alternatively, you could place that burden on a client of the event store: it keeps track of the position itself and reads events off the store directly. You could also combine these and have the client subscribe to service bus events, but ensure that it executes them in the correct order by tracking the position (global sequence number) itself and updating it as the events are processed.
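A minimal sketch of such a position-tracking projection loop (TypeScript; the store and checkpoint interfaces are assumptions, not a specific event store's API):

interface StoredEvent { globalSequenceNumber: number; type: string; data: unknown }

interface EventStoreClient {
    readAllAfter(position: number, batchSize: number): Promise<StoredEvent[]>;
}
interface CheckpointStore {
    load(projection: string): Promise<number>; // last processed global sequence number
    save(projection: string, position: number): Promise<void>;
}

// Read events past the last checkpoint, handle them in order,
// and persist the new position after each event.
async function runProjection(
    name: string,
    store: EventStoreClient,
    checkpoints: CheckpointStore,
    handle: (event: StoredEvent) => Promise<void>
): Promise<void> {
    let position = await checkpoints.load(name);
    for (;;) {
        const batch = await store.readAllAfter(position, 100);
        if (batch.length === 0) break; // caught up; a real runner would keep polling or subscribe
        for (const event of batch) {
            await handle(event);
            position = event.globalSequenceNumber;
            await checkpoints.save(name, position);
        }
    }
}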

Event Sourcing Refactoring

I've been studying DDD for a while, and stumbled upon design patterns like CQRS and Event Sourcing (ES). These patterns can be used to help achieve some concepts of DDD with less effort.
In the architecture exemplified below, the aggregates know how to handle the commands and events related to itself. In other words, the Event Handlers and Command Handlers are the Aggregates.
Then I started modeling a sample Domain just to understand how the implementation would follow the business logic. For this question, here is my domain (it's based on this):
I know this is a badly modeled example, but I'm using it just as an example.
So, using ES, at the end of the operation, we would save all the events (Green arrows) into the event store (if there were no Exceptions), each event into its given Event Stream (Aggregate Type + Aggregate Id):
Everything seems right until now. So if we want to rebuild the internal state of an instance of any of these Aggregates, we only have to new it up (new()) and apply all the events saved in its respective Event Stream in the correct order.
My question is related to changes in the model, because software development is a process where we never stop learning about our domain, and we always come up with new ideas. So, let's analyze some change scenarios:
Change Scenario 1:
Let's pretend that now, if the Reservation Aggregate checks that the seat is not available, it should emit an event (Seat Not Reserved), and this event should be handled by a new Aggregate that stores all the people whose seats could not be reserved:
In the hypothesis where the old system already handled the initial command (Place order) correctly, and saved all the events to its respective event streams:
When we want to rebuild the internal state of an instance of any of these Aggregates, we only have to new it up (new()) and apply all the events saved in its respective Event Stream in the correct order. (Nothing changed.) The only difference is that the new use case didn't exist back in the old model.
Change Scenario 2:
Let's pretend that now, when the payment is accepted, we handle this event (Payment Accepted) in a new Aggregate (Finance Aggregate) and not in the Order Aggregate anymore. And it sends a new Event (Payment Received) to the Order Aggregate. I know this scenario is not well structured, but something like this could happen.
In the hypothesis where the old system already handled the initial command (Place order) correctly, and saved all the events to its respective event streams:
When we want to rebuild the internal state of an instance of any of these Aggregates, we have a problem when applying the events from the Aggregate's Event Stream to itself:
Now, the order doesn’t know anymore how to handle Payment Accepted Event.
Problems
So as the examples showed, whenever a system change results in an event being handled by a different event handler (Aggregate), there are some major problems, because we cannot rebuild the internal state anymore.
So, this problem can have some solutions:
Possible Solution
When an event is no longer handled by the aggregate in whose Event Stream it is stored, we can find the new handler, create a new instance, and send the event to it. But to keep the internal state correct, we need the last event (Payment Received) to be handled by the Order Aggregate. So, we let it dispatch the event (and possibly commands):
This solution can have some problems. Let’s imagine that a new command (Place Order) arrives and it has to create this order instance and save the new state. Now we would have:
In gray are the events that were already saved in the previous call, when the system hadn't yet gone through the model changes.
We can see that a new Event Stream is created for the new aggregate (Finance W). And we can see that Event Streams are append-only, so the Payment Accepted event in the Order Y Event Stream is still there.
The first Payment Accepted event in Finance W Event Stream is the one that was supposed to be handled by the Order but had to find a new handler.
The yellow Payment Received event in the Order's Event Stream is the event that was generated by the new handler (Finance) when it handled the Payment Accepted event from the Order's Event Stream.
All the other Green Events are new events that were generated by handling the Place Order Command in the new model.
Problem With the Solution
The next time the aggregate needs to be rebuilt, there will be a Payment Accepted event in the stream (because it is append-only), and it will again call the new handler, but this has already been done and the Payment Received event has already been saved to the stream. So it is not necessary to go through this again; we could ignore this event and continue.
Question
So, my question is: how can we handle model changes that impact who handles each event? How can we rebuild the internal state of an Aggregate after a change like this?
Will we need to build some event stream migration that converts the events from one stream to the new schema (one or more streams), just like we would need in a relational database?
Will we never be allowed to remove a handler, only add new handlers? This would lead to an unmanageable system…
You got almost everything right, except one thing: Aggregates should not handle events from other Aggregates. It's like a non-event-sourced Aggregate sharing a table with another Aggregate: they should not.
In event-driven DDD, Aggregates are the system's building blocks that receive Commands (things that express intent) and return Events (things that have happened). For every Command type there must exist one and only one Aggregate type that handles it. Before executing a Command, the Aggregate is fed with all its own previously emitted Events; that is, every Event that was emitted in the past by this Aggregate instance is applied to this Aggregate instance, in chronological order.
So, if you want to correctly model your system, you are not allowed to send events from one Aggregate as events to another Aggregate (a different type or instance).
If you need to model business processes that involve multiple Aggregates, the correct way of doing it is by using a Saga/Process manager. This is a different component. It is the opposite of an Aggregate.
It receives Events emitted by Aggregates and sends Commands to other Aggregates.
In the simplest cases, a Saga/Process manager simply takes properties from one Event and creates and populates a Command with those properties. Then it sends the Command to the destination Aggregate.
In more complicated cases, the Saga waits for multiple Events, and only when all are received does it create and send a Command.
The Saga may also deduplicate or reorder events.
In your case, a Saga could be Sale, whose purpose would be to coordinate the entire sales process, from ordering to product dispatching.
In conclusion, you have that problem because you have not modeled your system correctly. If your Aggregates handled only their specific Commands (and not somebody else's Events), then even if you must create a new Saga when a new business process emerges, it would still send the same Commands to the same Aggregates.
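A minimal sketch of such a process manager (TypeScript; the event and command names are assumptions made for illustration, not part of the answer above):

interface PaymentAccepted { type: "PaymentAccepted"; orderId: string; amount: number }
interface MarkOrderAsPaid { type: "MarkOrderAsPaid"; orderId: string }

interface CommandBus {
    send(command: MarkOrderAsPaid): Promise<void>;
}

// The Sale process manager listens to Events emitted by one Aggregate
// and translates them into Commands for another Aggregate.
class SaleProcessManager {
    constructor(private readonly commandBus: CommandBus) {}

    async onPaymentAccepted(event: PaymentAccepted): Promise<void> {
        await this.commandBus.send({ type: "MarkOrderAsPaid", orderId: event.orderId });
    }
}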
Answering briefly
my question is how can we handle model changes that impact who handles each event?
Handling events is generally an easy thing to change, because the handling part is ephemeral. Events have a single writer, but they can have many readers. You just need to arrange for the plumbing to notify each subscriber of the event.
So in scenario #1, it's the PaymentAggregate that writes down the PaymentAccepted event (in its own stream), and then your plumbing notifies the OrderAggregate that the PaymentAccepted event happened, and it does the next thing in its own logic.
To change to scenario #2, we'd leave the Payment Aggregate unchanged, but we'd arrange the plumbing so that it tells the FinanceAggregate about PaymentAccepted, and that it tells the OrderAggregate about PaymentReceived.
Your pictures make it hard to see this; I think you aren't being careful to track that each change of state is stored in the stream of the aggregate that changed. Not your fault - the Microsoft picture is really awful.
In other words, your arrow #3 "Seats Reserved" isn't a SeatsReserved event, it's a Handle(SeatsReserved) command.

How to solve persistence delay or error while handling event in CQRS and ES?

I am trying to implement another DDD bounded context with CQRS and ES.
I wonder: suppose there is a CreateUserCommand that creates a User in my domain model (not a word about saving). Then it fires a UserCreatedEvent.
I have two event handlers for that event:
PersistUserEventHandler (updates state of app) and
SendWelcomeEmailEventHandler (sends welcome email to user)
Now, I know that:
The order of processing events in Event Handlers should not matter
Saving state should be a detail, because the source of truth is in my event store.
But what if I do not want to send the welcome email until my read model is fully updated? Because what if, for example, the process is delayed or some error occurs and I am not able to persist that user into the read model right now? Then I do not want to send that welcome email yet, because if the user clicked, for example, a link to his profile in the mail, he would see "user does not exist".
I saw people persisting changes through a repository directly in command handlers (which would solve this problem), but that does not make sense with Event Sourcing, because I want to be able to replay all events (with only the persistence event handlers enabled, to prevent all other side effects) and get the actual state of the application in the persistence layer.
Or should I listen to UserCreatedEvent only with the event handler that actually persists it into the read model, and then raise another event, CreatedUserSavedEvent, from that handler, so that all emails etc. would be sent by its handlers?
I suppose not, because it smells like event hell, and also, if I inject the EventBus into some event handler, I get into a circular reference problem, which is just a symptom of violating the rule that every dependency should point down to lower components of my system and not the other way around.
So, how is this usually solved, or am I missing something?
PersistUserEventHandler (updates state of app)
You might be mistaking Read Models for a homogeneous whole that accurately represents the current state of an application, i.e. a second source of absolute truth besides the event log.
I tend to see them more as a bunch of partial, opinionated parcels of state that may not all be updated at the same time and may reflect different truths.
I don't recommend taking read models as a source of data in another context than the use case they were designed for. In your example, SendWelcomeEmail should probably not rely on the User read model but only on the data contained in the UserCreated event.
Now you can share code between read model projectors and other types of event handlers to avoid duplication, but sharing data seems risky.
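As a rough illustration of a handler that relies only on the event's data (a TypeScript sketch; the event shape and mailer interface are assumptions):

interface UserCreated { type: "UserCreated"; userId: string; email: string; name: string }

interface Mailer {
    send(to: string, subject: string, body: string): Promise<void>;
}

// The handler uses only the data carried by the event,
// not the (possibly not yet updated) User read model.
async function sendWelcomeEmail(event: UserCreated, mailer: Mailer): Promise<void> {
    await mailer.send(event.email, "Welcome!", `Hi ${event.name}, your account has been created.`);
}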
If users have random UUIDs then it should not be a problem. If a user arrives at a URL and the read model is not up to date, you could show a "loading in progress, please wait" message.
If you really want to know whether the user exists - for example, you want to distinguish between "user does not exist" and "read model is not synchronized yet" - then you could send a special command that doesn't generate any events (or just test a command, if your command dispatcher supports dry running of commands) and throw an exception if the user does not exist.

Implement facebook style status message system in mongodb

How can we implement a Facebook-like status message system in MongoDB (using Mongoose), where whenever a given user posts a status it gets broadcast to all of their friends' timelines?
It doesn't have to be real-time, there will be a refresh button to get the latest statuses.
here is what I have come up with:
Plan A:
status(collection)
id, user_id(reference), status_msg
Benefit: faster write speed
Plan B:
status(collection)
id, user_id(reference), status_msg, friends_list[sub-document]
Benefit: faster read speed
With Plan A, I'll have to loop through all the friends in a user's friends list and then fetch all their statuses.
I'll have to do this every time (page refresh/ new login) for every single friend.
With Plan B, I'll only have to fetch the statuses which have the current user in the friends_list.
I would like to know your opinions and suggestions on this.
Is there any better way of approaching this problem ?
I would also like to know how I can use RabbitMQ here to increase efficiency and reduce unnecessary DB I/O.
Assuming that each user will likely have several friends, and these friends refresh their timeline several times a day, you can assume that reading will happen much more frequently than writing. That means from a pure performance standpoint you would optimize for read-access, not for write-access, and store the receivers with the message.
However, keep the semantics in mind. What if the friend-list of the author changes after they posted a status message?
Do you want the message to disappear from the timelines of any ex-friends?
Do you want the message to appear in the timeline of any new friends they make?
When the answers to these questions are yes, you should rather determine the receivers on read than on write.
There is also a third option which might be worth considering: Do not handle messages by sender, handle them by receiver. When someone posts a message, create an individual copy of the message for each of their friends and save them as separate documents. You can then get all messages for a user by querying your messages collection for messages where they are the receiver. The friend/unfriend operation would then need to check for any messages which need to be added/removed. The major drawback of this approach would be that users with a very high number of friends would create a very high write-load when posting something.
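A rough sketch of that third option with Mongoose (TypeScript; the schema and field names are assumptions, not taken from the question):

import mongoose, { Schema } from "mongoose";

// One document per (receiver, message): messages are handled by receiver.
const TimelineEntrySchema = new Schema({
    receiverId: { type: Schema.Types.ObjectId, ref: "User", index: true },
    senderId: { type: Schema.Types.ObjectId, ref: "User" },
    statusMsg: String,
    createdAt: { type: Date, default: Date.now },
});
const TimelineEntry = mongoose.model("TimelineEntry", TimelineEntrySchema);

// Fan out on write: store one copy of the status per friend.
async function postStatus(senderId: string, statusMsg: string, friendIds: string[]) {
    await TimelineEntry.insertMany(
        friendIds.map((receiverId) => ({ receiverId, senderId, statusMsg }))
    );
}

// Reading a timeline is then a single indexed query on the receiver.
async function timelineFor(userId: string) {
    return TimelineEntry.find({ receiverId: userId }).sort({ createdAt: -1 }).limit(50);
}

The trade-off stays as described: reads become a cheap single query, while a post by a user with many friends turns into many writes.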
