Concurrency issue when processing webhooks - python-3.x

Our application creates/updates database entries based on an external service's webhooks. The webhook sends the external ID of the object so that we can fetch more data for processing. Processing a webhook, including the round trips to fetch that extra data, takes 400-1200 ms.
Sometimes, multiple hooks for the same object ID are sent within microseconds of each other. Here are timestamps of the most recent occurrence:
2020-11-21 12:42:45.812317+00:00
2020-11-21 20:03:36.881120+00:00 <-
2020-11-21 20:03:36.881119+00:00 <-
There can also be other objects sent for processing around this time as well. The issue is that concurrent processing of the two hooks highlighted above will create two new database entries for the same single object.
Q: What would be the best way to prevent concurrent processing of the two highlighted entries?
What I've Tried:
Currently, at the start of an incoming hook, I create a database entry in a Changes table which stores the object ID. Right before processing, the Changes table is checked for entries that were created for this ID within the last 10 seconds; if one is found, it quits to let the other process do the work.
In the case above, there were two database entries created, and because they were SO close in time, they both hit the detection spot at the same time, found each other, and quit, resulting in nothing being done.
I've thought of adding a jittered timeout before the check (which increases processing time), or locking the table (again, more processing time), but it all feels like I'm fighting the wrong battle.
Any suggestions?
Our API is Django 3.1 with a Postgres db

Okay, this might not be a very satisfactory answer, but it sounds to me like the root of your problem isn't necessarily with your own app, but the webhooks service you are receiving from.
Due to the inherent possibility of error in network communication, webhooks which guarantee delivery always use at-least-once semantics. A sender that encounters a failure that leaves receipt uncertain needs to try sending the webhook again, even if the webhook may have been received the first time, thus opening the possibility for a duplicate event.
By extension, all webhook sending services should offer some way of deduplicating an individual event. I help run our webhooks at Stripe, and if you're using those, every webhook sent will come with an event ID like evt_1CiPtv2eZvKYlo2CcUZsDcO6, which a receiver can use for deduplication.
So the right answer for your problem is to ask your sender for some kind of deduplication/idempotency key, because without one, their API is incomplete.
Once you have that, everything gets really easy: you'd create a unique index on that key in the database, and then use upsert to guarantee only a single entry. That would look something like:
CREATE UNIQUE INDEX index_my_table_idempotency_key ON my_table (idempotency_key);
INSERT INTO object_changes (idempotency_key, ...) VALUES ('received-key', ...)
ON CONFLICT (idempotency_key) DO NOTHING;
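If you prefer to stay at the Django ORM level, a rough equivalent is a model with a unique field plus get_or_create, which tolerates concurrent inserts because the unique constraint makes the database the arbiter. This is only a sketch; the model and field names below are assumptions, not your actual schema:

# Hypothetical model mirroring the SQL above; names are illustrative only.
from django.db import models

class ObjectChange(models.Model):
    idempotency_key = models.CharField(max_length=255, unique=True)
    external_id = models.CharField(max_length=64)
    received_at = models.DateTimeField(auto_now_add=True)

def record_change(idempotency_key, external_id):
    # get_or_create is race-safe here only because of the unique constraint;
    # the loser of a concurrent race ends up with created=False.
    change, created = ObjectChange.objects.get_or_create(
        idempotency_key=idempotency_key,
        defaults={"external_id": external_id},
    )
    return created  # False means a duplicate delivery; skip processing

Django 2.2+ also has bulk_create(..., ignore_conflicts=True), which maps more directly onto ON CONFLICT DO NOTHING, but it won't tell you whether your row actually won the race.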
Second best
Absent an idempotency ID for deduping, all your solutions are going to be hacky, but you could still get something workable together. What you've already suggested, rounding off the receipt time, should mostly work, although you could still lose two events that were genuinely different but generated close together in time.
Alternatively, you could also try using the entire payload of a received webhook, or better yet, a hash of it, as an idempotency ID:
CREATE UNIQUE INDEX index_my_table_payload_hash ON my_table (payload_hash);
INSERT INTO object_changes (payload_hash, ...) VALUES ('<hash_of_webhook_payload>', ...)
ON CONFLICT (payload_hash) DO NOTHING;
This should keep the field relatively small in the database, while still maintaining accurate deduplication, even for unique events sent close together.
You could also do a combination of the two: a rounded timestamp plus a hashed payload, just in case you were to receive a webhook with an identical payload somewhere down the line. The only thing this wouldn't protect against is two different events sending identical payloads close together in time, which should be a very unlikely case.
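For illustration, here is a minimal Python sketch of building such a combined key; the 10-second bucket size and the use of SHA-256 are arbitrary choices on my part:

import hashlib
import json

def dedup_key(payload: dict, received_at) -> str:
    # Round the receipt time into 10-second buckets (tune this to how close
    # together your duplicate deliveries actually arrive), then hash a
    # canonical form of the payload so the stored key stays small.
    bucket = int(received_at.timestamp()) // 10
    canonical = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{bucket}:{digest}"

That key then goes into the unique payload_hash / idempotency_key column shown above.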

If you look at the Acuity webhook docs, they supply a field called action, which is key to making your webhook idempotent. Here are the quotes I could salvage:
action: either scheduled, rescheduled, canceled, changed, or order.completed, depending on the action that initiated the webhook call
The different actions:
scheduled is called once when an appointment is initially booked
rescheduled is called when the appointment is rescheduled to a new time
canceled is called whenever an appointment is canceled
changed is called when the appointment is changed in any way. This includes when it is initially scheduled, rescheduled, or canceled, as well as when appointment details such as e-mail address or intake forms are updated.
order.completed is called when an order is completed
Based on the wording, I assume that scheduled, canceled, and order.completed are all unique per object_id, which means you can use a unique together constraint for those messages:
class AcquityAction(models.Model):
    id = models.CharField(max_length=17, primary_key=True)


class AcquityTransaction(models.Model):
    action = models.ForeignKey(AcquityAction, on_delete=models.PROTECT)
    object_id = models.IntegerField()

    class Meta:
        unique_together = [['object_id', 'action']]
You can substitute the AcquityAction model for an Enumeration Field if you'd like, but I prefer having them in the DB.
I would ignore the changed event entirely, since it appears to trigger on every event, according to their vague definition. For the rescheduled event, I would create a model that allows you to use a unique constraint on the new date, so something like this:
class Reschedule(models.Model):
    schedule = models.ForeignKey(MyScheduleModel, on_delete=models.CASCADE)
    schedule_date = models.DateTimeField()

    class Meta:
        unique_together = [['schedule', 'schedule_date']]
Alternatively, you could have a task specifically for updating your schedule model with a rescheduled date, that way it remains idempotent.
Now in your view, you will do something like this:
from django.db import IntegrityError
from django.http import HttpResponse

ACQUITY_ACTIONS = {'scheduled', 'canceled', 'order.completed'}

def webhook_view(request):
    validate(request)
    action = get_action(request)
    if action in ACQUITY_ACTIONS:
        try:
            insert_transaction()
        except IntegrityError:
            # Duplicate delivery: the unique constraint already recorded this action
            return HttpResponse(status=200)
        webhook_task.delay()
    elif action == 'rescheduled':
        other_webhook_task.delay()
    ...

Related

How to model chat messages in an event-sourced system?

Context: I'm exploring to build an event sourced system / PoC using EventStoreDB (separate event stream per aggregate) with Node.JS/TypeScript. One part of the system is a 1:1 customer support chat. When a chat message is created, a push notification is sent to the user, including an update to the app's badge number (total unread message count). I'm wondering what's the best way to model the aggregates / bounded contexts.
Question 1: where to put the chat messages?
Question 2: how to handle a customer's unread message badge counter?
Since chat messages are by themselves already timed events, they seem like they could easily fit in an event sourced system. Still, I'm looking for advice on how to best model the aggregates:
Option A: Since each chat message has its own lifecycle (they can be edited, have a read status that gets updated, etc.), ChatMessage could be an aggregate on its own. This would explode the number of aggregates (and thus streams), but that might not really be such an issue for EventStoreDB. However, to send the notification for a message, we'll need to know the total number of unread messages (so info on other aggregates). But how should the push notification sending "saga" / "process manager" (which is the correct term?) know what badge counter to send with the notification? Should it keep its own state / read model with the current counter for each customer based on all the event it has seen?
Option B: Another way might be to have a list of messages under the Customer aggregate root. That way, Customer could have a counter for the number of unread messages and a fold of all the events would give me that number. However, here I'm afraid the large number of chat message events for the Customer aggregate root gets in the way of "simple" Customer behavior. E.g. when processing a Customer command, we'd first get the current state by folding all events (assume no snapshotting is used), which means applying all those chat events, even to just do something with the current name of the customer.
Option C: Or should these be in different bounded contexts? So have the Customer with its contact details in one bounded context, and a separate bounded context for chat (or communications in general), where both have a Customer aggregate root sharing only the UUID of the customer? Would that be the best of both worlds, or would that bring other challenges?
Is any of the options the way to go? Or is there another, better option? Or am I just missing the point entirely ;) (don't wanna rule that out)
Any advice is much appreciated!
Event Sourcing describes a way to (re)create state, by storing every change as an event. This does not include how those events get persisted or snapshotted, or how they are read and distributed.
I always start from the User Interface. Because that's where you should know which information you want to display and which actions can be executed.
For example there could be the following Commands (or actions executed by the User Interface):
SendMessage(receiverId, content)
MarkMessageAsRead(messageId)
Your server will then check if the provided data is valid and create the related Events:
class SupportChatMessageAggregate {
    MessageId messageId;
    UserId senderId;
    UserId receiverId;
    String content;
    boolean readByReceiver;

    // depending on framework and personal preference, this could
    // also be a method: handle(SendMessage command, CurrentUser currentUser)
    constructor(SendMessage command, CurrentUser currentUser) {
        validate(command); // throws Exception if invalid,
                           // for example if content is empty,
                           // or if currentUser is not allowed to send messages to receiverId
        publishEvent(new MessageSentEvent(
            command.getMessageId(),
            currentUser.getUserId(),
            command.getReceiverId(),
            command.getContent()
        ));
    }

    handle(MarkMessageAsRead command, CurrentUser currentUser) {
        validate(command); // throws Exception if invalid,
                           // for example check if currentUser == receiver
        publishEvent(new MessageMarkedAsReadEvent(
            command.getMessageId(),
            currentUser.getUserId()
        ));
    }

    ...
}
Now when you want to know the badge counter for a User, you simply add up all the MessageSentEvents where receiver = currentUser, and subtract all the MessageMarkedAsReadEvents of the currentUser.
This could be done for example within the UnreadSupportChatMessageCountAggregate, that is responsible for providing the current unreadMessages value based on the MessageSentEvents and MessageMarkedAsReadEvents for a given User. A pretty boring Aggregate, but it does the job.
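As a rough illustration of that fold (written in Python for brevity; the event shapes are assumptions based on the pseudocode above, not a real framework):

from dataclasses import dataclass

@dataclass
class MessageSentEvent:
    message_id: str
    sender_id: str
    receiver_id: str
    content: str

@dataclass
class MessageMarkedAsReadEvent:
    message_id: str
    reader_id: str

def unread_count(events, current_user_id):
    # Fold the chat events into the unread badge count for one user:
    # sent-to-me messages go in, marked-as-read messages come out.
    unread_message_ids = set()
    for event in events:
        if isinstance(event, MessageSentEvent) and event.receiver_id == current_user_id:
            unread_message_ids.add(event.message_id)
        elif isinstance(event, MessageMarkedAsReadEvent):
            unread_message_ids.discard(event.message_id)
    return len(unread_message_ids)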
That's Event Sourcing: You simply have a bunch of events, and if you want to query some data, you just fetch all related events, process them, and get your result. If you use separate event streams per aggregate or just have a single stream for all events is an implementation detail (or depends on the event store you use).
Depending on the number of events this can be extremely fast, or very slow. That's where snapshots and/or read models (from CQRS) come in handy. But for plain Event Sourcing this is not required.

DDD Relate Aggregates in a long process running

I am working on a project in which we define two aggregates: "Project" and "Task". The Project, in addition to other attributes, has the points attribute. These points are distributed to the tasks as they are defined by users. In a use case, the user assigns points for some task, but the project must have these points available.
We currently model this as follows:
“task.RequestPoints(points)“, this method will create an aggregate PointsAssignment with attributes points and taskId, which in its constructor issues a PointsAssignmentRequested domain event.
The handler of the issued event will fetch the project related to the task and the PointsAssignment aggregate, and call the method “project.assignPoints(pointsAssignment, service)”; that is, it will pass the PointsAssignment aggregate as a parameter along with a service to calculate the difference between the current points of the task and the desired points.
If points are available, the project will modify its points attribute and issue a “ProjectPointsAssigned” domain event that will contain the pointsAssignmentId attribute (in addition to others)
The handler of this last event will fetch the PointsAssignment and confirm it via “pointsAssignment.Confirm()”; this aggregate will issue a PointsAssignmentConfirmed domain event
The handler for this last event will bring up the associated task and call “task.AssignPoints(pointsAssignment.points)”
My question is: is it correct to pass in step 2 the aggregate PointsAssignment in the project method? That was the only way I found to be able to relate the aggregates.
Note: We have created the PointsAssignment aggregate so that in case of failure I could save the error “pointsAssignment.Reject(reasonText)” and display it to the user, since I am using eventual consistency (1 aggregate per transaction).
We have thought about using a Process Manager (PointsAssignmentProcess), but in the same way we need the third aggregate, PointsAssignment, to correlate this process.
I would do it a little bit differently (that doesn't mean it's more correct).
Your project doesn't need to know anything about the PointsAssignment.
If your project is the one that has the available points for use, it can have simple methods for removing or adding points.
RemovePointsCommand -> project->removePoints(points)
AddPointsCommand -> project->addPoints(points)
Then, you would have an eventHandler that reacts to PointsAssignmentRequested (I imagine this event has the ID of the project, the number of points, and maybe a status field, from what you said).
This eventHandler would only do:
on(PointsAssignmentRequested) -> dispatch command (RemovePointsCommand)
// Note: it would be wise for the client to send an ID for this operation, so it can be done asynchronously.
That command can either success or fail, and both of them can dispatch events:
RemovePointsSucceeded
RemovePointsFailed
// Remember that you have the correlation id that was persisted earlier
Then, you would have a final eventHandler that would do:
on(RemovePointsSucceeded) -> PointsAssignment.succeed() // dispatches PointsAssignmentSucceeded
on(PointsAssignmentSucceeded) -> task.AssignPoints(pointsAssignment.points)
On the fail side:
on(RemovePointsFailed) -> PointsAssignment.fail() // dispatches PointsAssignmentFailed
This way you don't have to mix aggregates together; all they know are each other's IDs, and they can work without knowing anything about the schema of the other aggregates, avoiding undesired coupling.
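A rough Python-flavoured sketch of that handler wiring (the repository/bus objects and event fields are assumptions, not a real framework API):

from dataclasses import dataclass

@dataclass
class RemovePointsCommand:
    project_id: str
    points: int
    correlation_id: str  # the PointsAssignment id persisted earlier

def on_points_assignment_requested(event, bus):
    # The project never sees the PointsAssignment; it only receives a plain command.
    bus.dispatch(RemovePointsCommand(
        project_id=event.project_id,
        points=event.points,
        correlation_id=event.points_assignment_id,
    ))

def on_remove_points_succeeded(event, repository):
    assignment = repository.points_assignment(event.correlation_id)
    assignment.succeed()           # emits PointsAssignmentSucceeded

def on_remove_points_failed(event, repository):
    assignment = repository.points_assignment(event.correlation_id)
    assignment.fail(event.reason)  # emits PointsAssignmentFailed

def on_points_assignment_succeeded(event, repository):
    task = repository.task(event.task_id)
    task.assign_points(event.points)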
I see the semantics of this problem as exactly those of a bank transfer.
You have the bank account (project)
You have money in this bank account(points)
You are transferring money through a transfer process (pointsAssignment)
You are transferring money to an account (task)
The bank account only needs minimal operations, withdrawing and depositing; it does not need to know anything about the transfer process.
The transfer process needs to know which account it is withdrawing from and which account it is depositing to.
I imagine your PointsAssignment being like:
{
    "projectId": "X",
    "taskId": "Y",
    "points": 10,
    "status": ["issued", "succeeded", "failed"]
}

Who and how should handle replaying events?

I am learning about DDD,CQRS and Event-sourcing and there is something I cannot figure out. Commands trigger changes in the aggregates and once the change is performed an event is fired. The event is subsequently handled by other parts of the system and preserved in the event store. However, I do not understand how replaying events would recreate the aggregate, if changes are triggered by commands.
Example: If we have a online shop.
AddItemToCardCommand -> Card Aggregate adds the item to its card -> ItemAddedToCardEvent -> The event is handled by whoever.
However, if the event is replayed, the aggregate would not add the item to its card.
To sum up, my question is: how should I recreate aggregates based on the events in the event store? Also, any general advice on how to replay events the right way would be appreciated.
For simplicity, let's assume a stateless process - our service doesn't try to keep copies of things in memory, but instead reloads aggregates as needed.
The service receives AddItemToCardCommand:{card:123, ...}. We don't have the current state of card:123 in memory, so we need to create it. We do that by loading the state of card:123 from our durable store. Because we chose to use event sourced storage, the "state" we read from the durable store is a representation of the history of events previously written by the service.
Event histories have within them all of the information you need to remember, but not necessarily in a convenient "shape" - append only lists are a great data structure for writes, but not necessarily good for reads.
What this often means is that we will "replay" the events to create an in memory object which we can then use to answer questions about the events we will write next.
The same pattern is used when answering simple queries: we load the history of events from the store, transform the event history into a more convenient shape, and then use that shape to compute the answer.
In circumstances where query latency is more important than timeliness, we might design our query handler to read the convenient shapes from a cache rather than trying to compute them fresh every time; a concurrently running background thread would be responsible for waking up periodically to compute new contents for the cache.
Using an async process to pull updates from an event stream is a common pattern; Greg Young discusses some of the advantages of that approach in his Polyglot Data talk.
In an ideal event-sourcing scenario, you would not have an already constructed aggregate structure available in your database. You repeatedly arrive at the final data structure by running through all events stored so far.
Let me illustrate with some pseudocode of adding items to cart, and then fetching the cart data.
# Create a new cart
POST /cart/new
# Store a series of events related to the cart (in database as records, similar to array items)
POST /cart/add -> CartService.AddItem(item_data) -> ItemAddedToCart
A series of events would look like:
* ItemAddedToCart
* ItemAddedToCart
* ItemAddedToCart
* ItemRemovedFromCart
* ItemAddedToCart
When its time to fetch cart data from the DB, you construct a new cart instance (or retrieve a cart instance if persisted) and replay the events on it.
cart = Cart(id=ID1)

# Fetch contents of Cart with id ID1
for each event in ID1 cart's events:
    if event is ItemAddedToCart:
        cart.add_item(event.data)
    else if event is ItemRemovedFromCart:
        cart.remove_item(event.data)

return cart
Occasionally, when there are too many events related to the cart, you may want to generate the aggregate structure and save it in the DB as a snapshot. Next time, you can start from that saved aggregate structure and continue applying new events. This optimization saves time and improves performance when there are many events to process.
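A minimal Python sketch of that snapshot optimization, assuming hypothetical event_store / snapshot_store APIs rather than any particular library:

def load_cart(cart_id, event_store, snapshot_store):
    # Start from the latest snapshot if one exists, otherwise from an empty cart,
    # then replay only the events recorded after that point.
    snapshot = snapshot_store.latest(cart_id)
    if snapshot is not None:
        cart, from_version = snapshot.state, snapshot.version
    else:
        cart, from_version = Cart(id=cart_id), 0

    for event in event_store.events_for(cart_id, after_version=from_version):
        cart.apply(event)  # the same apply/when methods the write side uses
    return cart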
What may help is to not think of the command as changing the state but rather the event as changing the state. In fact, I don't quite see how else one would go about doing so. The command handler in your aggregate would apply the invariants and, if all is OK, would immediately create the event and call some method that would apply it ([Apply|On|Do]MyEvent). The fact that you have an event after the fact does not necessarily mean other parts of your system would handle it. It is however required for event sourcing. Once you have an event you can most certainly pass that on to other parts of your system via, say, publishing on a service bus.
When you replay your events you are calling the same methods that the commands were calling to actually mutate the state of your aggregate:
public MyEvent MyCommand(string data)
{
    if (string.IsNullOrWhiteSpace(data))
    {
        throw new ArgumentException($"Argument '{nameof(data)}' may not be empty.");
    }

    return On(new MyEvent
    {
        Data = data
    });
}

private MyEvent On(MyEvent myEvent)
{
    // change the relevant state
    someState = myEvent.Data;

    return myEvent;
}
Your event sourcing infrastructure would call On(MyEvent) for MyEvent when replaying. Since you have an event it means that it was a valid state transition and can simply be applied; else something went wrong in your initial command processing and you probably have a bug.
All events in an event store would be in chronological order for an aggregate. In addition to this the events should have a global sequence number to facilitate projection processing.
You could have a generic projection that accepts any/all events and then publishes the event on a service bus for system integration. You could also place that burden on a client of the event store to have it keep track of the position itself and then read events off the store itself. You could combine these and have the client subscribe to service bus events but ensure that it executes them in the same order by keeping track of the position (global sequence number) itself and update it as the events are processed.

DDD handling Aggregate updates over time

Using Event Sourcing, I have a domain in which aggregates should be updated from time to time. When I create an aggregate, I give it an expiry time (this can be arbitrary), and after that time I have to update some properties of the entity. (This can also be forced using an UpdateCommand.) I have a few processes in mind:
After the aggregate creation, I store the aggregate ID and the expiry time in an RDBMS.
In a cron job I query the database for expired aggregates, and submit an UpdateCommand
Others include emitting UpdateCommands (or events?) from the read side.
Using a saga to coordinate updates; this is similar to the first. Either way, I have to store the expiry times.
So, I have to store the events and write into a database on the write side transactionally. However, I am not sure if creating a read-side for the write-side (?) is the correct solution in the DDD world, or is it applicable? What are the recommended solutions?
I also need to run some commands after some time expires.
For example, I need to emit a ContractExpiredEvent after 1 year (the ContractAggregate decides when, but usually it is 1 year). The problem is that the Aggregate must be the one that decides when and what command to execute, so this is a Domain concern more than an Infrastructure one.
How did I do that? I was inspired by Udi Dahan's video in which he introduces the term Timeout. Long story short, the Aggregate requests that a command be sent to itself after a period of time passes. It does that by yielding it from a command handler. The underlying CQRS framework takes that scheduled command and persists it in a special repository. Then, a cron job processes all scheduled commands when their time comes.
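In other words, something along these lines (a sketch only; the scheduled-command repository and bus are hypothetical ports, not a specific framework):

from datetime import datetime, timezone

def dispatch_due_commands(scheduled_repo, bus, now=None):
    # Cron entry point: deliver every persisted command whose time has come.
    now = now or datetime.now(timezone.utc)
    for scheduled in scheduled_repo.due_before(now):
        bus.dispatch(scheduled.command)              # e.g. the aggregate's own UpdateCommand
        scheduled_repo.mark_dispatched(scheduled.id) # so it is not sent again on the next run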
There's good compatibility between ES and DDD.
However, I am not sure if creating a read-side for the write-side (?) is the correct solution in the DDD world, or is it applicable?
Yes, it's part of the domain aggregate in your case (if you're talking about storing expiry times on the write side).
So, I have to store the events and write into a database on the write side transactionally.
I suggest you use a saga for writing into the db.
John Carmack, 1998:
If you don't consider time an input value, think about it until you do -- it is an important concept
The pattern you should be looking for is that the real world (where time is) tells the aggregate the current time, and the aggregate decides whether or not to expire itself.
With that pattern in place, you can use any strategy you like for scheduling when the real world tells the aggregate what time it is.
You don't need immediately consistent scheduling in the aggregate, you just need some idempotent message handling and an "at least once" delivery process.
the aggregate has a method which can cause an update if it is necessary based on the current time, not blindly. At some time I have to fetch the right aggregate from the store, call that method and store the changes back (if any), or retry later, right?
Yes, that's the right idea.
Notice that if you call that method twice after the expiration time, the first call will load the history, append the expiration events, and store the updated history. The second call loads the history, can see that the aggregate is already expired, and retires without making any change to the history.
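Sketched in Python under the assumption that the stream starts with a ContractCreated event carrying the expiry date (all names here are illustrative):

class Contract:
    def __init__(self, history):
        self.expired = False
        self.expires_at = None
        for event in history:          # rebuild state by replaying the history
            self.apply(event)

    def check_expiry(self, now):
        # Idempotent: emits ContractExpired at most once, however often it is called.
        if self.expired or now < self.expires_at:
            return []                  # nothing new to append
        event = {"type": "ContractExpired", "at": now}
        self.apply(event)
        return [event]                 # new events for the caller to append to the stream

    def apply(self, event):
        if event["type"] == "ContractCreated":
            self.expires_at = event["expires_at"]
        elif event["type"] == "ContractExpired":
            self.expired = True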
You can also use bi-temporal event sourcing. When events are stored, there are two dates:
the date when the event is added to the database (createdAt)
the date when the event has to be applied (validFrom)
The events are then applied in the order defined by the validFrom property.
Using this, you can:
"fix the past" by adding a new event (createdAt = now and validFrom = now - x)
schedule events in the future by adding a new event (createdAt = now and validFrom = now + y)
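A tiny sketch of that replay rule (field names are assumed):

def replay_bitemporal(aggregate, events, as_of=None):
    # Apply events in validFrom order, skipping any not yet valid at `as_of`.
    applicable = [e for e in events if as_of is None or e.valid_from <= as_of]
    for event in sorted(applicable, key=lambda e: e.valid_from):
        aggregate.apply(event)
    return aggregate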
I suggest watching this great video by Thomas Pierrain at DDD Europe 2018: https://www.youtube.com/watch?v=xzekp1RuZbM

node.js + mongo + atomic update of multiple entities = head ache

My setup:
Node.js
Mongojs
A simple database containing two collections - inventory and invoices.
Users may concurrently create invoices.
An invoice may refer to several inventory items.
My problem:
Keeping the inventory's integrity. Imagine a scenario where two users submit two invoices with overlapping item sets.
A naive (and wrong) implementation would do the following:
For each item in the invoice read the respective item from the inventory collection.
Fix the quantity of the inventory items.
If any item quantity goes below zero - abandon the request with the relevant message to the user.
Save the inventory items.
Save the invoice.
Obviously, this implementation is bad, because the actions of the two users are going to interleave and affect each other. In a typical blocking server + relational database this is solved with complex locking/transaction schemes.
What is the nodish + mongoish way to solve this? Are there any tools that the node.js platform provides for these kind of things?
You can look at a two-phase commit approach with MongoDB, or you can forget about transactions entirely and decouple your processes via a service bus approach. Take Amazon as an example: they will allow you to submit your order, but they will not confirm it until they have been able to secure your inventory item, charge your card, etc. None of this occurs in a single transaction; it is a series of steps that can occur in isolation and can have compensating steps applied where necessary.
A naive bus implementation would do the following (keep in mind that this is just a generic suggestion for you to work from and the exact implementation would depend on your specific needs for concurrency, etc.):
* Place the order on the queue. At this point, you can continue to have your client wait, or you can thank them for their order and let them know they will receive an email when it's been processed.
* An "inventory worker" will grab the order and lock the inventory items that it needs to reserve. This can be done in many different ways. With Mongo you could create a collection that has a document per order ID. This document would have as its ID the inventory item ID and a TTL that is reasonable (say 30 seconds). As long as the worker has the lock, it can manage the inventory levels of the items it has locks for. Once it's made its changes, it can delete the "lock" document (see the sketch after this list).
* If another worker comes along that wants to manage the same item while it's locked, you could put the blocked worker into sleep mode for X seconds and then retry or, better yet, put the request back onto the message bus to be picked up later by another worker.
* Once the worker has resolved all the inventory items, it can then place another message on the service bus indicating that a card should be charged, that processing should receive a notification to pull the inventory, that an email can be sent to the person who made the order, etc.
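Here is a sketch of that lock-document idea. It uses Python/pymongo purely for brevity; the documents and TTL index are the same whichever driver you use, and the collection/field names are assumptions:

from datetime import datetime, timedelta, timezone

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

db = MongoClient()["shop"]

# TTL index: Mongo removes a lock shortly after its expireAt passes,
# so a crashed worker cannot hold an item forever.
db.inventory_locks.create_index("expireAt", expireAfterSeconds=0)

def try_lock_item(item_id, order_id, ttl_seconds=30):
    # One lock document per inventory item; the unique _id is the lock.
    try:
        db.inventory_locks.insert_one({
            "_id": item_id,
            "orderId": order_id,
            "expireAt": datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds),
        })
        return True
    except DuplicateKeyError:
        return False  # someone else holds it; requeue the order or retry later

def release_item(item_id):
    db.inventory_locks.delete_one({"_id": item_id})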
Sounds complex, but once you have a message bus set up, it's actually relatively simple. A list of Node message bus implementations can be found here.
Some developers will even skip the formal message bus completely and use a database as their message-passing engine, which can work for simple implementations. Google "Mongo and queues".
If you don't expect more than one server and a message bus implementation feels too bulky, Node could handle the locking and message passing for you. For example, if you really wanted to lock with Node, you could create an array that stores the inventory item IDs. Although, to be frank, I think the message bus is the best way to go. Anyway, here's some code I have used in the past to handle simple external resource locking with Node.
// `locks` is assumed to be a module-level map ({}) of resource ID -> queued callbacks,
// and `async` is the npm "async" library.

// attempt to take out a lock; if the lock is already held, queue the callback
this.getLock = function( id, cb ) {
    if (locks[id]) {
        locks[id].push( cb );
        return false;
    }
    else {
        locks[id] = [];
        return true;
    }
};

// call freeLock when done
this.freeLock = function( that, id ) {
    async.forEach(locks[id], function(item, callback) {
        item.apply( that, [id] );
        callback();
    }, function(err) {
        if (err) {
            // do something on error
        }
        locks[id] = null;
    });
};
