Domain driven design and domain events - domain-driven-design

I'm new to DDD and I'm reading articles now to get more information. One of the articles focuses on domain events (DE). For example sending email is a domain event raised after some criteria is met while executing piece of code.
Code example shows one way of handling domain events and is followed by this paragraph
Please be aware that the above code will be run on the same thread within the same transaction as the regular domain work so you should avoid performing any blocking activities, like using SMTP or web services. Instead, prefer using one-way messaging to communicate to something else which does those blocking activities.
My questions are
Is this a general problem in handling DE? Or it is just concern of the solution in mentioned article?
If domain events are raised in transaction and the system will not handle them synchronously, how should they be handled?
When I decide to serialize these events and let scheduler (or any other mechanism) execute them, what happens when transaction is rolled back? (in the article event is raised in code executed in transaction) who will cancel them (when they are not persisted to database)?
Thanks

It's a general problem period never mind DDD
In general, in any system which is required to respond in a performant manner (e.g. a Web Server, any long running activities should be handled asynchronously to the triggering process.
This means queue.
Rolling back your transaction should remove item from the queue.
Of course, you now need additional mechanisms to handle the situation where the item on the queue fails to process - i.e the email isn't sent - you also need to allow for this in your triggering code - having a subsequent process RELY on the earlier process having already occurred is going to cause issues at some point.
In short, your queueing mechanism should itself be transactional and allow for retries and you need to think about the whole chain of events as a workflow.

Related

What's a good design for making sure the Node.js Event Loop isn't blocked when adding potentially hundreds of records?

I just read this article from Node.js: Don't Block the Event Loop
The Ask
I'm hoping that someone can read over the use case I describe below and tell me whether or not I'm understanding how the event loop is blocked, and whether or not I'm doing it. Also, any tips on how I can find this information out for myself would be useful.
My use case
I think I have a use case in my application that could potentially cause problems. I have a functionality which enables a group to add members to their roster. Each member that doesn't represent an existing system user (the common case) gets an account created, including a dummy password.
The password is hashed with argon2 (using the default hash type), which means that even before I get to the need to wait on a DB promise to resolve (with a Prisma transaction) that I have to wait for each member's password to be generated.
I'm using Prisma for the ORM and Sendgrid for the email service and no other external packages.
A take-away that I get from the article is that this is blocking the event loop. Since there could potentially be hundreds of records generated (such as importing contacts from a CSV or cloud contact service), this seems significant.
To sum up what the route in question does, including some details omitted before:
Remove duplicates (requires one DB request & then some synchronous checking)
Check remaining for existing user
For non-existing users:
Synchronously create many records & push each to a separate array. One of these records requires async password generation for each non-existing user
Once the arrays are populated, send a DB transaction with all records
Once the transaction is cleared, create invitation records for each member
Once the invitation records are created, send emails in a MailData[] through SendGrid.
Clearly, there are quite a few tasks that must be done sequentially. If it matters, the asynchronous functions are also nested: createUsers calls createInvites calls sendEmails. In fact, from the controller, there is: updateRoster calls createUsers calls createInvites calls sendEmails.
There are architectural patterns that are aimed at avoiding issues brought by potentially long-running operations. Note here that while your example is specific, any long running process would possibly be harmful here.
The first obvious pattern is the cluster. If your app is handled by multiple concurrent independent event-loops of a cluster, blocking one, ten or even thousand of loops could be insignificant if your app is scaled to handle this.
Imagine an example scenario where you have 10 concurrent loops, one is blocked for a longer time but 9 remaining are still serving short requests. Chances are, users would not even notice the temporary bottleneck caused by the one long running request.
Another more general pattern is a separated long-running process service or the Command-Query Responsibility Segregation (I'm bringing the CQRS into attention here as the pattern description could introduce more interesting ideas you could be not familiar with).
In this approach, some long-running operations are not handled directly by backend servers. Instead, backend servers use a Message Queue to send requests to yet another service layer of your app, the layer that is solely dedicated to running specific long-running requests. The Message Queue is configured so that it has specific throughput so that if there are multiple long-running requests in short time, they are queued, so that possibly some of them are delayed but your resources are always under control. The backend that sends requests to the Message Queue doesn't wait synchronously, instead you need another form of return communication.
This auxiliary process service can be maintained and scaled independently. The important part here is that the service is never accessed directly from the frontend, it's always behind a message queue with controlled throughput.
Note that while the second approach is often implemented in real-life systems and it solves most issues, it can still be incapable of handling some edge cases, e.g. when long-running requests come faster than they are handled and the queue grows infintely.
Such cases require careful maintenance and you either scale your app to handle the traffic or you introduce other rules that prevent users from running long processes too often.

How to handle publishing event when message broker is out?

I'm thinking how can I handle sending events when suddenly message broker go down. Please take a look at this code
using (var uow = uowProvider.Create())
{
...
...
var policy = offer.Buy(customer);
uow.Policies.Add(policy);
// DB changes are saved here! but what would happen if...
await uow.CommitChanges();
// ...eventPublisher throw an exception?
await eventPublisher.PublishMessage(PolicyCreated(policy));
return true;
}
IMHO if eventPublisher throw exception the event PolicyCreated won't be published. I don't know how to deal with this situation. The event must be published in system. I suppose that only good solution will be creating some kind of retry mechanism but I'm not sure...
I would like to elaborate a bit on the answers provided by both #Imran Arshad and #VoiceOfUnreason which are, of course, correct.
There are basically 3 patterns when it comes to publishing messages:
exactly once delivery (requires distributed transactions)
at most once delivery (no distributed transaction but may miss messages - like the actor model)
at least once delivery (no distributed transaction but may have duplicate messages)
The following is all in terms of your example.
For exactly once delivery both the database and the queue would need to provide the ability to enlist in distributed transactions. Some queues do not proivde this functionality out-of-the-box (like RabbitMQ) and even though it may be possible to roll your own it may not be the best option. Distributed transactions are typically quite slow.
For at most once delivery we have to accept that we may miss messages and I'm guessing that in most use-cases this is quite troublesome. You would get around this by tracking the progress and picking up the missed messages and resending them if required.
For at least once delivery we would need to ensure that the messages are idempotent. When we get a duplicate messages (usually quite an edge case) they should be ignored or their outcome should be the same as the initial message processed.
Now, there are a couple of ways around your issue. You could start a database transaction and make your database changes. Before you comit you perform the message sending. Should that fail then your transaction would be rolled back. That works fine for sending a single message but in your case some subscribers may have received a message. This complicates matters as all your subscribers need to receive the message or none of them get to receive it.
You could have your subscriber check whether the state is indeed true and whether it should continue processing. This places a burden on the subscriber and introduces some coupling. It could either postpone the action should the state not allow processing, or ignore it.
Another option is that instead of publishing the event you send yourself a command that indicates completion of the step. The command handler would perform the publishing and retry until all subscriber queues receive the message. This would require the relevant subscribers to ignore those messages that they had already processed (idempotence).
The outbox is a store-and-forward approach and will eventually send the message to all subscribers. You could have your outbox perhaps be included in the database transaction. In my Shuttle.Esb service bus one of the folks that used it came across a weird side-effect that I had not planned. He used a sql-based queue as an outbox and the queue connection was to the same database. It was therefore included in the database transasction and would roll back with all the other changes if not committed. Apologies for promoting my own product but I'm sure other service bus offerings may have the same functionality.
There are therefore quite a few things to consider and various techniques to mitigate the risk of a queue outage. I would, however, move the queue interaction to before the database commit.
For reliable system you need to save events locally. If your broker is down you have to retry and publish event.
There are many ways to achieve this but most common is outbox pattern. Just like your mail box your event/message stays locally and you keep retrying until it's sent and you mark the message published in your local DB.
you can read more about here Publish Events
You'll want to review Udi Dahan's discussion of Reliable Messaging without Distributed Transactions.
But very roughly, the PolicyCreated event becomes part of the unit of work; either because it is saved in the Policy representation itself, or because it is saved in an EventRepository that participates in the same transaction as the Policies repository.
Once you've captured the information in your database, retry the publish is relatively straight forward - read the events from the database, publish, optionally mark the events in the database as successfully published so that they can be cleaned up.

Should I put command bus between controller and domain service?

I am working on a backend and try to implement CQRS patterns.
I'm pretty clear about events, but sometimes struggle with commands.
I've seen that commands are requested by users, or example ChangePasswordCommand. However in implementation level user is just calling an endpoint, handled by some controller.
I can inject an UserService to my controller, which will handle domain logic and this is how basic tutorials do (I use Nest.js). However I feel that maybe this is where I should use command - so should I execute command ChangePasswordCommand in my controller and then domain module will handle it?
Important thing is that I need return value from the command, which is not a problem from implementation perspective, but it doesn't look good in terms of CQRS - I should ADD and GET at the same time.
Or maybe the last option is to execute the command in controller and then emit an event (PasswordChangedEvent) in command handler. Next, wait till event comes back and return the value in controller.
This last option seems quite good to me, but I have problems with clear implementation inside request lifecycle.
I base on
https://docs.nestjs.com/recipes/cqrs
While the answer by #cperson is technically correct, I would like to add a few nuances to it.
First something that may not be clear from the answer description where it advises to "emit an event (PasswordChangedEvent) in command handler". This is what I would prefer as well, but watch out:
The Command is part of the infrastructure layer, and the Event is part of the domain.
So from the command you should trigger code on the AggregateRoot that emits the event.
This can be done with mergeObjectContext or eventBus.publish (see the NestJS docs).
Events can be applied from other domain objects, but the aggregate usually is the emitter (upon commit).
The other point I wanted to address is that an event-sourced architecture is assumed, i.e. applying CQRS/ES. While CQRS is often used in combination with Event Sourcing there is nothing that prescribes doing so. Event Sourcing can give additional advantages, but also comes with significant added complexity. You should carefully weigh the pros and cons of having ES.
In many cases you do not need Event Sourcing. Having just CQRS already gives you a lot of benefits, such as having your domain / bounded contexts well-contained. Separation between reads and writes, single-responsibility commands + queries (more SOLID in general), cleaner architecture, etc. On a higher level it is easier to shift focus from 'how do I implement this (CRUD-wise)?', to 'how do these user requirements fit in the domain model?'.
Without ES you can have a single relational database and e.g. persist using TypeORM. You can persist events, but it is not needed. In many scenario's you can avoid the eventual consistency where clients need to subscribe to events (maybe you just use them to drive saga's and update read-side views/projections).
You can always start with just CQRS and add Event Sourcing later, when the need arises.
As your architecture evolves, you may find that you require a command bus if you are using Processes/Sagas to manage workflows and inter-aggregate communication. If and when that is the case, it will naturally make sense to use that bus for all commands.
The following is the method I would prefer:
execute the command in controller and then emit an event (PasswordChangedEvent) in command handler. Next, wait till event comes back and return the value in controller.
As for implementation details, in .NET, we use a SignalR websockets service that will read the event bus (where all events are published) and will forward events to clients that have subscribed to them.
In this case, the workflow would be:
The user posts to the controller.
The controller appends the command to the command bus.
The controller returns an ID identifying the command.
The client (browser client) subscribes to events relating to this command.
The command is received by the domain service and handled. An event is emitted to the event store.
The event is published to the event bus.
The event listener subscription service receives the event, finds the subscription, and sends the event to the client.
The client receives the event and notifies the user.

DomainEventPublisher consistency

Having just read Vaughn Vernon's effective aggregate design, I'm wondering about failures related to event publishing.
In the given example at page 9 (page 3 of the PDF), we call DomainEventPublisher.publish(). The event being published allows other aggregates to execute their behaviours.
What I'm wondering is: What happens if DomainEventPublisher.publish() fails ? What happens if DomainEventPublisher.publish() succeeds, but the transaction fails ?
How implementations handle these two cases ?
DomainEventPublisher.publish() is synchronous. You'd setup a generic handler (handles all events) which stores the events in the same database transaction as the business process, which means your event storage must have the ability to be transactionnal with whatever other storage mechanism you rely on to store the state of your aggregates.
Once events have been written on disk transactionnaly, you can then put them on a message queue for asynchronous delivery.
Are there other known ways to do it?
Well, rather than using a static DomainEventPublisher you could record events in a collection on the AR, just like in event sourcing and then implement a centralised mechanism to store them (e.g. transaction hooks, using aspects, etc.).
What happens if DomainEventPublisher.publish() succeeds, but the
transaction fails?
In this case I am against Vernon approach. I prefer to return the events to the application service. This way I can persist the changes performed by the aggregate using a transaction (if needed) and, if everything is Ok, I will publish the event. This also helps to keep the business layer entirely clean and pure.
In a few words; if the transaction fails then no event is raised.
What happens if DomainEventPublisher.publish() fails?
A domain event never fails, by business rules, because it's a notification of things that happened. If an aggregate said Yes to the operation and return a event expressing the business changes; then nothing in the world should say that this operation can not be done or has to be undone.
If the event fails by infrastructure then you need to have the tools to re-raise it (automatically or manually) when the outage is fixed and eventually archive the consistency in your system. Take a look at NServiceBus. It provides retries, error queues, logs and so on to never loose the events.
If the message system is down you have at least event logs that you can use to re-rise them into the message system.

DDD - How handle transactions in the "returning domain events" pattern?

In the DDD litterature, the returning domain event pattern is described as a way to manage domain events. Conceptually, the aggregate root keeps a list of domain events, populated when you do some operations on it.
When the operation on the aggregate root is done, the DB transaction is completed, at the application service layer, and then, the application service iterates on the domain events, calling an Event Dispatcher to handle those messages.
My question is concerning the way we should handle transaction at this moment. Should the Event Dispatcher be responsible of managing a new transaction for each event it process? Or should the application service manages the transaction inside the domain event iteration where it calls the domain Event Dispatcher? When the dispatcher uses infrastructure mecanism like RabbitMQ, the question is irrelevent, but when the domain events are handled in-process, it is.
Sub-question related to my question. What is your opinion about using ORM hooks (i.e.: IPostInsertEventListener, IPostDeleteEventListener, IPostUpdateEventListener of NHibernate) to kick in the Domain Events iteration on the aggregate root instead of manually doing it in the application service? Does it add too much coupling? Is it better because it does not require the same code being written at each use case (the domain event looping on the aggregate and potentially the new transaction creation if it is not inside the dispatcher)?
My question is concerning the way we should handle transaction at this moment. Should the Event Dispatcher be responsible of managing a new transaction for each event it process? Or should the application service manages the transaction inside the domain event iteration where it calls the domain Event Dispatcher?
What you are asking here is really a specialized version of this question: should we ever update more than one aggregate in a single transaction?
You can find a lot of assertions that the answer is "no". For instance, Vaughn Vernon (2014)
A properly designed aggregate is one that can be modified in any way required by the business with its invariants completely consistent within a single transaction. And a properly designed bounded context modifies only one aggregate instance per transaction in all cases.
Greg Young tends to go further, pointing out that adhering to this rule allows you to partition your data by aggregate id. In other words, the aggregate boundaries are an explicit expression of how your data can be organized.
So your best bet is to try to arrange your more complicated orchestrations such that each aggregate is updated in its own transaction.
My question is related to the way we handle the transaction of the event sent after the initial aggregate is altered after the initial transaction is completed. The domain event must be handled, and its process could need to alter another aggregate.
Right, so if we're going to alter another aggregate, then there should (per the advice above) be a new transaction for the change to the aggregate. In other words, it's not the routing of the domain event that determines if we need another transaction -- the choice of event handler determines whether or not we need another transaction.
Just because event handling happens in-process doesn't mean the originating application service has to orchestrate all transactions happening as a consequence of the events.
If we take in-process event handling via the Observable pattern for instance, each Observer will be responsible for creating its own transaction if it needs one.
What is your opinion about using ORM hooks (i.e.:
IPostInsertEventListener, IPostDeleteEventListener,
IPostUpdateEventListener of NHibernate) to kick in the Domain Events
iteration on the aggregate root instead of manually doing it in the
application service?
Wouldn't this have to happen during the original DB transaction, effectively turning everything into immediate consistency (if events are handled in process)?

Resources