Can domain events be deleted? - domain-driven-design

In order to make the domain event handling consistent, I want to persist the domain events to database while saving the AggregateRoot. later react to them using an event processor, for example let's say I want to send them to an event bus as integration events, I wonder whether or not the event is allowed to be deleted from the database after passing it through the bus?
So the events will never ever be loaded with the AggregateRoot root anymore.

I wonder whether or not the reactor is allowed to remove the event from db after the reaction.
You'll probably want to review Reliable Messaging Without Distributed Transactions, by Udi Dahan; also Pat Helland's paper Life Beyond Distributed Transactions.
In event sourced systems, meaning that the history of domain events is the persisted history of the aggregate, you will almost never delete events.
In a system where the log of domain events is simply a journal of messages to be communicated to other "partners": fundamentally, the domain events are messages that describe information to be copied from one part of the system to another. So when we get an acknowledgement that the message has been copied successfully, we can remove the copy stored "here".
In a system where you can't be sure that all of the consumers have received the domain event (because, perhaps, the list of consumers is not explicit), then you probably can't delete the domain events.
You may be able to move them -- which is, instead of having implicit subscriptions to the aggregate, you could have an explicit subscription from an event history to the aggregate, and then implicit subscriptions to the history.
You might be able to treat the record of domain events as a cache -- if the partner's aren't done consuming the message within 7 days of it being available, then maybe the delivery of the message isn't the biggest problem in the system.
How many nines of delivery guarantee do you need?

Domain events are things that have happened in the past. You can't delete the past, assuming you're not Martin McFly :)
Domain events shouldn't be deleted from event store. If you want to know whether you already processed it before, you can add a flag to know it.
UPDATE ==> DESCRIPTION OF EVENT MANAGEMENT PROCESS
I follow the approach of IDDD (Red Book by Vaughn Vernon, see picture on page 287) this way:
1) The aggregate publish the event locally to the BC (lightweight publisher).
2) In the BC, a lightweight subscriber store all the event published by the BC in an "event store" (which is a table in the same database of the BC).
3) A batch process (worker) reads the event store and publish the events to a message queue (or an event bus as you say).
4) Other BCs interested in the event (or even the same BC) subscribe to the message queue (or event bus) for listening and react to the event.
Anyway, even the worker had sent the event away ok to the message queue, you shouldn't delete the domain event from the event store. Instead simply dont send it again, but events are things that have happened and you cannot (should not) delete a thing that have occurred in the past.
Message queue or event bus are just a mechanism to send/receive events, but the events should remain stored in the BC they were created and published.

Related

ServiceBus message delivery time reliable?

I'm working on creating an events system with Azure ServiceBus, I find events generally hits reliably at the scheduled time I had them set to run - so if event 'pop' is supposed to run at 12:30pm it generally would be delivered at that time to my reciever.
I wanted to know is there a guarantee that events are always fired within the scheduled time or is that more of a suggested time and the system can get clogged and backlogged causing longer queues to form?
There are quite a few differences between messages (which are handled with Service Bus) and events, as you can see in the article Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus.
An event is a lightweight notification of a condition or a state change. The publisher of the event has no expectation about how the event is handled. The consumer of the event decides what to do with the notification. Events can be discrete units or part of a series.
[...]
A message is raw data produced by a service to be consumed or stored elsewhere. The message contains the data that triggered the message pipeline.
It sounds like you need a reliable way to have a timer trigger execute on a specific time. Service Bus is not the correct service for that, since "the message enquing time does not mean that the message will be sent at that time. It will get enqueued, but the actual sending time depends on the queue's workload and its state." (see BrokeredMessage.ScheduledEnqueueTimeUtc Property).
For handling the triggering in a reliable way, you could use services like Logic Apps (if you want to create it low-code/no-code) or Azure Functions (for the Serverless solution with code).
If you're actually looking for events, consider Event Grid.

How to handle publishing event when message broker is out?

I'm thinking how can I handle sending events when suddenly message broker go down. Please take a look at this code
using (var uow = uowProvider.Create())
{
...
...
var policy = offer.Buy(customer);
uow.Policies.Add(policy);
// DB changes are saved here! but what would happen if...
await uow.CommitChanges();
// ...eventPublisher throw an exception?
await eventPublisher.PublishMessage(PolicyCreated(policy));
return true;
}
IMHO if eventPublisher throw exception the event PolicyCreated won't be published. I don't know how to deal with this situation. The event must be published in system. I suppose that only good solution will be creating some kind of retry mechanism but I'm not sure...
I would like to elaborate a bit on the answers provided by both #Imran Arshad and #VoiceOfUnreason which are, of course, correct.
There are basically 3 patterns when it comes to publishing messages:
exactly once delivery (requires distributed transactions)
at most once delivery (no distributed transaction but may miss messages - like the actor model)
at least once delivery (no distributed transaction but may have duplicate messages)
The following is all in terms of your example.
For exactly once delivery both the database and the queue would need to provide the ability to enlist in distributed transactions. Some queues do not proivde this functionality out-of-the-box (like RabbitMQ) and even though it may be possible to roll your own it may not be the best option. Distributed transactions are typically quite slow.
For at most once delivery we have to accept that we may miss messages and I'm guessing that in most use-cases this is quite troublesome. You would get around this by tracking the progress and picking up the missed messages and resending them if required.
For at least once delivery we would need to ensure that the messages are idempotent. When we get a duplicate messages (usually quite an edge case) they should be ignored or their outcome should be the same as the initial message processed.
Now, there are a couple of ways around your issue. You could start a database transaction and make your database changes. Before you comit you perform the message sending. Should that fail then your transaction would be rolled back. That works fine for sending a single message but in your case some subscribers may have received a message. This complicates matters as all your subscribers need to receive the message or none of them get to receive it.
You could have your subscriber check whether the state is indeed true and whether it should continue processing. This places a burden on the subscriber and introduces some coupling. It could either postpone the action should the state not allow processing, or ignore it.
Another option is that instead of publishing the event you send yourself a command that indicates completion of the step. The command handler would perform the publishing and retry until all subscriber queues receive the message. This would require the relevant subscribers to ignore those messages that they had already processed (idempotence).
The outbox is a store-and-forward approach and will eventually send the message to all subscribers. You could have your outbox perhaps be included in the database transaction. In my Shuttle.Esb service bus one of the folks that used it came across a weird side-effect that I had not planned. He used a sql-based queue as an outbox and the queue connection was to the same database. It was therefore included in the database transasction and would roll back with all the other changes if not committed. Apologies for promoting my own product but I'm sure other service bus offerings may have the same functionality.
There are therefore quite a few things to consider and various techniques to mitigate the risk of a queue outage. I would, however, move the queue interaction to before the database commit.
For reliable system you need to save events locally. If your broker is down you have to retry and publish event.
There are many ways to achieve this but most common is outbox pattern. Just like your mail box your event/message stays locally and you keep retrying until it's sent and you mark the message published in your local DB.
you can read more about here Publish Events
You'll want to review Udi Dahan's discussion of Reliable Messaging without Distributed Transactions.
But very roughly, the PolicyCreated event becomes part of the unit of work; either because it is saved in the Policy representation itself, or because it is saved in an EventRepository that participates in the same transaction as the Policies repository.
Once you've captured the information in your database, retry the publish is relatively straight forward - read the events from the database, publish, optionally mark the events in the database as successfully published so that they can be cleaned up.

DomainEventPublisher consistency

Having just read Vaughn Vernon's effective aggregate design, I'm wondering about failures related to event publishing.
In the given example at page 9 (page 3 of the PDF), we call DomainEventPublisher.publish(). The event being published allows other aggregates to execute their behaviours.
What I'm wondering is: What happens if DomainEventPublisher.publish() fails ? What happens if DomainEventPublisher.publish() succeeds, but the transaction fails ?
How implementations handle these two cases ?
DomainEventPublisher.publish() is synchronous. You'd setup a generic handler (handles all events) which stores the events in the same database transaction as the business process, which means your event storage must have the ability to be transactionnal with whatever other storage mechanism you rely on to store the state of your aggregates.
Once events have been written on disk transactionnaly, you can then put them on a message queue for asynchronous delivery.
Are there other known ways to do it?
Well, rather than using a static DomainEventPublisher you could record events in a collection on the AR, just like in event sourcing and then implement a centralised mechanism to store them (e.g. transaction hooks, using aspects, etc.).
What happens if DomainEventPublisher.publish() succeeds, but the
transaction fails?
In this case I am against Vernon approach. I prefer to return the events to the application service. This way I can persist the changes performed by the aggregate using a transaction (if needed) and, if everything is Ok, I will publish the event. This also helps to keep the business layer entirely clean and pure.
In a few words; if the transaction fails then no event is raised.
What happens if DomainEventPublisher.publish() fails?
A domain event never fails, by business rules, because it's a notification of things that happened. If an aggregate said Yes to the operation and return a event expressing the business changes; then nothing in the world should say that this operation can not be done or has to be undone.
If the event fails by infrastructure then you need to have the tools to re-raise it (automatically or manually) when the outage is fixed and eventually archive the consistency in your system. Take a look at NServiceBus. It provides retries, error queues, logs and so on to never loose the events.
If the message system is down you have at least event logs that you can use to re-rise them into the message system.

How to enforce the order of messages passed to an IoT device over MQTT via a cloud-based system (API design issue)

Suppose I have an IoT device which I'm about to control (lets say switch on/off) and monitor (e.g. collect temperature readings). It seems MQTT could be the right fit. I could publish messages to the device to control it and the device could publish messages to a broker to report temperature readings. So far so good.
The problems start to occur when I try to design the API to control the device.
Lets day the device subscribes to two topics:
/device-id/control/on
/device-id/control/off
Then I publish messages to these topics in some order. But given the fact that messaging is typically an asynchronous process there are no guarantees on the order of messages received by the device.
So in case two messages are published in the following order:
/device-id/control/on
/device-id/control/off
they could be received in the reversed order leaving the device turned on, which can have dramatic consequences, depending on the context.
Of course the API could be designed in some other way, for example there could be just one topic
/device-id/control
and the payload of individual messages would carry the meaning of an individual message (on/off). So in case messages are published to this topic in a given order they are expected to be received in the exact same order on the device.
But what if the order of publishes to individual topics cannot be guaranteed? Suppose the following architecture of a system for IoT devices:
/ control service \
application -> broker -> control service -> broker -> IoT device
\ control service /
The components of the system are:
an application which effectively controls the device by publishing messages to a broker
a typical message broker
a control service with some business logic
The important part is that as in most modern distributed systems the control service is a distributed, multi instance entity capable of processing multiple control messages from the application at a time. Therefore the order of messages published by the application can end up totally mixed when delivered to the IoT device.
Now given the fact that most MQTT brokers only implement QoS0 and QoS1 but no QoS2 it gets even more interesting as such control messages could potentially be delivered multiple times (assuming QoS1 - see https://stackoverflow.com/a/30959058/1776942).
My point is that separate topics for control messages is a bad idea. The same goes for a single topic. In both cases there are no message delivery order guarantees.
The only solution to this particular issue that comes to my mind is message versioning so that old (out-dated) messages could simply be skipped when delivered after another message with more recent version property.
Am I missing something?
Is message versioning the only solution to this problem?
Am I missing something?
Most definitely. The example you brought up is a generic control system, being attached to some message-oriented scheme. There are a number of patterns that can be used when referring to a message-based architecture. This article by Microsoft categorizes message patterns into two primary classes:
Commands and
Events
The most generic pattern of command behavior is to issue a command, then measure the state of the system to verify the command was carried out. If you forget to verify, your system has an open loop. Such open loops are (unfortunately) common in IT systems (because it's easy to forget), and often result in bugs and other bad behaviors such as the one described above. So, the proper way to handle a command is:
Issue the command
Inquire as to the state of the system
Evaluate next action
Events, on the other hand, are simply fired off. As the publisher of an event, it is not my business to worry about who receives the event, in what order, etc. Now, it should also be pointed out that the use of any decent message broker (e.g. RabbitMQ) generally carries strong guarantees that messages will be delivered in the order which they were originally published. Note that this does not mean they will be processed in order.
So, if you treat a command as an event, your system is guaranteed to act up sooner or later.
Is message versioning the only solution to this problem?
Message versioning typically refers to a property of the message class itself, rather than a particular instance of the class. It is often used when multiple versions of a message-based API exist and must be backwards-compatible with one another.
What you are instead referring to is unique message identifiers. Guids are particularly handy for making sure that each message gets its own unique id. However, I would argue that de-duplication in message-based architectures is an anti-pattern. One of the consequences of using messaging is that duplicates are possible, so you should try to design your system behaviors to be stateless and idempotent. If this is not possible, it should be considered that messaging may not be the correct communication solution for the need.
Using the command-event dichotomy as an example, you could perform the following transaction:
The controller issues the command, assigning a unique identifier to the command.
The control system receives the command and turns on.
The control system publishes the "light on" event notification, containing the unique id of the command that was used to turn on the light.
The controller receives the notification and correlates it to the original command.
In the event that the controller doesn't receive notification after some timeout, the controller can retry the command. Note that "light on" is an idempotent command, in that multiple calls to it will have the same effect.
When state changes, send the new state immediately and after that periodically every x seconds. With this solution your systems gets into desired state, after some time, even when it temporarily disconnects from the network (low battery).
BTW: You did not miss anything.
Apart from the comment that most brokers don't support QOS2 (I suspect you mean that a number of broker as a service offerings don't support QOS2, such as Amazon's AWS IoT service) you have covered most of the major points.
If message order really is that important then you will have to include some form of ordering marker in the message payload, be this a counter or timestamp.

CQRS and DDD boundaries

I've have a couple of questions to which I am not finding any exact answer. I've used CQRS before, but probably I was not using it properly.
Say that there are 5 services in the domain: Gateway, Sales, Payments, Credit and Warehouse, and that during the process of a user registering with the application, the front-end submits a few commands, the same front-end will then, once the user is registered, send a few other commands to create an order and apply for a credit.
Now, what I usually do is create a gateway, which receives all pubic commands, which are then validated, and if valid, are transformed into domain commands. I only use events to store data and if one service needs some action to be performed in other service, a domain command is sent directly from one service to the other. But I've seen in other systems that event handlers are used for more than store data. So my question is, what are the limits to what event handlers can do? And is it correct to send commands between services when a specific service requires that some other service performs an action or is it more correct to have the initial event raise and event and let the handler in the other service perform that action in the event handler. I am asking this because I've seen events like: INeedCreditAproved, when I was hoping to see a domain command like: ApprovedCredit.
Any input is welcome.
You're missing an important concept here - Sagas (Process Managers). You have a long-running workflow and it's better expressed centrally.
Sagas listen to events and emit commands. So OrderAccepted event will start a Saga, which then emit ApproveCredit and ReserveStock commands, to be sent to Credit and Warehouse services respectively. Saga can then listen to command success/failure events and compensate approprietely, like say emiting SendEmail command or whatever else.
One year ago I was sending commands like "send commands between services by event handlers when a specific service requires that some other service performs an action" but a stupid decision made by me switched to using events like you said "to have the initial event raise and event and let the handler in the other service perform that action in the event handler" and it worked at first. The most stupid decision I could make. Now I am switching back to sending commands from event handlers.
You can see that other people like Rinat do similar things with event ports/receptors and it is working for them, I think:
http://abdullin.com/journal/2012/7/22/bounded-context-is-a-team-working-together.html
http://abdullin.com/journal/2012/3/31/anatomy-of-distributed-system-a-la-lokad.html
Good luck

Resources