How to add new service/bounded context to already running system? - domain-driven-design

Lets say I have two bounded contexts Project and Budgets. When project adds new member it is publishing MemberAdded event. Budgets listen to this events and it creates local version of Member entity - it just holds what it needs. This will work fine if both services are deployed at the same time. Lets say I use kafka. These MemberAdded events will be stored for example for next 7 days and then kafka deletes them. What if later I need to add new service for example Orders that also need replicated version of Member entity - Customer. I would need to start listening for MemberAdded events but I would still not have old data. How you can handle this case? The only solution I can think of is to use persistent event store that I can query for all events and from this I could restore all data required for new service.

You have two options:
Use a persistent event store and store all events ever generated. This can be thought of a pull system, where new services (or even old services) can fetch data as and when they need it.
Regenerate events from the source system on the fly (Project) and target them to specific services (using a particular pub-sub or message channel, for example). This can be considered a push system, where you must trigger republishing of old events whenever another part of the system needs to access data from another service.
Needless to say, the former is much more efficient and is a one-time effort to gather all events at a central location than the latter, which needs you to do some work whenever a new service (or old service) needs to access/process data from another service.
Extending the thought process, the central event store can also double up as a source of truth and help in audits, reporting, and debugging if made immutable (like in the event sourcing pattern).

Related

How to reliably store event to Azure CosmosDB and dispatch to Event Grid exactly once

I'm experimenting with event sourcing / cqrs pattern using serverless architecture in Azure.
I've chosen Cosmos DB document database for Event Store and Azure Event Grid for dispachting events to denormalizers.
How do I achieve that events are reliably delivered to Event Grid exactly once, when the event is stored in Cosmos DB? I mean, if delivery to Event Grid fails, it shouldn't be stored in the Event Store, should it?
Look into Cosmos Db Change Feed. Built in event raiser/queue for each change in db. You can register one or many listeners/handlers. I.e. Azure functions.
This might be exactly what you are asking for.
Some suggest you can go directly to cosmos db and attach eventgrid at the backside of changefeed.
You cannot but you shouldn't do it anyway. Maybe there are some very complicated methods using distributed transactions but they are not scalable. You cannot atomically store and publish events because you are writing to two different persistences, with different transactional boundaries. You can have a synchronous CQRS monolith, but only if you are using the same technology for the events persistence and readmodels persistence.
In CQRS the application is split in Write/Command and Read/Query sides (this long video may help). You are trying to unify the two parts into a single one, a downgrade if you will. Instead you should treat them separately, with different models (see Domain driven design).
The Write side should not depend on the outcome from the Read side. This means, that after the Event store persist the events, the Write side is done. Also, the Write side should contain all the data it needs to do its job, the emitting of events based on the business rules.
If you have different technologies in the Write and Read part then your Read side should be decoupled from the Write side, that is, it should run in a separate thread/process.
One way to do this is to have a thread/process that listens to appends to the Event store, fetch new events then publish them to the Event Grid. If this process fails or is restarted, it should resume from where it left off. I don't know if CosmosDB supports this but MongoDB (also a document database) has the rslog that you can tail to get the new events, in a few milliseconds.

Can domain events be deleted?

In order to make the domain event handling consistent, I want to persist the domain events to database while saving the AggregateRoot. later react to them using an event processor, for example let's say I want to send them to an event bus as integration events, I wonder whether or not the event is allowed to be deleted from the database after passing it through the bus?
So the events will never ever be loaded with the AggregateRoot root anymore.
I wonder whether or not the reactor is allowed to remove the event from db after the reaction.
You'll probably want to review Reliable Messaging Without Distributed Transactions, by Udi Dahan; also Pat Helland's paper Life Beyond Distributed Transactions.
In event sourced systems, meaning that the history of domain events is the persisted history of the aggregate, you will almost never delete events.
In a system where the log of domain events is simply a journal of messages to be communicated to other "partners": fundamentally, the domain events are messages that describe information to be copied from one part of the system to another. So when we get an acknowledgement that the message has been copied successfully, we can remove the copy stored "here".
In a system where you can't be sure that all of the consumers have received the domain event (because, perhaps, the list of consumers is not explicit), then you probably can't delete the domain events.
You may be able to move them -- which is, instead of having implicit subscriptions to the aggregate, you could have an explicit subscription from an event history to the aggregate, and then implicit subscriptions to the history.
You might be able to treat the record of domain events as a cache -- if the partner's aren't done consuming the message within 7 days of it being available, then maybe the delivery of the message isn't the biggest problem in the system.
How many nines of delivery guarantee do you need?
Domain events are things that have happened in the past. You can't delete the past, assuming you're not Martin McFly :)
Domain events shouldn't be deleted from event store. If you want to know whether you already processed it before, you can add a flag to know it.
UPDATE ==> DESCRIPTION OF EVENT MANAGEMENT PROCESS
I follow the approach of IDDD (Red Book by Vaughn Vernon, see picture on page 287) this way:
1) The aggregate publish the event locally to the BC (lightweight publisher).
2) In the BC, a lightweight subscriber store all the event published by the BC in an "event store" (which is a table in the same database of the BC).
3) A batch process (worker) reads the event store and publish the events to a message queue (or an event bus as you say).
4) Other BCs interested in the event (or even the same BC) subscribe to the message queue (or event bus) for listening and react to the event.
Anyway, even the worker had sent the event away ok to the message queue, you shouldn't delete the domain event from the event store. Instead simply dont send it again, but events are things that have happened and you cannot (should not) delete a thing that have occurred in the past.
Message queue or event bus are just a mechanism to send/receive events, but the events should remain stored in the BC they were created and published.

How to control idempotency of messages in an event-driven architecture?

I'm working on a project where DynamoDB is being used as database and every use case of the application is triggered by a message published after an item has been created/updated in DB. Currently the code follows this approach:
repository.save(entity);
messagePublisher.publish(event);
Udi Dahan has a video called Reliable Messaging Without Distributed Transactions where he talks about a solution to situations where a system can fail right after saving to DB but before publishing the message as messages are not part of a transaction. But in his solution I think he assumes using a SQL database as the process involves saving, as part of the transaction, the correlationId of the message being processed, the entity modification and the messages that are to be published. Using a NoSQL DB I cannot think of a clean way to store the information about the messages.
A solution would be using DynamoDB streams and subscribe to the events published either using a Lambda or another service to transformed them into domain-specific events. My problem with this is that I wouldn't be able to send the messages from the domain logic, the logic would be spread across the service processing the message and the Lambda/service reacting over changes and the solution would be platform-specific.
Is there any other way to handle this?
I can't say a specific solution based on DynamoDB since I've not used this engine ever. But I've built an event driven system on top of MongoDB so I can share my learnings you might find useful for your case.
You can have different approaches:
1) Based on an event sourcing approach you can just save the events/messages your use case produce within a transaction. In Mongo when you are just inserting/appending new items to the same collection you can ensure atomicity. Anyway, if the engine does not provide that capability the query operation is so centralized that you are reducing the possibility of an error at minimum.
Once all the events are stored, you can then consume them and project them to a given state and then persist the updated state in another transaction.
Here you have to deal with eventual consistency as data will be stale in your read model until you have projected the events.
2) Another approach is applying the UnitOfWork pattern where you cache all the query operations (insert/update/delete) to save both events and the state. Once your use case finishes, you execute all the cached queries against the database (flush). This way although the operations are not atomic you are again centralizing them quite enough to minimize errors.
Of course the best is to use an ACID database if you require that capability and any other approach will be a workaround to get close to it.
About publishing the events I don't know if you mean they are published to a messaging transportation mechanism such as rabbitmq, Kafka, etc. But that must be a background process where you fetch the events from the DB and publishes them in order to break the 2 phase commit within the same transaction.

How to handle projection errors by event sourcing and CQRS?

I want to use event sourcing and CQRS, and so I need projections (I hope I use the proper term) to update my query databases. How can I handle database errors?
For example one of my query cache databases is not available, but I already updated the others. So the not-available database won't be in snyc with the others when it comes back to business. How will it know that it have to run for instance the last 10 domain events from the event storage? I guess I have to store information about the current state of the databases, but what if that database state storage fails? Any ideas, best practices how to solve this kind of problems?
In either case, you must tell your messaging bus that the processing failed and it should redeliver the event later, in the hope that the database will be back online then. This is essentially why we are using message bus systems with an "at least once"-delivery guarantee.
For transactional query databases, you should also rollback the transaction, of course. If your query database(s) do not support transactions, you must make sure on the application side that updates are idempotent - i.e., if your event arrives on the next delivery attempt, your projection code and/or database must be designed such that the repeated processing of the event does not harm the state of the database. This is sometimes trivial to achieve (e.g., when the event leads to a changed person's name in the projection), but often not-so-trivial (e.g., when the projection simply increments view counts). But this is what you pay for when you are using non-transactional databases.

Custom Logging mechanism: Master Operation with n-Operation Details or Child operations

I'm trying to implement logging mechanism in a Service-Workflow-hybrid application. The requirements for logging is that instead for independent log action, each log must be considered as a detail operation and placed against a parent/master operation. So, it's a parent-child and goes to database table(s). This is the primary reason, NLog failed.
To help understand better, I'm diving in a generic detail. This is how the application flow goes:
Now, the Main entry point of the application (normally called Program.cs) is Platform. It initializes an engine that is capable of listening incoming calls from ISDN lines, VoIP, or web services. The interface is generic, so any call that reaches the Platform triggers OnConnecting(). OnConnecting() is a thread-safe event and can be triggered as many times as system requires.
Within OnConnecting(), a new instance of our custom Workflow manager is launched and the context is a custom object called ProcessingInfo:
new WorkflowManager<ZeProcessingInfo>();
Where, ZeProcessingInfo:
var ZeProcessingInfo = new ProcessingInfo(this, new LogMaster());
As you can see, the ProcessingInfo is composed of Platform itself and a new instance of LogMaster. LogMaster is defined in an independent assembly.
Now this LogMaster is available throughout the WorkflowManager, all the Workflows it launches, all the activities within any running Workflow, and passed on to external code called from within any Activity. Now, when a new LogMaster is initialized, a Master Operation entry is created in the database and this LogMaster object now lives until this call is ended after a series of very serious roller coaster rides through different workflows. Upon every call of OnConnecting(), a new Master Operation is created and maintained.
The LogMaster allows for calling a AddDetail() method that adds new child detail under the internally stored Master Operation (distinguished through a Guid Primary Key). The LogMaster is built upon Entity Framework.
And, I'm able to log under the same Master Operation as many times as I require. But the application requirements are changing and there is a need to log from other assemblies now. There is a Platform Server assembly witch is a Windows Service that acts as a server listening to web service based calls and once a client calls a method, OnConnecting in Platform is triggered.
I need a mechanism to somehow retrieve the related LogMaster object so that I can add detail to the same Master Operation. But Platform Server is the once triggering the OnConnecting() on the Platform and thus, instantiating LogMaster. This creates a redundancy loop.
Also, failure scenarios are being considered as well. If LogMaster fails, need to revert to Event Logging from Database Logging. If Event Logging is failed (or not allowed through unified configuration), need to revert to file-based (XML) logging.
I hope I have given a rough idea. I don't expect code but I need some strategy for a very seamless plug-able configurable logging mechanism that supports Master-Child operations.
Thanks for reading. Any help would be much appreciated.
I've read this question a number of times and it was pretty hard to figure out what was going on. I don't think your diagram helps at all. If your question is about trying to retrieve the master log record when writing child log records then I would forget about trying to create normalised data in the log tables. You will just slow down the transactional system in trying to do so. You want the log/audit records to write as fast as possible and you can later aggregate them when you want to read them.
Create a de-normalised table for the logs entries and use a single Guid in that table to track the session/parent log master. Yes this will be a big table but it will write fast.
As for guaranteed delivery of log messages to a destination, I would try not to create multiple destinations as combining them later will be a nightmare but rather use something like MSMQ to emit the audit logs as fast as possible and have another service pick them up and process them in a guaranteed delivery manner. ETW (Event Logging) is not guaranteed under load and you will not know that it has failed.

Resources