DDD: Applying Event Store in a legacy system

Our current system is a legacy system that doesn't use domain events. We are going to start publishing domain events.
Other bounded contexts will listen to these domain events, but only from the moment we start publishing, losing all past information.
So how do we deal with a legacy system that never recorded these events, when we still want a history from before the event store was introduced?
Is it a good approach to try to figure out what happened and create the domain events retroactively (reverse engineering) from the data we have in our DB?

I wouldn't go down the route of trying to reverse engineer events for a legacy system, unless there is a business reason to do so - is your use case just that you want to fit into the new way you'll be modelling things using events? If there's no business case for it, it sounds like a waste of effort.
How about having a single starting event that represents the current state of each of your 'things' (i.e. Aggregates if you're using DDD concepts) as they exist now in the legacy system? Then add new events on top of this.
I.e.
LegacySystemStateCaptured
NewDomainEvent
AnotherNewDomainEvent
...then when you rebuild your state, apply the LegacySystemStateCaptured event as well as the others.
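For illustration, here is a minimal sketch of that idea in Java; the aggregate and event names (Account, MoneyDeposited, etc.) are invented for the example and not part of the original answer:

```java
import java.math.BigDecimal;
import java.util.List;

interface DomainEvent {}

// Captures the aggregate's state exactly as it exists in the legacy system.
record LegacySystemStateCaptured(String accountId, BigDecimal balance) implements DomainEvent {}

// A regular domain event recorded after the cut-over.
record MoneyDeposited(String accountId, BigDecimal amount) implements DomainEvent {}

class Account {
    private String id;
    private BigDecimal balance = BigDecimal.ZERO;

    // Rebuild state by replaying the stream: the snapshot-style event comes
    // first, then every event recorded since the cut-over.
    static Account replay(List<DomainEvent> stream) {
        Account account = new Account();
        stream.forEach(account::apply);
        return account;
    }

    private void apply(DomainEvent event) {
        if (event instanceof LegacySystemStateCaptured e) {
            this.id = e.accountId();
            this.balance = e.balance();  // adopt the legacy state as-is
        } else if (event instanceof MoneyDeposited e) {
            this.balance = this.balance.add(e.amount());
        }
    }
}
```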

Related

Migrate legacy database to cqrs/event sourcing view

We have an old legacy application with complex business logic that we need to rewrite. We are considering using CQRS and event sourcing. But it's not clear how to migrate data from the old database. Probably we need to migrate it to the read database only, as we can't reproduce all the events to populate the event store. But do we at least need to create some initial records in the event store for each aggregate, like AggregateCreated? Or do we need to write scripts and run all the commands one by one to recreate the aggregates the same way we normally would with event sourcing?
Using the existing database, or a transformed version of it, as the starting point of your read-side persistence is never a good idea. Your event-sourced system needs its own starting point so that you keep one of the main benefits of event sourcing: being able to create projections on demand, using polyglot persistence.
Using commands for migration is also not a good idea, for the simple reason that commands, by definition, can fail due to pre- or post-condition checks of invariants. It also does not convey the meaning of migration, which is to represent the current system state as it is right now. Remember that the current system state is not something you can accept or deny. It is given to you, and your job is to capture it.
The best practice for such a migration is to emit so-called migration events, like EntityXMigratedFromLegacy. Of course, the work might be substantial, mainly because the legacy system model will most probably not match the new model; otherwise the reason for such a migration wouldn't be entirely clear.
By using migration events you explicitly state the fact that a piece of state was moved from another place, as-is. You will always know how the migrated entity started its lifecycle in the new system - either by being migrated from legacy or by being initialised in the new system.
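A rough sketch of what such a migration event could look like; all names here are hypothetical:

```java
import java.time.Instant;
import java.util.Map;

interface DomainEvent {}

// States explicitly that this entity's lifecycle in the new system began
// with a migration, carrying the legacy state as-is.
record CustomerMigratedFromLegacy(
        String customerId,
        Map<String, String> legacyAttributes,
        Instant migratedAt) implements DomainEvent {}

// The normal lifecycle-start event for entities born in the new system.
record CustomerRegistered(String customerId, String name) implements DomainEvent {}
```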
Probably we need to migrate it to the read database only
No: your read model DB can be dropped and recreated at any time based on the write side. Only the write side is your source of truth.
But do we at least need to create some initial records in the event store for each aggregate, like AggregateCreated?
Of course, and having ONLY the initial event might not be enough. If your current OrderAggregate has reservations, you must create an ItemReservedEvent for each reservation it has.
Or do we need to write scripts and run all the commands one by one to recreate the aggregates the same way we normally would with event sourcing?
That feels like the way you should go: read each old aggregate/entity from the DB and map it to a new one.
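A sketch of what such a one-off migration script might look like, following the OrderAggregate/reservation example above; the table names, event types and the appendToStream helper are assumptions, not a real API:

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class LegacyOrderMigration {

    interface DomainEvent {}
    record OrderMigratedFromLegacy(String orderId, String status) implements DomainEvent {}
    record ItemReserved(String orderId, String sku, int quantity) implements DomainEvent {}

    public static void main(String[] args) throws SQLException {
        try (Connection legacy = DriverManager.getConnection("jdbc:postgresql://localhost/legacy");
             Statement st = legacy.createStatement();
             ResultSet orders = st.executeQuery("SELECT id, status FROM orders")) {
            while (orders.next()) {
                String orderId = orders.getString("id");
                List<DomainEvent> events = new ArrayList<>();
                events.add(new OrderMigratedFromLegacy(orderId, orders.getString("status")));

                // One event per reservation, so the aggregate's full state is captured.
                try (PreparedStatement ps = legacy.prepareStatement(
                        "SELECT sku, quantity FROM reservations WHERE order_id = ?")) {
                    ps.setString(1, orderId);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            events.add(new ItemReserved(orderId, rs.getString("sku"), rs.getInt("quantity")));
                        }
                    }
                }
                appendToStream("Order-" + orderId, events);  // write to the new event store
            }
        }
    }

    // Placeholder for whatever event-store client is used.
    static void appendToStream(String streamId, List<DomainEvent> events) { /* ... */ }
}
```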

CQRS and Event Sourcing Guide

I want to create a CQRS and Event Sourcing architecture that is very cheap, very flexible and very uncomplicated.
I want to make sure that events never fail to at least reach the publisher/event store, ever, because that's where the business is.
Now, I have several options in mind:
Azure
With Azure, I'm not sure what to use:
Azure Service Bus
Azure Functions
Azure WebJobs (I suppose these can be replaced with Azure Functions)
?? (something else I forgot or don't know?)
How reliable are these Azure serverless solutions?
Custom
For this I am thinking of using RabbitMQ; the problem is the cost of a virtual machine to run it.
All in all, I want:
Ability to replay the messages/events in case of failure.
Ability to easily add subscribers.
Ability to select the subscribers upon which to replay the messages.
The event store should be able to store very large event messages (or how else shall I queue an image or file?).
The event store MUST NEVER EVER get choked, or sleep.
Speed of implementation/prototyping would be an added advantage.
What does your experience suggest?
What about other alternatives (e.g. Apache Kafka)?
Why not run Event Store? Created by Greg Young himself. Host it wherever you need.
I am a Java user. I have been using HornetQ (now Artemis, which I don't use), an alternative to RabbitMQ, for the longest time; the only problem is that it does not support replication, but it gets the job done when it comes to event sourcing. For your custom scenario, RabbitMQ is a good choice, but try running it on a DigitalOcean instance to keep costs low. If you are looking for simplicity and flexibility you have only two choices: build your own, or forgo simplicity and pick up Apache Kafka with all its complexities, which will give you flexibility. You can also build an event store with MongoDB: https://www.mongodb.com/blog/post/event-sourcing-with-mongodb
Your requirements are too vague to make the optimal choice. You need to consider a lot of things; one of them would be, for instance, the number of events per aggregate and the number of aggregates (note that this has to be statistical). Those are important primarily because if you allow tens of thousands of events per aggregate, then you would need snapshotting, which adds complexity that you might not need.
But for regular use cases you could just use a relational database like Postgres as your (linearizable) event store. It also has LISTEN/NOTIFY functionality, so you would not really need any message bus either, and your application could be written in a reactive way.
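As a rough illustration of that setup, here is a sketch of Postgres as an event store with LISTEN/NOTIFY, using the pgjdbc driver; the table layout and channel name are assumptions, not part of the original answer:

```java
// Assumed append-only schema:
//   CREATE TABLE events (
//     position  BIGSERIAL PRIMARY KEY,
//     stream_id TEXT NOT NULL,
//     version   INT  NOT NULL,
//     payload   JSONB NOT NULL,
//     UNIQUE (stream_id, version)  -- optimistic concurrency per stream
//   );
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class PgEventStore {
    private final Connection conn;

    PgEventStore(Connection conn) { this.conn = conn; }

    // The UNIQUE constraint makes the append fail if another writer claimed
    // the same version first; that is the concurrency check.
    void append(String streamId, int expectedVersion, String payloadJson) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?::jsonb)")) {
            ps.setString(1, streamId);
            ps.setInt(2, expectedVersion + 1);
            ps.setString(3, payloadJson);
            ps.executeUpdate();
        }
        try (Statement st = conn.createStatement()) {
            st.execute("NOTIFY new_events");  // wake subscribers, no broker needed
        }
    }

    // A subscriber waits for notifications instead of listening to a message bus.
    void subscribe() throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("LISTEN new_events");
        }
        PGConnection pg = conn.unwrap(PGConnection.class);
        while (true) {
            PGNotification[] pending = pg.getNotifications(1000);  // block up to 1s
            if (pending != null) {
                // read new rows from `events` ordered by position and project them
            }
        }
    }
}
```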

How to handle domain model updates and immutability of stored events?

I understand that events in event sourcing should never be allowed to change. But what about the in-memory state? If the domain model needs to be updated in some way, shouldn't old events still be replayed into the old models? I mean, shouldn't it always be possible to replay events and get the exact same state as before, or is it acceptable if this state evolves too, as long as the stored events remain the same? Ideally I think I'd like to be able to get a state as it was with its old models, rules and what not. But other than that, I of course also want to replay old events into new models. What does the theory say about this?
Anticipate event structure changes
You should always try to reflect the fact that an event had a different structure in your event application mechanism (i.e. where you read events and apply them to the model). After all, the earlier structure of an event was a valid structure at that time.
This means that you need to be prepared for this situation. Design the event application mechanism flexible enough so that you can support this case.
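One common way to build in that flexibility is an upcaster that translates earlier versions of a stored event into the current shape before it reaches the model. A sketch, with invented event names and fields:

```java
import java.util.Map;

interface DomainEvent {}

// Current shape: at some point the address was split into separate fields.
record CustomerRelocatedV2(String customerId, String street, String city) implements DomainEvent {}

class EventUpcaster {
    // Raw stored events arrive as (type, version, attributes); old versions
    // are translated so the domain model only ever sees the current shape.
    DomainEvent upcast(String type, int version, Map<String, String> attrs) {
        if (type.equals("CustomerRelocated") && version == 1) {
            // V1 stored a single free-text address; split it on a best-effort basis.
            String[] parts = attrs.get("address").split(",", 2);
            return new CustomerRelocatedV2(attrs.get("customerId"),
                    parts[0].trim(), parts.length > 1 ? parts[1].trim() : "");
        }
        return new CustomerRelocatedV2(attrs.get("customerId"),
                attrs.get("street"), attrs.get("city"));
    }
}
```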
Migrating stored events
Only as a very last resort should you migrate the stored events. If you do it, make sure you understand the consequences:
Which other systems consumed the legacy events?
Do we have a problem with them if we change a stored event?
Does the migration work for our system (verify in a QA environment with a full data set)?

Why can't sagas query the read side?

In a CQRS Domain Driven Design system, the FAQ says that a saga should not query the read side (http://cqrs.nu). However, a saga listens to events in order to execute commands, and because it executes commands, it is essentially a "client", so why can't a saga query the read models?
Sagas should not query the read side (projections) for the information they need to fulfill their task. The reason is that you cannot be sure the read side is up to date. In an eventually consistent system, you do not know when the projection will be updated, so you cannot rely on its state.
That does not mean that sagas should not hold state. Sagas do in many cases need to keep track of state, but then the saga should be responsible for creating that state. As I see it, this can be done in two ways.
It can build up its state by reading the events from the event store. When it receives an event it should trigger on, it reads all the events it needs from the store and builds up its state in a similar manner to an aggregate. This can be made performant in Event Store by creating new streams.
The other way is to continuously listen to events from the event store and build up state, storing it in some data store the way projections do. Just be careful with this approach: you cannot replay sagas the same way you replay projections. If you need to change the way you store state and want to rebuild it, make sure that you do not execute the commands that you have already executed.
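A sketch of that second approach: the saga persists which commands it has already dispatched, so rebuilding its state does not execute them twice. All types here are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

interface Event {}
record OrderPlaced(String orderId) implements Event {}
record PaymentReceived(String orderId) implements Event {}

class ShippingSaga {
    private final Set<String> ordersAwaitingPayment = new HashSet<>();
    private final Set<String> dispatchedCommands;  // persisted with the saga's state
    private final boolean replaying;               // true while rebuilding from the stream

    ShippingSaga(Set<String> dispatchedCommands, boolean replaying) {
        this.dispatchedCommands = dispatchedCommands;
        this.replaying = replaying;
    }

    void on(Event event) {
        if (event instanceof OrderPlaced e) {
            ordersAwaitingPayment.add(e.orderId());
        } else if (event instanceof PaymentReceived e && ordersAwaitingPayment.remove(e.orderId())) {
            String commandId = "ship-" + e.orderId();
            // Dispatch only when live and only if this command was never sent before.
            if (!replaying && dispatchedCommands.add(commandId)) {
                dispatch(commandId);
            }
        }
    }

    void dispatch(String commandId) { /* send a ShipOrder command via the command model */ }
}
```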
Sagas use the command model to update the state of the system. The command model contains business rules and is able to ensure that changes are valid within a given domain. To do that, the command model has all the information available that it needs.
The read model, on the other hand, has an entirely different purpose: It structures data so that it is suitable to provide information, e.g. to display on a web page.
Since the saga has all the information it needs through the command model, it doesn't need the read model. Worse, using the read model from a saga would introduce additional coupling and increase the overall complexity of the system considerably.
This does not mean that you absolutely cannot use the read model. But if you do, be sure you understand the consequences. For me, that bar is quite high, and so far I have always found a different solution.
It's primarily about separation of concerns. Process managers (sagas) are state machines responsible for coordinating activities. If the process manager wants to effect change, it dispatches commands (asynchronously).
Also: what is the read model? It's a projection of a bunch of events that already happened. So if the processor cared about those events... shouldn't it have been subscribing to them all along? So there's a modeling smell here.
Possible issues:
The process manager should have been listening to earlier messages in the stream, so that it would be in the right state when this message arrived.
The current event should be richer (so that the data the process manager "needs" is already present).
... variation - the command handler should instead be listening for a different event, and THAT one should be richer.
The query that you want should really be a command to an aggregate that already knows the answer
and failing all else
Send a command to a service, which runs the query and dispatches events in response. This sounds weird, but it's already common practice to have a process manager dispatch a message to a scheduling service, to be "woken up" when some fixed amount of time passes.
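A sketch of that last option, with hypothetical names: the process manager sends a command to a service that owns the query and answers with an event, so the process manager stays purely event-driven:

```java
interface Command {}
interface Event {}

record CheckInventory(String sku, String correlationId) implements Command {}
record InventoryChecked(String sku, int onHand, String correlationId) implements Event {}

// The service owns the query; the process manager never touches projections.
class InventoryQueryService {
    void handle(CheckInventory cmd) {
        int onHand = runQuery(cmd.sku());  // queries its own, consistent storage
        publish(new InventoryChecked(cmd.sku(), onHand, cmd.correlationId()));
    }
    int runQuery(String sku) { return 0; /* run the actual query */ }
    void publish(Event event) { /* put the event on the bus */ }
}

// The process manager reacts to the answer like any other event.
class ReplenishmentProcess {
    void on(InventoryChecked event) {
        if (event.onHand() < 10) {
            // dispatch a ReorderStock command ...
        }
    }
}
```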

CQRS Commands and Queries - Do they belong in the domain?

In CQRS, do the Commands and Queries belong in the Domain?
Do the Events also belong in the Domain?
If that is the case, are the Command/Query Handlers just implementations in the infrastructure?
Right now I have it laid out like this:
Application.Common
Application.Domain
- Model
- Aggregate
- Commands
- Queries
Application.Infrastructure
- Command/Query Handlers
- ...
Application.WebApi
- Controllers that utilize Commands and Queries
Another question, where do you raise events from? The Command Handler or the Domain Aggregate?
Commands and Events can address very different concerns: technical concerns, integration concerns, domain concerns...
I assume that if you ask about domain, you're implementing a domain model (maybe even with Domain Driven Design).
If this is the case I'll try to give you a really simplified response, so you can have a starting point:
Command: a business intention, something you want the system to do. Keep the definitions of the commands in the domain. Technically it is just a pure DTO. The name of a command should always be imperative: "PlaceOrder", "ApplyDiscount". A command is handled by only one command handler, and it can be rejected if it is not valid (however, you should do all the validation you can before sending the command to your domain, so that it cannot fail).
Event: something that has happened in the past. For the business it is an immutable fact that cannot be changed. Keep the definitions of the domain events in the domain. Technically it's also a DTO. However, the name of an event should always be in the past tense: "OrderPlaced", "DiscountApplied". Events are generally pub/sub: one publisher, many handlers.
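For illustration, the two kinds of DTO might look like this in Java (names invented; imperative for the command, past tense for the event):

```java
import java.math.BigDecimal;
import java.time.Instant;

// A business intention: handled by exactly one handler, and it may be rejected.
record PlaceOrder(String orderId, String customerId, BigDecimal total) {}

// An immutable fact: published to any number of subscribers.
record OrderPlaced(String orderId, String customerId, BigDecimal total, Instant occurredAt) {}
```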
If that is the case, are the Command/Query Handlers just implementations in the infrastructure?
Command handlers are semantically similar to the application service layer. Generally, the application service layer is responsible for orchestrating the domain. It is often built around business use cases, for example "Placing an Order". These use cases invoke business logic (which should always be encapsulated in the domain) through aggregate roots, querying, etc. It's also a good place to handle cross-cutting concerns like transactions, validation, security, etc.
However, the application layer is not mandatory. It depends on the functional and technical requirements and on the architectural choices that have been made.
Your layering seems correct. I would rather keep command handlers at the boundary of the system. If there is no proper application layer, a command handler can play the role of the use-case orchestrator. If you place it in the Domain, you won't be able to handle cross-cutting concerns very easily. It's a tradeoff; you should be aware of the pros and cons of your solution. It may work in one case and not in another.
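A sketch of a command handler acting as that use-case orchestrator at the boundary, with hypothetical ports for persistence and transaction handling; cross-cutting concerns live in the handler, business rules in the aggregate:

```java
// Hypothetical command DTO and an aggregate with a factory that enforces the rules.
record PlaceOrder(String orderId, String customerId, java.math.BigDecimal total) {}

class Order {
    private final String id;
    private Order(String id) { this.id = id; }
    static Order place(String orderId, String customerId, java.math.BigDecimal total) {
        // invariants are checked here, inside the domain
        return new Order(orderId);
    }
}

// The handler sits at the boundary: it owns the transaction (a cross-cutting
// concern) and delegates the business decision to the aggregate.
class PlaceOrderHandler {
    private final OrderRepository repository;
    private final TransactionRunner transactions;

    PlaceOrderHandler(OrderRepository repository, TransactionRunner transactions) {
        this.repository = repository;
        this.transactions = transactions;
    }

    void handle(PlaceOrder command) {
        transactions.inTransaction(() -> {
            Order order = Order.place(command.orderId(), command.customerId(), command.total());
            repository.save(order);
        });
    }
}

interface OrderRepository { void save(Order order); }
interface TransactionRunner { void inTransaction(Runnable work); }
```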
As for event handlers, I generally handle them in:
the Application layer, if the event triggers the modification of another Aggregate in the same bounded context, or if the event triggers some infrastructure service;
the Infrastructure layer, if the event needs to be fanned out to multiple consumers or used to integrate another bounded context.
Anyway you should not blindly follow the rules. There are always tradeoffs and different approaches can be found.
Another question, where do you raise events from? The Command Handler or the Domain Aggregate?
I'm doing it from the domain aggregate root, because the domain is responsible for raising events.
There is a general technical rule that you should not publish events if there was a problem persisting the changes to the aggregate, and vice versa. I took the approach used in Event Sourcing, which is pragmatic: my aggregate root has a collection of unpublished events. In the implementation of my repository, I inspect the collection of unpublished events and pass them to the middleware responsible for publishing events. This makes it easy to ensure that if there is an exception persisting an aggregate root, the events are not published. Some say that this is not the responsibility of the repository, and I agree, but what's the alternative? Having awkward event-publishing code that creeps into your domain with all the infrastructure concerns (transactions, exception handling, etc.), or being pragmatic and handling it all in the Infrastructure layer? I've done both, and believe me, I prefer to be pragmatic.
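A sketch of that pragmatic approach, with illustrative names: the aggregate root collects unpublished events, and the repository hands them to the publisher only after persistence succeeds:

```java
import java.util.ArrayList;
import java.util.List;

interface DomainEvent {}
record OrderPlaced(String orderId) implements DomainEvent {}

abstract class AggregateRoot {
    private final List<DomainEvent> unpublishedEvents = new ArrayList<>();
    protected void raise(DomainEvent event) { unpublishedEvents.add(event); }
    List<DomainEvent> dequeueUnpublishedEvents() {
        List<DomainEvent> events = new ArrayList<>(unpublishedEvents);
        unpublishedEvents.clear();
        return events;
    }
}

class Order extends AggregateRoot {
    Order(String id) { raise(new OrderPlaced(id)); }  // the domain raises the event
}

class OrderRepository {
    private final EventPublisher publisher;
    OrderRepository(EventPublisher publisher) { this.publisher = publisher; }

    void save(Order order) {
        persist(order);                       // throws on failure,
        order.dequeueUnpublishedEvents()      // so nothing gets published
             .forEach(publisher::publish);
    }
    void persist(Order order) { /* write to storage */ }
}

interface EventPublisher { void publish(DomainEvent event); }
```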
To sum up, there is no single way of doing things. Always know your business needs and technical requirements (scalability, performance, etc.), then make your choices based on that. I've described what I've generally done in most cases, and it has worked. It's just my opinion.
In some implementations, Commands and handlers are in the Application layer. In others, they belong in the domain. I've often seen the former in OO systems, and the latter more in functional implementations, which is also what I do myself, but YMMV.
If by events you mean Domain Events, well... yes, I recommend defining them in the Domain layer and emitting them from domain objects. Domain events are an essential part of your ubiquitous language and will even be directly coined by domain experts if you practise Event Storming, for instance, so it definitely makes sense to put them there.
What I think you should keep in mind though is that no rule about these technical details deserves to be set in stone. There are countless questions about DDD template projects and layering and code "topology" on SO, but frankly I don't think these issues are decisive in making a robust, performant and maintainable application, especially since they are so context dependent. You most likely won't organize the code for a trading system with millions of aggregate changes per minute in the same way that you would a blog publishing platform used by 50 people, even if both are designed with a DDD approach. Sometimes you have to try things for yourself based on your context and learn along the way.
Commands and events are DTOs. You can have command handlers and queries in any layer/component. An event is just a notification that something changed. You can have all types of events: Domain, Application, etc.
Events can be generated by both the handler and the aggregate; it's up to you. However, regardless of where they are generated, the command handler should use a service bus to publish the events. I prefer to generate domain events inside the aggregate root.
From a DDD strategic point of view, there are just business concepts and use cases. Domain events, commands and handlers are technical details. However, all domain use cases are usually implemented as command handlers, so command handlers should be part of the domain, as should the query handlers implementing queries used by the domain. Queries used by the UI can be part of the UI, and so on.
The point of CQRS is to have at least two models, and the command model should be the domain model itself. However, you can have a query model specialised for domain usage; it's still a read (simplified) model. Consider the command model as being used only for updates and the read model only for queries. You can have multiple read models (each used by a specific layer or component) or just one generic model used for every query.
