I was asked to implement CQRS/Event Sourcing patterns in a legacy web application, in order to prepare its migration from a monolithic, state-oriented model to a distributed, service-oriented app.
I have some questions about how to design a domain-oriented code bundle that connects the legacy entities, which are strongly coupled to the database, with a new event-sourced model.
The first things I did were:
writing a small "framework" for CQRS/ES, with classes like AggregateRoot, DomainEvent, Command, Handlers, Messaging, EventStore, AggregateIds, etc. (a minimal sketch follows this list)
trying to group and "migrate" the legacy Entities into some Aggregates, to reconstruct all the history and states of the app as event-sourced Aggregates
plugging some Command dispatching into the old controllers, to let the app work as is but also feed the new CQRS/ES system on the side.
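For illustration, here is a minimal sketch of what I mean by these building blocks (simplified Scala; all names are illustrative, not taken from an existing framework):

import java.util.UUID

// A domain event, indexed by the aggregate it belongs to.
trait DomainEvent {
  def aggregateId: UUID
}

// A command expresses an intent; it targets exactly one aggregate.
trait Command {
  def aggregateId: UUID
}

// Base class for event-sourced aggregates: state is rebuilt by replaying events.
abstract class AggregateRoot {
  private var uncommitted: List[DomainEvent] = Nil
  protected var playhead: Long = -1

  // Mutate state in response to a single event (no side effects here).
  protected def applyEvent(event: DomainEvent): Unit

  // Called by command methods: apply the event and remember it for the store.
  protected def raise(event: DomainEvent): Unit = {
    applyEvent(event)
    playhead += 1
    uncommitted = event :: uncommitted
  }

  // Rebuild state from history when loading from the event store.
  def loadFromHistory(events: Seq[DomainEvent]): Unit =
    events.foreach { e => applyEvent(e); playhead += 1 }

  def uncommittedEvents: Seq[DomainEvent] = uncommitted.reverse
  def markCommitted(): Unit = uncommitted = Nil
}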
The context:
The legacy app contains several entities, mapped to the database, that make up the model layer. (Our domain is human resources / manpower.)
Let's say we have those existing entities:
Worker, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
competences 1-N
Society, in which the worker works, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
hours
Contract, with various fields and related entities (OneToOne, OneToMany), like
address 1-1
Worker 1-1
Society 1-1
documents 1-N
days 1-N
hours
etc.
From this legacy model, I designed a MissionAggregate that holds:
A database-independent ID, such as a UUID
some Value Objects: address, days (they were entities in the legacy model; they became VOs here)
I also designed a WorkerAggregate and a SocietyAggregate, with fields and UUIDs, and in the MissionAggregate I added:
a reference to WorkerAggregate's UUID
a reference to SocietyAggregate's UUID
As I said earlier, my aim is to leave the legacy app as is, but just introduce into the CRUD controllers' methods some calls that dispatch Commands to the new CQRS system.
For example:
After flushing a newly created Contract to the database, I want to dispatch a "CreateMissionCommand" to the new command bus.
It targets the appropriate Command Handler, which handles all the command's data, passes it to a newly created Aggregate with a new UUID, and stores a "MissionCreatedDomainEvent" in the EventStore.
The DomainEvent is indexed by an AggregateId and a playhead, and has a payload containing the fields necessary to apply it and build the MissionAggregate.
The newly created Contract now has its usual lifecycle in the app, with all the updates the legacy code performs on it. But I also need to reflect all those changes in the corresponding event-sourced Aggregate, so every time there is a database flush in the app, I dispatch a Command that translates the CRUD-like operations of the legacy app into a domain-oriented, command-oriented pattern.
To sum up, the workflow is (see the sketch after this list):
A Crud legacy operation occurs and flushes some changes on the Contract Entity
With just one line of code in the controller, I dispatch a command built with the necessary fields (including the AggregateId of the MissionAggregate... which I need to have stored somewhere; see the problems below) to the domain command bus, so the impact on the existing code base is very low.
The bus passes the command to the corresponding command handler
The handler loads the aggregate and applies the changes by calling the appropriate aggregate method
Then, after some validation, the aggregate raises and stores the appropriate event
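A simplified sketch of that flow, reusing the building blocks from the sketch above (the EventStore interface and the Mission names are just illustrative):

import java.util.UUID

// Illustrative store contract: append is guarded by the expected playhead.
trait EventStore {
  def load(aggregateId: UUID): Seq[DomainEvent]
  def append(aggregateId: UUID, expectedPlayhead: Long, events: Seq[DomainEvent]): Unit
}

case class RescheduleMissionHours(aggregateId: UUID, hours: Int) extends Command
case class MissionHoursRescheduled(aggregateId: UUID, hours: Int) extends DomainEvent

class MissionAggregate extends AggregateRoot {
  private var id: UUID = null
  private var hours: Int = 0

  def reschedule(missionId: UUID, newHours: Int): Unit = {
    require(newHours >= 0, "hours must be non-negative") // domain validation
    raise(MissionHoursRescheduled(missionId, newHours))
  }

  protected def applyEvent(event: DomainEvent): Unit = event match {
    case e: MissionHoursRescheduled => id = e.aggregateId; hours = e.hours
    case _                          => ()
  }
}

class RescheduleMissionHoursHandler(store: EventStore) {
  def handle(cmd: RescheduleMissionHours): Unit = {
    val history = store.load(cmd.aggregateId)        // load and replay history
    val mission = new MissionAggregate
    mission.loadFromHistory(history)
    mission.reschedule(cmd.aggregateId, cmd.hours)   // domain method raises the event
    store.append(cmd.aggregateId,                    // persist the new events at the
                 history.size - 1,                   // expected playhead
                 mission.uncommittedEvents)
    mission.markCommitted()
  }
}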
My problems and questions (some of them at least) are:
I feel like I am rewriting big portions of the legacy app, with the same kinds of relations between the Aggregates as I have between the Entities, and with the same types of validations, checks, etc.
Having references to both the WorkerAggregate's and the SocietyAggregate's UUIDs in the MissionAggregate implies that I have to build those aggregates too (and hence dispatch commands from the legacy app when the Worker and Society entities are flushed). Can't I have references only to the Worker's entity ID and the Society's entity ID?
How can I avoid an eternally growing MissionAggregate? The Contract entity is quite huge; it has a lot of fields that are constantly updated (hours, days, documents, etc.). If I want to store all those events, I need a large MissionAggregate to reflect all those changes, and therefore tons of CommandHandlers reacting to all the add, update, etc. Commands I am going to dispatch from the legacy app.
How "free" is an Aggregate from the root entity it is supposed to refer to? For example, a Contract entity needs to relate somehow to its Mission Aggregate, e.g. when I want to dispatch a Command from the app just after the legacy code has flushed something on the entity. Where should I store this relation? In the entity itself, in an AggregateId field? In the Aggregate, in a ContractId field? Or in some kind of mapping table that holds the relationship between the Contract ID and the MissionAggregate ID?
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events from all the historical data?
Thanks in advance for your time.
You have a huge task ahead of you, let's try to break it down.
It's best to build this new part of the system in isolation from the legacy codebase, otherwise you're going to have your hands tied at every turn.
Create a separate layer in your project for these new requirements. We're going to call it "bubble" from now on. This bubble will be like a greenfield project, with its own structure, dependencies, etc. There will be no direct communication between the bubble and the legacy; communication will happen through another dedicated translation layer, which we'll call "Anti-Corruption Layer" (ACL).
ACL
It is like an API between two systems.
It translates calls from the bubble to the legacy and vice-versa. Its purpose is to prevent one system from corrupting or influencing the other. This way you can keep building/maintaining each system independently from each other.
At the same time, the ACL allows one system to consume the other, and reuse logic, validations, rules, etc.
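As an illustration, the ACL can be as thin as a translator from legacy notifications to domain commands (all names here are hypothetical):

import java.util.UUID

trait DomainCommand
case class CreateMission(missionId: UUID, workerId: Long, societyId: Long) extends DomainCommand
case class UpdateMissionHours(missionId: UUID, hours: Int) extends DomainCommand

// Minimal stand-in for the legacy entity (an assumption for this sketch).
case class LegacyContract(id: Long, workerId: Long, societyId: Long, hours: Int)

// The legacy controller only ever talks to this class; the bubble only ever
// receives commands. Neither side knows the other's model.
class MissionAcl(dispatch: DomainCommand => Unit, missionIdFor: Long => UUID) {

  // Called right after the legacy code flushes a new Contract.
  def contractCreated(contract: LegacyContract): Unit =
    dispatch(CreateMission(UUID.randomUUID(), contract.workerId, contract.societyId))

  // Called after a legacy update; translated into domain language.
  def contractHoursChanged(contractId: Long, hours: Int): Unit =
    dispatch(UpdateMissionHours(missionIdFor(contractId), hours))
}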
To answer your questions directly:
I feel like I am rewriting big portions of the legacy app, with the same kinds of relations between the Aggregates as I have between the Entities, and with the same types of validations, checks, etc.
With the ACL, you can resort to calling validations and reuse implementations from the legacy code. This will allow you time to rewrite things as needed or as possible.
You may not need to rewrite the entire system, though. If your goal is to implement CQRS and Event Sourcing, and you can achieve this goal while keeping most or part of the legacy system, I would say do it. Unless, of course, one of the goals is to completely replace the old system. Otherwise, keep it and write as little code as possible.
Suggested workflow:
Keep the CQRS and Event Sourcing system in the bubble
Do not bring these new frameworks into legacy
Make the legacy Controller issue method calls to the ACL
The ACL will convert these calls into Commands and dispatch them
Any events will be caught by your Event Sourcing framework
Results will be persisted to the bubble's database
The bubble's database can be a different schema in the same database or can be a different database altogether. But you'll have to think about synchronization, and that's a topic of its own. To reduce complexity, I recommend a different schema in the same database.
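On the persistence side, a denormalizer in the bubble can then project events into that schema, roughly like this (plain JDBC for illustration; the table and event names are made up):

import java.sql.Connection
import java.util.UUID

case class MissionHoursChanged(missionId: UUID, hours: Int)

// Projects one event type into a flat row of the bubble's schema.
class MissionProjection(conn: Connection) {
  def on(event: MissionHoursChanged): Unit = {
    val stmt = conn.prepareStatement(
      "UPDATE bubble.mission_view SET hours = ? WHERE mission_id = ?")
    try {
      stmt.setInt(1, event.hours)
      stmt.setString(2, event.missionId.toString)
      stmt.executeUpdate()
    } finally stmt.close()
  }
}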
Having references to both the WorkerAggregate's and the SocietyAggregate's UUIDs in the MissionAggregate implies that I have to build those aggregates too (and hence dispatch commands from the legacy app when the Worker and Society entities are flushed). Can't I have references only to the Worker's entity ID and the Society's entity ID?
How can I avoid an eternally growing MissionAggregate? The Contract entity is quite huge; it has a lot of fields that are constantly updated (hours, days, documents, etc.). If I want to store all those events, I need a large MissionAggregate to reflect all those changes, and therefore tons of CommandHandlers reacting to all the add, update, etc. Commands I am going to dispatch from the legacy app.
You should aim for small aggregates. Huge aggregates are likely to degrade performance and cause concurrency problems.
If you anticipate having a huge aggregate, it is best to rethink it and try to break it down. Ask which fields/properties change together - these possibly form a different aggregate.
Also, when you speak about CQRS, you generally lean towards a task-based way of doing things in your system.
Think of a traditional web application, where you have a huge page with lots of fields that are all sent to the server in one batch when the user saves.
Now, contrast it with a modern web app where the user changes small portions of data at each step. If you think about your system this way you'll find those smaller aggregates.
PS: you don't need to rebuild your interfaces for this. If your legacy system has those huge pages, you can have logic in the controllers that detects which fields were changed and issues the appropriate commands.
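A sketch of that controller-side detection, reusing the hypothetical ACL from the earlier sketch: diff the entity before and after the save, and dispatch only the commands for what actually changed.

// Inside the legacy save action: one coarse-grained form post arrives,
// but only small, task-based commands leave toward the bubble.
def afterContractSave(before: LegacyContract, after: LegacyContract, acl: MissionAcl): Unit = {
  if (before.hours != after.hours)
    acl.contractHoursChanged(after.id, after.hours)
  // ...one intention-revealing command per cluster of fields that changed:
  // address changes, added documents, day-schedule changes, and so on.
}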
How "free" is an Aggregate from the root entity it is supposed to refer to? For example, a Contract entity needs to relate somehow to its Mission Aggregate, e.g. when I want to dispatch a Command from the app just after the legacy code has flushed something on the entity. Where should I store this relation? In the entity itself, in an AggregateId field? In the Aggregate, in a ContractId field? Or in some kind of mapping table that holds the relationship between the Contract ID and the MissionAggregate ID?
Aggregates represent a conceptual whole. They are like atoms, indivisible things. You should always refer to an aggregate by its Root Entity Id, and never to a Child Entity Id: looking from the outside, there are no children.
An aggregate should be loaded as a whole and persisted as a whole. One more reason to have small aggregates.
An aggregate can be comprised of a single entity. Or it can have more entities and value objects, forming a graph, but one entity will be elected as the Root and will hold references to its children. Child entities and value objects should not hold references to their parents. The dependency is not bi-directional.
If Contract is an entity inside the Mission aggregate, the Contract should not have a reference to its parent.
But, if your Contract and Mission are different aggregates, then they can reference each other by their Ids.
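In code, that distinction is simply: owned children are held as objects, other aggregates only as identifiers (an illustrative sketch):

import java.util.UUID

// Value objects / child entities live inside the aggregate...
case class Address(street: String, city: String)
case class Day(date: java.time.LocalDate, hours: Int)

// ...while other aggregates are referenced by identity only, never as objects.
class Mission(
  val id: UUID,
  val workerId: UUID,   // reference to the WorkerAggregate, by id
  val societyId: UUID,  // reference to the SocietyAggregate, by id
  val address: Address, // value object owned by this aggregate
  val days: List[Day]   // child collection owned by this aggregate
)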
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events from all the historical data?
That's a question for the business experts. Do they need it? If they don't, then don't implement it just for the sake of doing so. Every decision you make should be geared towards satisfying a business need and generating real value for it, considering the costs and tradeoffs.
Some people say that code is a liability, not an asset, and I agree to some extent: every line of code you write needs to be tested and supported. Don't write any code that is not really necessary.
Also, have a look at this article about the Strangler Pattern, which shows how to migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.
If you have a chance, watch this course at Pluralsight (paid): Domain-Driven Design: Working with Legacy Projects. The author presents practical approaches for dealing with this kind of task.
I hope this has given you some insight.
I don't want to spoil your game. Everybody knows how cool it is to rewrite something from scratch. It's a challenge, it's fun, it's exciting. However...
migrate it from a monolithic/state oriented model to a distributed, service oriented app
CQRS/Event Sourcing won't solve any of your problems, and it won't help you distribute the app in any reasonable way. If you just generate events from the CRUD operations, you'll end up with a large tangled mess of dependencies between the parts. Every part that needs data will have to call a couple of "services" (i.e. tables) to get it, then push data elsewhere and generate events that some other parts react to. It will be a mess. Usually this is called a distributed monolith.
This is also the reason you already see problems with it. These problems won't go away, because you are essentially building the same system in the same way, but this time it'll be more complex.
Where to go from here
The very first thing is always: have a clear goal. You said you want a service-oriented architecture. Why? Are there parts that need different scaling or different resources? Are they managed by different teams with different life cycles? Etc.? Maybe you already have all this, I don't know, but if not, that's your first task.
Then: the parts you do want to pull out can't be just CRUD things. Those will not be independent, so whether your goal (see the point above!) is scaling or separate teams, you won't reach it. To be independent, you'll have to pull out the behavior along with the data, and in a way that lets the service operate on its own.
You can't just throw buzzwords at it and hope for the best. I'd suggest to just ignore all the hype and buzzwords and think about the goal you want to reach.
For example: I need a million workers to be able to log their time in under 10 minutes total. That means I need a "service" that enables workers to log their time through a web interface. So let's create that as a completely independent piece with its own database, so it can be scaled to 100 nodes when it needs to be, and have it export data to billing automatically every hour or so.
I have a system which may generate certain events in the lifecycle of a transaction. On every event I need to update a row in a DB and also send out a UI event over a WebSocket.
One option I have is to implement the event processing (DB and UI) in actors, thus avoiding any locking issues; I can also afford minor delays, so handling this sequentially will greatly simplify matters.
What are alternative ways of handling this in Scala, as I feel Actors might be overkill in this case?
There are blogs stating that actors should be used for "concurrency with state", but I would like to see a more appropriate mechanism so I can eliminate this option.
Ultimately the main unique benefit for using actors is that they are great for encapsulating mutable variables so that you avoid race conditions.
To do what you describe, you could just use classic threads. In your (possibly simplified) description, I don't see the potential for deadlocks. If you want something a bit more composable, e.g. a sequence of asynchronous tasks, you can use Scala's Futures.
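For instance, the per-event sequence "update the row, then push the UI event" composes directly with Futures (a sketch; the DB and WebSocket calls are placeholders):

import scala.concurrent.{ExecutionContext, Future}

def updateRow(eventId: Long): Future[Unit] = ???  // placeholder: DB write
def pushToUi(eventId: Long): Future[Unit] = ???   // placeholder: WebSocket push

// Strictly ordered handling of one transaction event: DB first, then UI.
def handle(eventId: Long)(implicit ec: ExecutionContext): Future[Unit] =
  for {
    _ <- updateRow(eventId)
    _ <- pushToUi(eventId)
  } yield ()

If events for the same transaction must also be ordered relative to each other, chain each new handle() onto the Future returned by the previous one.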
I'm not at all sure whether this is applicable to Scala, but there's a great lib for Groovy and Java with several concurrency models. I've used its Dataflow Concurrency myself with great success and can recommend it as a lightweight yet manageable model.
Dataflow Concurrency offers an alternative concurrency model, which is inherently safe and robust. It puts an emphasis on the data and their flow through your processes instead of the actual processes that manipulate the data. Dataflow algorithms relieve developers from dealing with live-locks and race-conditions, and make dead-locks deterministic and thus 100% reproducible. If you don't get dead-locks in tests, you won't get them in production.
There are other models available in the linked GPars library as well.
I wouldn't suggest doing the threading yourself unless you have no other choice.
Addendum
After posting I got interested in the topic and did a few searches. It seems Akka also has (or at least has had, in some version) direct support for the Dataflow model.
Actors avoid locking issues because they interact through queues. You can use threads with (blocking) queues and get the same level of safety. The main advantage of Actors over Threads is that an actor does not spend memory on a call stack, so we can have many more actors than threads in the same amount of core memory. The downside of the actor model is that a complex algorithm which could be implemented in a single thread may require several actors, so the actor implementation can look obscure.
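That equivalence in a nutshell (a hand-rolled sketch: one thread draining a blocking queue gives the same single-threaded access to state as an actor's mailbox):

import java.util.concurrent.LinkedBlockingQueue

// All state mutation happens on the single worker thread, as in an actor.
class QueueWorker[M](handle: M => Unit) {
  private val mailbox = new LinkedBlockingQueue[M]()
  private val worker = new Thread(() => {
    while (true) handle(mailbox.take()) // take() blocks until a message arrives
  })
  worker.setDaemon(true)
  worker.start()

  def send(msg: M): Unit = mailbox.put(msg) // enqueue; never touch state directly
}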
Sorry if it's duplicated, I have searched a lot in SO, but I didn't find a matched question.
I know we shouldn't update multiple aggregate instances in one transaction. However, I think "multiple aggregate instances" here means multiple instances of different aggregate types. Am I right?
Say, Product and BacklogItem are two different aggregates, so we should avoid updating both Product and BacklogItem in one single transaction.
But what if I need to update multiple instances of the same aggregate type? Say I need to update all products' names. What's the best way to deal with it?
pseudocode
// Application Service
public void ChangeName(string newName)
{
    using (var tx = uow.BeginTransaction())
    {
        IEnumerable<Product> products = repo.GetAll();
        foreach (var product in products)
        {
            product.ChangeName(newName);
        }
        tx.Commit(); // a single transaction spanning every product
    }
}
The type of AR is irrelevant; ARs are always transactional boundaries. When working with many of them, you should usually embrace partial failures and eventual consistency.
Why should all renames fail or succeed together if there are no invariants to protect across all these ARs? Suppose the UI allowed users to rename multiple products at a time, some user renamed hundreds of them, and one product failed to be renamed, because of a concurrency conflict for instance. Would it be better not to rename any of the products, or to inform the user that one of them failed?
You can always violate the rule, but you must understand that the more ARs that are involved in a transaction, the more likely you are to experience transactional failures due to concurrency conflicts (assuming contention can occur).
I usually only modify multiple ARs in one transaction when there are invariants crossing AR boundaries and I can't afford eventual consistency.
The reason for the guidance is related to expectations of concurrent access. Depending on how many users are accessing your system at one time, the more things that change in a single transaction, the greater the risk of concurrent changes, which will either:
cause errors or exceptions due to concurrency violations if you use optimistic concurrency management
cause delays due to locking if you use pessimistic concurrency management
cause data consistency issues if you use no concurrency management
This guidance applies whether the scope of the change gets broader (changing more types of aggregates) or deeper (changing more instances of the same aggregate type). Rather than considering exactly what is changing, consider: what is the risk that some part of this change will conflict with a similar change being made by another person at the same time?
Designing DDD for highly concurrent multi-user systems is where the guidance to design small, independent aggregates comes from. If that is not your case, then you can potentially violate the guidance based on your own risk analysis.
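For reference, optimistic concurrency management is essentially a version check at write time; a generic in-memory sketch (not any particular library):

final case class ConcurrencyConflict(id: String) extends RuntimeException(s"stale write on $id")

// In-memory illustration of an optimistic version check.
class VersionedStore[A] {
  private var data = Map.empty[String, (Long, A)] // id -> (version, state)

  def load(id: String): Option[(Long, A)] = data.get(id)

  // Fails if someone else committed since we loaded `expectedVersion`.
  def save(id: String, expectedVersion: Long, state: A): Unit = synchronized {
    val current = data.get(id).map(_._1).getOrElse(-1L)
    if (current != expectedVersion) throw ConcurrencyConflict(id)
    data += id -> (expectedVersion + 1, state)
  }
}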
Aggregates are used to guard transactional boundaries. From what I see in your pseudocode, you are changing them all in one transaction.
Is there any reason it should be done in one transaction? If so, it seems you are missing a concept in your model. This concept could be nothing more than a simple process manager which updates multiple aggregates in your domain and reports back when everything finishes successfully; it could even incorporate retry mechanisms or other exception handling in case one of the updates fails. This process manager could be named 'BulkRenamer' or something like that.
I would model such a bulk renamer so that it registers itself as a consumer of the domain events from the aggregates that need to be updated, updates the names of those aggregates (potentially in parallel), and keeps listening for the domain events. When all events have been received, it can report that it finished renaming all aggregates, and perform exception handling otherwise.
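A sketch of such a bulk renamer (illustrative names; the dispatch function, event subscriptions, and retry policy are assumptions):

import java.util.UUID
import scala.collection.mutable

case class RenameProduct(productId: UUID, newName: String)
case class ProductRenamed(productId: UUID)
case class ProductRenameFailed(productId: UUID, reason: String)

// A process manager: one transaction per aggregate, tracks progress, retries failures.
class BulkRenamer(dispatch: RenameProduct => Unit, maxRetries: Int = 3) {
  private val pending = mutable.Map.empty[UUID, Int] // productId -> attempts

  def start(productIds: Seq[UUID], newName: String): Unit =
    productIds.foreach { id =>
      pending(id) = 1
      dispatch(RenameProduct(id, newName))
    }

  // Wire these up as subscribers to the domain events.
  def on(e: ProductRenamed): Unit = pending.remove(e.productId)

  def on(e: ProductRenameFailed, newName: String): Unit = {
    val attempts = pending.getOrElse(e.productId, 0)
    if (attempts < maxRetries) {
      pending(e.productId) = attempts + 1
      dispatch(RenameProduct(e.productId, newName)) // retry the single failed rename
    } // else: give up and report e.productId in the final summary
  }

  def finished: Boolean = pending.isEmpty
}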
Why always talk about concurrency risk? This kind of operation exists all the time in ERP and ERP-like systems.
There are many operations like batch-adding resources.
Why must a DDD boundary be built on one aggregate root, and not on multiple aggregate roots of the same type, such as a "domain" that handles multiple same-type aggregate roots?
Or is there a boundary concept beyond aggregate roots?
I have seen a lot about EventStores, but all the articles couple them with talk about CQRS.
We want to use an EventStore to integrate bounded contexts, but we want to stick with a traditional ORM for reading/writing aggregates, to avoid the command/query split and the separate read model, which in our case would add too much complexity.
Seeing as it is so popular to talk about both concepts together, one is led to believe they are meant to live together - are there pitfalls in doing an EventStore 'lite' without CQRS, compared to implementing an EventStore with aggregates/CQRS/read model as well?
Run to NoSQL Distilled - you'll save a lot of time by doing nothing for a few days but reading it and drawing out what you're after. If you are 'reading/writing aggregates', you should be considering databases such as RavenDB that specialize in that.
Note that the event-store tag is for the JOliver EventStore, which has as key architectural notions:
events are batched by aggregate - that is its real unit of management for events
dispatching the committed events to something
You also have things slightly backwards, in that to get to producing events, your domain gets built in a particular manner to facilitate that. These are key things that contrast with the way you posit things in your question (to paraphrase badly and/or unfairly: "I want to use the event store just to store events - I can do the rest myself").
Go investigate queue management solutions if you don't want an event-sourced domain model. That is a very legitimate thing to do - just don't pretend EventStore is a generalized event pub/sub queue.
Having the Dispatcher project to Denormalizers that build a Read Model is the easy bit - you can use all sorts of exotic stuff, but a familiar tool like a SQL DB with a straightforward database layer like PetaPoco will do fine.
Have you actually done a spike with CommonDomain and EventStore? Have you read the readme doc in the NuGet package? Have you watched the two JOliver videos?
CQRS - Utah Code Camp 2010 - Parts 1 and 2
CQRS - An Introduction for Beginners
Jonathan Oliver on Event Sourcing and EventStore # E-VAN 25 October 2011
We want to use EventStores to integrate bounded contexts
It is possible to use an event store as a message queue, with the added benefit that it is persistent and a new subscriber can request all past events.
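That is the key property: a catch-up subscription. A new consumer starts at position 0 and replays everything before tailing new events (a generic sketch over an append-only log, not a specific product's API):

// Minimal append-only log with catch-up reads.
class EventLog[E] {
  private var log = Vector.empty[E]
  def append(e: E): Unit = synchronized { log = log :+ e }
  def readFrom(position: Int): Vector[E] = synchronized { log.drop(position) }
}

// A subscriber tracks its own position, so a brand-new one replays all history.
class Subscriber[E](log: EventLog[E], handle: E => Unit) {
  private var position = 0
  def poll(): Unit = {
    val batch = log.readFrom(position)
    batch.foreach(handle)
    position += batch.size
  }
}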
but want to stick with a traditional ORM for reading/writing aggregates, to avoid the command/query split and the separate read model, which in our case would add too much complexity.
As an aside, you can still attain some of the benefits of CQRS by simply using a separate read-model for queries rather than your behavioral model.
Overall, you can use an EventStore without using event sourcing; however, you should ensure that it supports all the requirements of your integration scenario. It may be that you need other components in addition to an event store. More generally, an event store can be used to store any time-series data.
When should the Actor Model be used?
It certainly doesn't guarantee deadlock-free environment.
Actor A can wait for a message from B while B waits for A.
Also, if an actor has to make sure its message was processed before moving on to its next task, it will have to send a message and wait for a "your message was processed" reply instead of simply blocking.
What's the power of the model?
Given some concurrency problem, what would you look for to decide whether to use actors or not?
First I would look to define the problem: is the primary motivation speeding up a nested for loop or a recursion? If so, a simple task-based approach or parallel loop approach will likely work well for you (rather than actors).
However, if you have a more complex system that involves dependencies and coordination of shared state, then an actor approach can help. Specifically, through the use of actors and message-passing semantics you can often avoid explicit locks to protect shared state, by instead making copies of that state (messages) and reacting to them.
You can do this quite easily with the classic synchronization problems like dining philosophers and the sleeping barber problem. But you can also use actors to help with more modern patterns, i.e. your facade could be an actor, and your model, view, and controller could also be actors that communicate with each other.
Another thing I've observed is that actor semantics are learnable by most developers and 'safer' than their locked counterparts, because they raise the abstraction level and allow you to focus on coordinating access to the data rather than protecting all accesses to the data with locks. As an example, imagine you have a simple class with a data member. If you place a lock in that class to protect access to that data member, then any method on that class will need to ensure it accesses the data member under the lock. This becomes particularly problematic when others (or you) modify the class at a later date; they have to remember to use that lock.
On the other hand, if that class becomes an actor and the data member becomes a buffer or port you communicate with via messages, you don't have to remember to take the lock: the semantics are built into the buffer, and you will know explicitly whether you are going to block on it based on the type of the buffer.
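With Akka, for instance (one possible actor framework; the question doesn't name one), the lock disappears because the counter is only ever touched from inside the actor:

import akka.actor.{Actor, ActorSystem, Props}

// All access to `count` is serialized through the mailbox; there is no lock
// for a future maintainer to forget.
class Counter extends Actor {
  private var count = 0
  def receive: Receive = {
    case "inc" => count += 1
    case "get" => sender() ! count
  }
}

object CounterDemo extends App {
  val system  = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter](), "counter")
  counter ! "inc"
  counter ! "inc"
}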
-Rick
The usage of actors is "natural" in at least two cases:
When you can decompose your problem in a set of independent tasks.
When you can decompose your problem in a set of tasks linked by a clear workflow (ie. dataflow programming).
For instance, if you process complex data using a series of filters, it is easy to use a pipeline of actors where each actor receives data from an upstream actor and sends data to a downstream actor.
Of course, this dataflow need not be linear, and if a step in your pipeline is slow, you can instead use a pool of actors doing the same job. Another way of solving load-balancing problems is a demand-driven approach organized with a kind of virtual Kanban system.
Of course, you will need synchronization between actors in almost all interesting cases, but contrary to the classic multi-threaded approach, this synchronization is really "concrete". You can imagine guys in a factory and picture the possible problems (workers run out of work to do, an upstream operation is too fast and intermediate products need a huge storage place, etc.). By analogy, you can then find solutions more easily.
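A minimal version of such a pipeline (with Akka as the illustrative framework):

import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Each stage transforms its input and passes the result downstream;
// all stages run concurrently, like stations in a factory.
class Stage(transform: String => String, next: Option[ActorRef]) extends Actor {
  def receive: Receive = {
    case s: String =>
      val out = transform(s)
      next match {
        case Some(ref) => ref ! out                 // hand off downstream
        case None      => println(s"result: $out")  // end of the pipeline
      }
  }
}

object PipelineDemo extends App {
  val system = ActorSystem("pipeline")
  val sink   = system.actorOf(Props(new Stage(_.trim, None)))
  val upper  = system.actorOf(Props(new Stage(_.toUpperCase, Some(sink))))
  upper ! "  hello, pipeline  "
}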
I am not an actor expert, but here are my 2 cents on when to use the actor model:
The actor model is not suited to every concurrent application; for instance, if you are creating an application that is simply multi-threaded and runs under high concurrency, the actor model by itself is not made to solve that concurrency issue.
Where actors really come into play is when you are creating an event-driven application. For instance, you have an application and you are tracking in real time what users are clicking. You can use actors to do those real-time activities segregated by user, device, or whatever your business requires, since actors are stateful. So, for example, if the users behind some actors clicked on shirts, you can send them a notification with a coupon.
Some other applications where actors come in handy: finance (pricing, fraud detection) and multiplayer gaming.
Actors are asynchronous and concurrent, but they do not guarantee message order or a time limit on when a message will be acted upon. Hence an atomic transaction cannot be split across actors.
If the application/task involves no mutable state, then actors are overkill, since actor frameworks go to great lengths to avoid race conditions.