CQRS & DDD - Domain Model Business Rules validation using cqrs read model


I'm a novice in DDD and CQRS patterns, and I'd like your opinion on how a domain entity can be validated.
I'm going to use the common Order->OrderLine example, where Order is the AR.
For consistency reasons, the business rules of an Aggregate are validated through the AR.
How can I validate a business rule that needs data from outside the Order aggregate?
I'm also using the CQRS approach, and I think that using the ReadModel to get the data I need to validate my business rules is not a bad option... What do you think?

In my experience with CQRS, I associate the ReadModel with eventual consistency, and therefore I wouldn't be 100% confident that the ReadModel represents the current state of the system. This is even more the case when you want to distribute and replicate your ReadModels.
I would only use the ReadModel to limit the number of invalid commands being sent to your application.
It sounds to me that you want to start thinking about Domain Services, which can be used to encapsulate domain logic that falls outside the boundary of a single aggregate/entity/value object.
As David points out in "Implement Domain Services as Extension Methods for the repository", Jimmy Bogard has a definition of domain services: http://lostechies.com/jimmybogard/2008/08/21/services-in-domain-driven-design/

Yes, use the read model for command validation. I call this "command context": the current state of the world, based on which a command may be valid or invalid. In CQRS, this current state of the world is represented in your read model. Users base their decisions about which commands to issue on it.
You may also consider various ways to guide user decisions, so that they don't issue invalid commands (e.g. warn in advance if a username is not unique).
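A minimal sketch of that "command context" check, in Python and with hypothetical names (RegisterUser, UserReadModel, validate_command) that do not come from any framework. The read model here is an in-memory projection; since a real one may lag behind the write side, this only filters out commands that are already known to be invalid and does not guarantee uniqueness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterUser:
    """Hypothetical command: the user's intent, expressed as plain data."""
    username: str


class UserReadModel:
    """Hypothetical, eventually consistent projection of registered usernames."""

    def __init__(self) -> None:
        self._usernames: set[str] = set()

    def project_user_registered(self, username: str) -> None:
        # Called by an event handler once a "user registered" event is processed.
        self._usernames.add(username)

    def is_username_taken(self, username: str) -> bool:
        return username in self._usernames


def validate_command(cmd: RegisterUser, read_model: UserReadModel) -> list[str]:
    """Check the command against the command context before dispatching it.

    The read model may be stale, so an empty result means "no known problem",
    not "guaranteed to succeed".
    """
    errors = []
    if not cmd.username.strip():
        errors.append("username is required")
    elif read_model.is_username_taken(cmd.username):
        errors.append(f"username {cmd.username!r} is already taken")
    return errors
```

The same check can drive the UI warning mentioned above, before the user even submits the command.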

Related

Can Axon Commands and Events be considered anemic models?

My question is quite straightforward, as stated in the subject.
However, please allow me to briefly explain my thinking.
I've been using Axon for approximately 10 months now. I usually design my project structure based on the Hexagonal architecture, with two top-level packages: one for the domain and one for infrastructure.
The domain package contains different domain objects (as described in DDD), as follows:
Aggregate (this will be an Axon aggregate class).
Repository (in my case, this will be a Spring Data Repository interface).
Entity (in my case, this contains any lookup entities that I use for set-based consistency validation, as described here).
Service Port (a collection of Input and Output port interfaces).
Commands (representing Axon Command object).
As for Events, I put them in a separate module that I compile as a jar file, so I can share them with other developers who are going to use the same events in their projects.
I've noticed recently that all of my commands and events are basically anemic models (an anti-pattern that we should avoid).
Is there a good practice for this? Or is it intentional, by design?
I've been thinking about putting my Command classes inside my Aggregate class (as inner classes). At least with this approach I wouldn't end up with so many anemic models scattered around. Any thoughts?

Commands are designed to be behavior and input structures mirroring the external world. They don't necessarily mirror an aggregate's structure.
They are not even connected clearly to one single aggregate, at times. Enclosing them within aggregates can be a code smell because you are then thinking in terms of resources and UI organization, instead of transaction boundaries and entity groups.
You are also violating the open-closed principle. Changes in volatile layers like user interface and request structures will make you edit the Aggregate class, and that is not good design.
On a more general note...
At times, this debate of anemic vs. non-anemic (or dry vs. non-dry) can push you in the direction of premature - and incorrect - optimization. Try avoiding this trap because you will end up optimising at the code level, but your domain will suffer.
DDD and CQRS guidelines align with principles that help you keep complexity at bay over the long term. Things kept distinct and separate help you achieve this.
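To illustrate keeping commands apart from aggregates, here is a framework-free Python sketch (not Axon's API; AddOrderLine, Order and AddOrderLineHandler are hypothetical names). The command is a plain, immutable structure shaped by the request, the aggregate exposes domain methods and never sees the command class, and the handler does the mapping, so a change in the request shape only touches the command and its handler.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AddOrderLine:
    """Hypothetical command: shaped by the incoming request, not by the aggregate."""
    order_id: str
    product_id: str
    quantity: int


@dataclass
class Order:
    """Hypothetical aggregate root: its methods speak the domain's language."""
    order_id: str
    lines: dict[str, int] = field(default_factory=dict)  # product_id -> quantity

    def add_line(self, product_id: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines[product_id] = self.lines.get(product_id, 0) + quantity


class AddOrderLineHandler:
    """Maps the command onto the aggregate; request-shape changes stay here."""

    def __init__(self, orders: dict[str, Order]) -> None:
        self._orders = orders  # stand-in for a repository

    def handle(self, cmd: AddOrderLine) -> None:
        order = self._orders[cmd.order_id]
        order.add_line(cmd.product_id, cmd.quantity)
```

The command stays "anemic" on purpose: it is a message carrying input, while the behavior lives in the aggregate.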

First of all, in DDD your domain should be free of any frameworks; just use the language's standard library.
Then, mixing Commands and Aggregates cannot be a good solution. I think Commands belong to the Ports, while Aggregates belong inside the Hexagon.
Finally, DDD emphasizes discovering the domain with the help of domain experts. Did you do that? If not, and you're only using the tactical patterns, you'll miss one of the most important parts of DDD.

How to handle hard aggregate-wide constraints in DDD/CQRS?

I'm new to DDD and I'm trying to model and implement a simple CRM system based on DDD, CQRS and event sourcing to get a feel for the paradigm. I have, however, run in to some difficulties that I'm not sure how to handle. I'm not sure if my difficulties stem from me not having modeled the domain properly or that I'm missing something else.
For a basic illustration of my problems, consider that my CRM system has the aggregate CustomerAggregate (which seems reasonable to me). The purpose of this aggregate is to make sure each customer is consistent and that its invariants hold (name is required, social security number must be in the correct format, etc.). So far, all is well.
When the system receives a command to create a new customer, however, it needs to make sure that the social security number of the new customer doesn't already exist (i.e. it must be unique across the system). This is, of course, not an invariant that can be enforced by the CustomerAggregate, since a customer doesn't have any information about other customers.
One suggestion I've seen is to handle this kind of constraint in its own aggregate, e.g. SocialSecurityNumberUniqueAggregate. If the social security number is not already registered in the system, the SocialSecurityNumberUniqueAggregate publishes an event (e.g. SocialSecurityNumberOfNewCustomerWasUniqueEvent) which the CustomerAggregate subscribes to and publishes its own event in response to this (e.g. CustomerCreatedEvent). Does this make sense? How would the CustomerAggregate respond to, for example, a missing name or another hard constraint when responding to the SocialSecurityNumberOfNewCustomerWasUniqueEvent?

The search term you are looking for is set-validation.
Relational databases are really good at domain agnostic set validation, if you can fit the entire set into a single database.
But, that comes with a cost; designing your model that way restricts your options on what sorts of data storage you can use as your book of record, and it splits your "domain logic" into two different pieces.
Another common choice is to ignore the conflicts when you are running your domain logic (after all, what is the business value of this constraint?) but to instead monitor the persisted data looking for potential conflicts and escalate to a human being if there seems to be a problem.
You can combine the two (ex: check for possible duplicates via query when running the domain logic, and monitor the results later to mitigate against data races).
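A rough Python sketch of combining the two, assuming customers are available as (customer_id, ssn) pairs from some projection (a hypothetical shape). ssn_looks_unused is the best-effort query at command time, which can still race with concurrent writes; find_possible_duplicates is what a later monitoring job might run, producing candidates to hand to a human rather than conflicts to resolve automatically.

```python
from collections import defaultdict


def ssn_looks_unused(ssn: str, customers: list[tuple[str, str]]) -> bool:
    """Best-effort check at command time; a concurrent write can still slip through."""
    return all(existing != ssn for _, existing in customers)


def find_possible_duplicates(customers: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group persisted (customer_id, ssn) pairs by SSN; keep SSNs claimed more than once."""
    by_ssn: dict[str, list[str]] = defaultdict(list)
    for customer_id, ssn in customers:
        by_ssn[ssn].append(customer_id)
    # Candidates for escalation to a human, not for automatic resolution.
    return {ssn: ids for ssn, ids in by_ssn.items() if len(ids) > 1}
```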
But if you need to maintain an invariant over a set, and you need that to be part of your write model (rather than separated out into your persistence layer), then you need to lock the entire set when making changes.
That could mean having a "registry of SSN assignments" that is an aggregate unto itself, and you have to start thinking about how much other customer data needs to be part of this aggregate, vs how much lives in a different aggregate accessible via a common identifier, with all of the possible complications that arise when your data set is controlled via different locks.
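A minimal sketch of such a registry aggregate, in Python with hypothetical names (SsnRegistry, InMemoryRegistryStore). Locking the entire set is modelled as optimistic concurrency on a single version number: two writers that load the same registry cannot both save, even if they touch different SSNs. A real event store or database would supply its own concurrency check.

```python
import copy
from dataclasses import dataclass, field


class SsnAlreadyAssigned(Exception):
    pass


class ConcurrencyConflict(Exception):
    pass


@dataclass
class SsnRegistry:
    """Hypothetical aggregate that owns the whole set of SSN assignments."""
    assignments: dict[str, str] = field(default_factory=dict)  # ssn -> customer_id
    version: int = 0

    def assign(self, ssn: str, customer_id: str) -> None:
        owner = self.assignments.get(ssn)
        if owner is not None and owner != customer_id:
            raise SsnAlreadyAssigned(f"{ssn} is already assigned to {owner}")
        self.assignments[ssn] = customer_id
        self.version += 1


class InMemoryRegistryStore:
    """Stand-in store using optimistic locking on the registry's version."""

    def __init__(self) -> None:
        self._saved = SsnRegistry()

    def load(self) -> SsnRegistry:
        return copy.deepcopy(self._saved)

    def save(self, registry: SsnRegistry, expected_version: int) -> None:
        # expected_version is the version the registry had when it was loaded.
        # The whole set is one lock: any concurrent change invalidates this save.
        if self._saved.version != expected_version:
            raise ConcurrencyConflict("registry changed since it was loaded")
        self._saved = copy.deepcopy(registry)
```

The cost shows up directly: every customer registration contends on the same version, which is the trade-off described above.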
There's no rule that says all of the customer data needs to belong to a single "aggregate"; see Mauro Servienti's talk All Our Aggregates are Wrong. Trade-offs abound.
One thing you want to be very cautious about in your modeling, is the risk of confusing data entry validation with domain logic. Unless you are writing domain models for the Social Security Administration, SSN assignments are not under your control. What your model has is a cached copy, and in this case potentially a corrupted copy.
Consider, for example, a data set that claims:
000-00-0000 is assigned to Alice
000-00-0000 is assigned to Bob
Clearly there's a conflict: both of those claims can't be true if the social security administration is maintaining unique assignments. But all else being equal, you can't tell which of these claims is correct. In particular, the suggestion that "the claim you happened to write down first must be the correct one" doesn't have a lot of logical support.
In cases like these, it often makes sense to hold off on an automated judgment, and instead kick the problem to a human being to deal with.
Although they are mechanically similar in a lot of ways, there are important differences between "the set of our identifier assignments should have no conflicts" and "the set of known third party identifier assignments should have no conflicts".

Do you also need to verify that the social security number (SSN) is really valid? Or are you just interested in verifying that no other customer aggregate with the same SSN can be created in your CRM system?
If the latter is the case, I would suggest having a CustomerService domain service that performs the whole SSN check by looking up the database (e.g. via a repository) and then creates the new customer aggregate (which again checks its own invariants, as you already mentioned). This whole process, the lookup of an existing SSN and the customer creation, needs to happen within one transaction to ensure consistency. As I consider this domain logic, a domain service is the perfect place for it. It does not hold data itself but orchestrates the workflow that relates to the business requirement that no two customers with the same SSN may be created in our CRM.
If you also need to verify that the social security number is real, you would also need to call another service, I guess, or keep some cached data of SSNs in your CRM. In this case you could additionally have a SocialSecurityNumberService domain service that is injected into the CustomerService. This would just be an interface in the domain layer, while the implementation of the SocialSecurityNumberService interface would reside in the infrastructure layer, where access to whatever resource is required is implemented (be it a local cache you build in the background or an API call to another service).
Either way all your logic of creating the new customer would be in one place, the CustomerService domain service. Additional checks that go beyond the Customer aggregate boundaries would also be placed in this CustomerService.
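A sketch of that CustomerService in Python. All names are hypothetical, the repository and SSN-service interfaces are written as typing.Protocol classes, and the single transaction around lookup and insert is assumed rather than implemented (the in-memory test doubles do not provide one).

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class InvalidCustomer(Exception):
    pass


def looks_like_ssn(ssn: str) -> bool:
    """Format check only (NNN-NN-NNNN); says nothing about whether the SSN is real."""
    parts = ssn.split("-")
    return [len(p) for p in parts] == [3, 2, 4] and all(p.isdigit() for p in parts)


@dataclass
class Customer:
    """Aggregate: enforces only its own invariants."""
    customer_id: str
    name: str
    ssn: str

    def __post_init__(self) -> None:
        if not self.name:
            raise InvalidCustomer("name is required")
        if not looks_like_ssn(self.ssn):
            raise InvalidCustomer("SSN must have the format NNN-NN-NNNN")


class CustomerRepository(Protocol):
    def find_by_ssn(self, ssn: str) -> Optional[Customer]: ...
    def add(self, customer: Customer) -> None: ...


class SocialSecurityNumberService(Protocol):
    """Domain-layer interface; an infrastructure class would call an API or a cache."""
    def is_real(self, ssn: str) -> bool: ...


class CustomerService:
    """Domain service holding the rule that spans customers: no duplicate SSNs."""

    def __init__(self, customers: CustomerRepository,
                 ssn_service: SocialSecurityNumberService) -> None:
        self._customers = customers
        self._ssn_service = ssn_service

    def create_customer(self, customer_id: str, name: str, ssn: str) -> Customer:
        customer = Customer(customer_id, name, ssn)  # the aggregate's own invariants
        # Assumption: the caller wraps lookup + add in one transaction.
        if self._customers.find_by_ssn(ssn) is not None:
            raise InvalidCustomer(f"a customer with SSN {ssn} already exists")
        if not self._ssn_service.is_real(ssn):
            raise InvalidCustomer(f"SSN {ssn} could not be verified")
        self._customers.add(customer)
        return customer


class InMemoryCustomers:
    """Test double for CustomerRepository."""

    def __init__(self) -> None:
        self._by_ssn: dict[str, Customer] = {}

    def find_by_ssn(self, ssn: str) -> Optional[Customer]:
        return self._by_ssn.get(ssn)

    def add(self, customer: Customer) -> None:
        self._by_ssn[customer.ssn] = customer


class AcceptAllSsns:
    """Test double for SocialSecurityNumberService."""

    def is_real(self, ssn: str) -> bool:
        return True
```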
Update
To also adhere to the nature of eventual consistency:
I guess that, since you are going with event sourcing, you and your business have already accepted eventual consistency. This also means entries with the same SSN could occur. I think you could have a background job that continually checks for duplicate entries; depending on the complexity of your business logic, you might be able to correct the duplicates automatically, or you might need human intervention. It really depends on how often this could actually happen.
If a hard constraint is that this must NEVER happen, maybe event sourcing is not the right way, at least for this part of your system...
Note: I also assume that command de-duplication is not the issue here but that you really have to deal with potentially different commands using the same SSN.

Read model for aggregate in DDD CQRS ES

In CQRS + ES and DDD, is it a good thing to have a small read model in an aggregate to get data from another aggregate or bounded context?
For example, in order validation (in the Order aggregate), there is a business rule that validates an order only if the customer is not flagged. The flag information is put into a read model (specific to the aggregate) via synchronous domain events.
What do you think about this ?

is it a good thing to have a small read model in an aggregate to get data from another aggregate or bounded context?
It's not ideal. Aggregates, due to their nature, are not good at enforcing consistency that involves state outside of themselves.
What this usually means is that the business is going to need some way to respond when two aggregates produce an unacceptable state.
You also have the option of checking for the flag before you run the placeOrder command on the aggregate. That check could be done in the command handler or in the client; basically, you have ways of "validating" that the command should succeed before passing it to the aggregate.
That said, if it were critical to try to consult the read model while processing the command, a way to do it would be to use a "domain service"; you pass a service provider to the aggregate as part of the command, and let the interface abstract away the fact that running the query requires looking outside of the aggregate.
That gives you some of the decoupling you need to keep the aggregate testable.
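A Python sketch of the domain-service option, with hypothetical names (CustomerFlagLookup, Order.place, StubFlags). The aggregate depends only on an interface; in production the implementation might query a read model, while tests pass a stub. The alternative mentioned above, checking the flag in the command handler before calling the aggregate, would simply move the is_flagged call out of Order.

```python
from typing import Protocol


class CustomerFlagLookup(Protocol):
    """Hypothetical domain service interface; the implementation may query a read model."""

    def is_flagged(self, customer_id: str) -> bool: ...


class CustomerIsFlagged(Exception):
    pass


class Order:
    def __init__(self, order_id: str, customer_id: str) -> None:
        self.order_id = order_id
        self.customer_id = customer_id
        self.placed = False

    def place(self, flags: CustomerFlagLookup) -> None:
        # The aggregate asks through an interface; it does not know a query runs.
        if flags.is_flagged(self.customer_id):
            raise CustomerIsFlagged(self.customer_id)
        self.placed = True


class StubFlags:
    """Test double: keeps the aggregate testable without a database."""

    def __init__(self, flagged: set[str]) -> None:
        self._flagged = flagged

    def is_flagged(self, customer_id: str) -> bool:
        return customer_id in self._flagged
```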

It's doable, but not in the form of a read model, rather a Value Object in the Aggregate (since we're on the Write side).
If you already have a CustomerId in Order, you just have to compose a VO with it and a Flagged member.
Of course, this remains prone to all the problems of cross-aggregate communication since the data originates from Customer. Order has to be kept in sync with the flagged status of its Customer, which can require quite a bit of work.
In any case, you should probably first determine with your domain expert whether immediate consistency is an absolute requirement (in which case you have to somehow wrap Customer + Order in a transaction) or if you can afford a small delay in Flagged freshness when enforcing that invariant.
If the latter, you can choose between duplicating Flagged in the Order aggregate or the first option given by @VoiceOfUnreason. The main difference is probably that if the data is in the aggregate, you get it for free at the Domain level whenever you need it, instead of duplicating the check in multiple use cases/command handlers at the application level.
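A sketch of the Value Object approach in Python, with a hypothetical CustomerStanding value object holding the CustomerId and a Flagged member. How the flag reaches on_customer_flag_changed (for example, a handler subscribed to Customer's events) is assumed, and its freshness is only as good as that synchronization.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CustomerStanding:
    """Hypothetical value object copied into Order: CustomerId plus Flagged."""
    customer_id: str
    flagged: bool = False


class Order:
    def __init__(self, order_id: str, customer: CustomerStanding) -> None:
        self.order_id = order_id
        self.customer = customer
        self.validated = False

    def on_customer_flag_changed(self, flagged: bool) -> None:
        # Fed from Customer's events; may lag behind Customer's own state.
        # Value objects are replaced, never mutated.
        self.customer = replace(self.customer, flagged=flagged)

    def validate(self) -> None:
        # The invariant is enforced with data already inside the aggregate.
        if self.customer.flagged:
            raise ValueError(f"customer {self.customer.customer_id} is flagged")
        self.validated = True
```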

Business Process to "Transfer" a one-to-many association

Introduction To Domain
I have a Salesman. A Salesman gets BusinessOpportunities. In my domain, it makes sense for both to be ARs.
There are two ways to model this:
A Salesman aggregate is unaware of its business opportunities, or
A Salesman is aware of his list of opportunities (using an OpportunityId of course)
A BusinessOpportunity, I believe, always needs to know its SalesmanId.
The Question
I have a business process that I plan on implementing using a Process Manager pattern. It is a "TransferAllBusinessOpportunities" process. It means taking 1 salesman and "transferring" all of his/her opportunities to the other.
How should we do this, and how should we model the domain?
I can think of a process state machine if we model this as a bidirectional association, but it's quite involved. I don't know how to do it if we only have a unidirectional association, because we'd then need to resort to the read model to get the list of business opportunities to transfer, and I'm worried that we should keep everything in the write-side model. What do you think about that?
Any help is very much appreciated. I've attached a diagram below to help visualize the problem.
A quick roundup of the questions:
How would you tackle this problem?
How would you model the domain to best tackle this?
Is it ok to use the read model in a command handler to execute the business process?
Thanks again.

Meta-answer: you need to read what Greg Young has to say about set validation. You'll be in a better position to explore your requirements with your domain experts.
I don't know how to do it if we only have a unidirectional association because we'd then need to resort to the read model to get the list of business opportunities
Extracting the data from the read model should be your first resort. What's the problem?
Basic outline
Query the read model for the set
Create command(s) to update the write model based on the set
Dispatch the commands to the write model
the write model gets the set data it needs from the command (not from the read model)
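The outline above might look like this in Python. The read model is a plain dict of salesman id to opportunity ids and the write model is a dict of aggregates, both stand-ins, and TransferOpportunity and transfer_all are hypothetical names. Each command carries the from/to salesman ids, so the aggregate decides using only its own state plus the command, and a stale read model shows up as a rejected command rather than a silent error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferOpportunity:
    """Hypothetical command carrying everything the write model needs."""
    opportunity_id: str
    from_salesman_id: str
    to_salesman_id: str


class BusinessOpportunity:
    """Write-side aggregate: it knows its SalesmanId, nothing about the set."""

    def __init__(self, opportunity_id: str, salesman_id: str) -> None:
        self.opportunity_id = opportunity_id
        self.salesman_id = salesman_id

    def transfer(self, cmd: TransferOpportunity) -> None:
        if self.salesman_id != cmd.from_salesman_id:
            # The read model was stale: the opportunity has already moved.
            raise ValueError(f"{self.opportunity_id} is not owned by {cmd.from_salesman_id}")
        self.salesman_id = cmd.to_salesman_id


def transfer_all(read_model: dict[str, list[str]],
                 opportunities: dict[str, BusinessOpportunity],
                 from_id: str, to_id: str) -> list[str]:
    # 1. Query the read model for the set.
    ids = read_model.get(from_id, [])
    # 2. Create one command per member of the set.
    commands = [TransferOpportunity(oid, from_id, to_id) for oid in ids]
    # 3. Dispatch each command to the write model; collect the rejections.
    failed = []
    for cmd in commands:
        try:
            opportunities[cmd.opportunity_id].transfer(cmd)
        except ValueError:
            failed.append(cmd.opportunity_id)
    return failed
```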
The first resort won't always satisfy your requirements, but it's a good starting point for thinking about the use case. What problems could occur if you implemented this simple way? what would those problems cost the business?
Also: I said "command" up above, but it might not be one. One thing that you didn't describe is what part of the model "decides" the transfer. Is the model allowed to reject the command to transfer the opportunity? Under what circumstances? Which aggregate holds the state that determines whether the transfer is allowed?
It might be that the transfer isn't being described as a command, so much as it is by an event, describing a decision made by some human sales manager.
I'm worried that we should keep everything in the write-side model
Maybe. Is there a business invariant that needs the state of the set? So far, it doesn't sound like it, which strongly implies that the set does not belong in the write model. You want to strip down your aggregates as far as you can without losing the ability to enforce the invariant.
Is it ok to use the read model in a command handler to execute the business process?
Is it "ok"? Judging from what I have read in various places, a number of people think so. Personally, I'm not convinced. Roughly, you are looking at two broad outlines:
Create a thin command
Send the command to the command handler
Query the read model to flesh out the missing details
Process the fleshed out command
vs
Query the read model
Use the query results to construct a fat command
Send the command to the command handler
Process the command
I've yet to see an example where the business would care about the distinctions between these two implementations; the latter implementation is easier to predict (you don't need to know anything about the state of the read model, just the state of the aggregate and the state of the command).

What are consequences of using repository inside of aggregate vs inside of domain service

We have all heard that injecting a repository into an aggregate is a bad idea, but almost no one says why.
I will try to list all the disadvantages of doing this here, so we can assess whether this statement is right.
First thing that comes into my head is Single Responsibility Principle.
It's true that by injecting a repository into an AR we are violating SRP, because retrieving and persisting the aggregate is not the responsibility of the aggregate itself. But that only covers "the aggregate itself", not other aggregates. So does it apply to retrieving, from a repository, aggregates referenced by id? And what about storing them?
I used to think that an aggregate shouldn't even know that there is some sort of persistence in the system, because persistence doesn't have to exist. Aggregates can be created just for one procedure call and then discarded.
Now that I think about it, that's not right, because an aggregate root is an entity, and an entity only makes sense if it has a unique identity. So why would we need a unique identity if not for persisting? Even if it's just persistence in memory. Maybe for comparing, but in my opinion that's not the main reason behind identity.
OK, let's assume that we retrieve and store OTHER aggregates from inside our aggregate using injected repositories. What are the other consequences besides the SRP violation?
There is certainly a problem with having no control over the persisting of aggregates, and retrieving them this way is a kind of lazy loading, which is bad for the same reason (no control).
Because of this lack of control, we can end up persisting the same aggregate several times where once would suffice, or loading the same aggregate a hundred times where it could be loaded once, so performance is worse. There might also be problems with stale data.
These reasons practically rule out injecting repositories into aggregates.
Here comes my main question: why can we inject repositories into domain services, then?
Don't the same reasons apply here? It's just like moving logic out of the aggregate into a separate function and pretending it's something different.
To be honest, when I started to write this SO question, I had no good answer. But after hours of investigating the problem and writing the question, I came to a solution. Rubber duck debugging.
I'll post this question anyway for others having the same problems. Of course with my answer below.

Here are the places where I'd recommend fetching aggregates (i.e. calling Repository.Get...()), in order of preference:
Application Service
Domain Service
Aggregate
We don't want Aggregates to fetch other Aggregates most of the time, because this blurs the lines, giving them orchestration powers which normally belong to the Application layer. You also raise the risk of the Aggregate trespassing its jurisdiction by modifying other Aggregates, which can result in contention and performance problems, not to mention that transactions become more difficult to analyze and the code base to reason about.
Domain Services are IMO a good place to fetch Aggregates when determining which aggregates to modify is domain logic per se. In your game example (which might not be the ideal context for DDD by the way), which units are affected by another unit's attack might be considered domain logic, thus you may not want to place it at the Application Service level. This rarely happens in my experience though.
Finally, Application Services are the default place where I call Repository.Get(...) for uniformity's sake and because this is the natural place to get a hold of the actors of the use case (usually only one Aggregate per transaction) and orchestrate calls to them.
That doesn't mean Aggregates should never be injected Repositories, there are exceptions, but other alternatives are almost always better.
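A minimal Python sketch of that default, with hypothetical names: the application service fetches one aggregate through a repository, calls the domain method that holds the rule, and saves it. Nothing in Order knows that a repository exists.

```python
class OrderAlreadyShipped(Exception):
    pass


class Order:
    """Aggregate: owns the rule about when an order may be cancelled."""

    def __init__(self, order_id: str, status: str = "open") -> None:
        self.order_id = order_id
        self.status = status

    def cancel(self) -> None:
        if self.status == "shipped":
            raise OrderAlreadyShipped(self.order_id)
        self.status = "cancelled"


class InMemoryOrderRepository:
    """Hypothetical repository; get/save stand in for Repository.Get(...) and friends."""

    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order


class CancelOrderService:
    """Application service: fetches the actor of the use case and orchestrates it."""

    def __init__(self, orders: InMemoryOrderRepository) -> None:
        self._orders = orders

    def cancel_order(self, order_id: str) -> None:
        order = self._orders.get(order_id)  # fetching happens here, not in the aggregate
        order.cancel()                      # the business rule stays in the aggregate
        self._orders.save(order)            # one aggregate changed per transaction
```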

So as I wrote in a question, I've found my answer already in the process of writing that question.
The best way to show this is by example:
When we have a superficially simple behavior, like a unit attacking another unit, we can write something like this:
unit.attack_unit(other_unit)
The problem is that, to attack a unit, we have to calculate damage, and to do that we need other aggregates, like weapon and armor, which are referenced by id inside the unit. Since we cannot inject a repository into an aggregate, we have to move the attack_unit logic into a domain service, because we can inject a repository there. So what is the difference between injecting it into a domain service and injecting it into the unit aggregate?
The answer is: there is no difference. None of the consequences I described in the question will bite us. In both cases we load both units once, the attacking unit's weapon once, and the attacked unit's armor once. There won't be stale data either, even if we mutate the weapon object during the process and store it, because that weapon is retrieved and stored in one place.
The problem shows up in a different example.
Let's create a use case where a unit can attack all other units in the game in one process.
The problem lies in how we implement it. If we reuse the already defined unit.attack_unit and call it for every unit in the game (iterating over them), then the weapon used to compute damage will be retrieved by the unit aggregate as many times as there are units in the game. But it could be retrieved only once!
It doesn't matter whether unit.attack_unit is a method of the unit aggregate or a domain service unit_attack_unit. It will still be the same: the weapon will be loaded too many times. To fix that, we simply have to change the implementation, and probably the interface along with it.
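A Python sketch of that change, using a hypothetical CountingRepo that counts loads. attack_all_naive reuses the one-target logic and fetches the attacker's weapon once per target; attack_all fetches it once for the whole batch. The domain rule (damage) is shared; only the orchestration differs.

```python
from dataclasses import dataclass


@dataclass
class Weapon:
    weapon_id: str
    power: int


@dataclass
class Armor:
    armor_id: str
    defense: int


@dataclass
class Unit:
    unit_id: str
    weapon_id: str  # other aggregates are referenced by id
    armor_id: str
    hp: int = 100


class CountingRepo:
    """Hypothetical in-memory repository that counts how often it is asked to load."""

    def __init__(self, items: dict) -> None:
        self._items = items
        self.loads = 0

    def get(self, item_id: str):
        self.loads += 1
        return self._items[item_id]


def damage(weapon: Weapon, armor: Armor) -> int:
    """The shared domain rule, independent of how its inputs were loaded."""
    return max(weapon.power - armor.defense, 0)


def attack_all_naive(attacker: Unit, targets: list[Unit],
                     weapons: CountingRepo, armors: CountingRepo) -> None:
    for target in targets:
        # Same shape as the one-target attack: the weapon is loaded per target.
        target.hp -= damage(weapons.get(attacker.weapon_id), armors.get(target.armor_id))


def attack_all(attacker: Unit, targets: list[Unit],
               weapons: CountingRepo, armors: CountingRepo) -> None:
    weapon = weapons.get(attacker.weapon_id)  # loaded once for the whole batch
    for target in targets:
        target.hp -= damage(weapon, armors.get(target.armor_id))
```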
Now at least we have an answer to the question "does moving logic from an aggregate method to a domain service (because we want to access a repository there) fix the problem?". No, it does not change a thing.
Injecting repositories into a domain service can be as dangerous as injecting them into an aggregate, if used wrongly.
This answers my SO question, but we still don't have a solution to the real problem.
What can we do if we have two use cases, one where a unit attacks one other unit and a second where a unit attacks all other units, without duplicating domain logic?
One way is to pass all the needed aggregates as parameters to our aggregate method:
unit.attack_unit(other_unit, weapon, armor)
But what if we need five or more aggregates there? That's not a good way. Also, the application logic would have to know that all these aggregates are needed for an attack, which is a knowledge leak. When the attack_unit implementation changes, we might also have to update the interface of that method. What is the purpose of encapsulation then?
So, if we can't access a repository to get the needed aggregate, how can we smuggle it in?
We could drop the idea of referencing aggregates by id, or pass all the needed aggregates from the application layer (which means a knowledge leak).
Or maybe the reason for these problems is bad modelling?
Attacking another unit is indeed a unit's responsibility, but is damage calculation its responsibility? Of course not.
Maybe we need another object, like a value object MeleeAttack(weapon, armor), yet when we add more properties that can change the result of an attack, like enchantments on a unit, it gets more complicated.
Also, I think we are now designing objects based on performance, not on our domain.
So from domain driven design, we get performance driven design. Is that what we want? I don't think so.
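For comparison, the MeleeAttack idea might look like this as a frozen Python dataclass. This variant is an assumption: it takes plain numbers copied from the weapon, armor and enchantments rather than the aggregates themselves, so the unit only receives an attack and never loads anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeleeAttack:
    """Hypothetical value object: everything that determines the damage of one hit."""
    weapon_power: int
    armor_defense: int
    enchantment_bonus: int = 0  # an extra input of the kind the question mentions

    def damage(self) -> int:
        return max(self.weapon_power + self.enchantment_bonus - self.armor_defense, 0)


@dataclass
class Unit:
    unit_id: str
    hp: int = 100

    def receive(self, attack: MeleeAttack) -> None:
        # The unit applies the outcome; computing it is MeleeAttack's job.
        self.hp = max(self.hp - attack.damage(), 0)
```

Each new input widens the value object rather than the unit's interface, which is one way of reading the concern raised above.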

"So why would we need unique identity if not for persisting?" - think of an account scenario, where several John Smiths exist in your system. Imagine John Smith and John Smith Jr (who didn't enter the Jr in signup) both live at the same address. How do you tell them apart? Imagine I'm trying to write a recommendation engine based upon their past purchases . . . .
Identity is a quality of equality in DDD. If you don't have an identity unique from your fields, then you're a ValueObject.
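A small Python illustration of that distinction, with hypothetical Customer and Address classes: two John Smiths at the same address have equal value-object addresses but remain different entities because their ids differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Value object: equal whenever all fields are equal."""
    street: str
    city: str


class Customer:
    """Entity: equality is based on identity, not on its attributes."""

    def __init__(self, customer_id: str, name: str, address: Address) -> None:
        self.customer_id = customer_id
        self.name = name
        self.address = address

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Customer) and other.customer_id == self.customer_id

    def __hash__(self) -> int:
        return hash(self.customer_id)
```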

What are consequences of using repository inside of aggregate vs inside of domain service?
There's a reasonably strong argument that you shouldn't do either.
Riddle: when does an aggregate need to see the state of another aggregate?
The responsibility of an aggregate is to control change. Any command that would change the state of the domain model is dispatched to the aggregate root responsible for the integrity of the state in question. By definition, all of the state required to ensure that the command is currently permitted is contained within the aggregate boundary.
So there is never any need to peek at the data outside of the aggregate when making a change to the model.
In which case, you don't ever need to load another aggregate, which makes the "where" question moot.
Two clarifications:
Queries will often combine the state of multiple aggregates, and will often need to follow a reference from one aggregate to another. The principle above is satisfied because queries treat the domain model as read-only. You need the state to answer the query, but you don't need the invariant enforcement because you aren't changing anything.
Another case is when you need state from another aggregate to process a command properly, but a small latency in the data is an acceptable risk to the business. In that case, you query the "other" aggregate to get its state. If you were to run that query within the domain model itself, the right way to do so would be via a domain service.
In most cases, though, you'll be equally well served to run the query when generating the command (ie, in the client), or when handling the command (in the application, outside the domain). It would be very unusual for a business to consider domain service latency to be acceptable but client latency to be unacceptable.
(Disconnected clients are one case where that can be especially problematic; when the command is generated and then queued for a long period of time before being dispatched to the server).
