DDD: where should logic go that tests the existence of an entity? - domain-driven-design

I am in the process of refactoring an application and am trying to figure out where certain logic should fit. For example, during the registration process I have to check if a user exists based upon their email address. As this requires testing if the user exists in the database it seems as if this logic should not be tied to the model as its existence is dictated by it being in the database.
However, I will have a method on the repository responsible for fetching the user by email, etc. This handles the part about retrieval of the user if they exist. From a use case perspective, registration seems to be a use case scenario and accordingly it seems there should be a UserService (application service) with a register method that would call the repository method and perform if then logic to determine if the user entity returned was null or not.
Am I on the right track with this approach, in terms of DDD? Am I viewing this scenario the wrong way and if so, how should I revise my thinking about this?
This link was provided as a possible solution, Where to check user email does not already exits?. It does help but it does not seem to close the loop on the issue. The thing I seem to be missing from this article would be who would be responsible for calling the CreateUserService, an application service or a method on the aggregate root where the CreateUserService object would be injected into the method along with any other relevant parameters?
If the answer is the application service that seems like you are loosing some encapsulation by taking the domain service out of the domain layer. On the other hand, going the other way would mean having to inject the repository into the domain service. Which of those two options would be preferable and more in line with DDD?

I think the best fit for that behaviour is a Domain Service. DS could access to persistence so you can check for existence or uniquenes.
Check this blog entry for more info.
I.e:
public class TransferManager
{
private readonly IEventStore _store;
private readonly IDomainServices _svc;
private readonly IDomainQueries _query;
private readonly ICommandResultMediator _result;
public TransferManager(IEventStore store, IDomainServices svc,IDomainQueries query,ICommandResultMediator result)
{
_store = store;
_svc = svc;
_query = query;
_result = result;
}
public void Execute(TransferMoney cmd)
{
//interacting with the Infrastructure
var accFrom = _query.GetAccountNumber(cmd.AccountFrom);
//Setup value objects
var debit=new Debit(cmd.Amount,accFrom);
//invoking Domain Services
var balance = _svc.CalculateAccountBalance(accFrom);
if (!_svc.CanAccountBeDebitted(balance, debit))
{
//return some error message using a mediator
//this approach works well inside monoliths where everything happens in the same process
_result.AddResult(cmd.Id, new CommandResult());
return;
}
//using the Aggregate and getting the business state change expressed as an event
var evnt = Transfer.Create(/* args */);
//storing the event
_store.Append(evnt);
//publish event if you want
}
}
from http://blog.sapiensworks.com/post/2016/08/19/DDD-Application-Services-Explained

The problem that you are facing is called Set based validation. There are a lot of articles describing the possible solutions. I will give here an extract from one of them (the context is CQRS but it can be applied to some degree to any DDD architecture):
1. Locking, Transactions and Database Constraints
Locking, transactions and database constraints are tried and tested tools for maintaining data integrity, but they come at a cost. Often the code/system is difficult to scale and can be complex to write and maintain. But they have the advantage of being well understood with plenty of examples to learn from. By implication, this approach is generally done using CRUD based operations. If you want to maintain the use of event sourcing then you can try a hybrid approach.
2. Hybrid Locking Field
You can adopt a locking field approach. Create a registry or lookup table in a standard database with a unique constraint. If you are unable to insert the row then you should abandon the command. Reserve the address before issuing the command. For these sort of operations, it is best to use a data store that isn’t eventually consistent and can guarantee the constraint (uniqueness in this case). Additional complexity is a clear downside of this approach, but less obvious is the problem of knowing when the operation is complete. Read side updates are often carried out in a different thread or process or even machine to the command and there could be many different operations happening.
3. Rely on the Eventually Consistent Read Model
To some this sounds like an oxymoron, however, it is a rather neat idea. Inconsistent things happen in systems all the time. Event sourcing allows you to handle these inconsistencies. Rather than throwing an exception and losing someone’s work all in the name of data consistency. Simply record the event and fix it later.
As an aside, how do you know a consistent database is consistent? It keeps no record of the failed operations users have tried to carry out. If I try to update a row in a table that has been updated since I read from it, then the chances are I’m going to lose that data. This gives the DBA an illusion of data consistency, but try to explain that to the exasperated user!
Accepting these things happen, and allowing the business to recover, can bring real competitive advantage. First, you can make the deliberate assumption these issues won’t occur, allowing you to deliver the system quicker/cheaper. Only if they do occur and only if it is of business value do you add features to compensate for the problem.
4. Re-examine the Domain Model
Let’s take a simplistic example to illustrate how a change in perspective may be all you need to resolve the issue. Essentially we have a problem checking for uniqueness or cardinality across aggregate roots because consistency is only enforced with the aggregate. An example could be a goalkeeper in a football team. A goalkeeper is a player. You can only have 1 goalkeeper per team on the pitch at any one time. A data-driven approach may have an ‘IsGoalKeeper’ flag on the player. If the goalkeeper is sent off and an outfield player goes in the goal, then you would need to remove the goalkeeper flag from the goalkeeper and add it to one of the outfield players. You would need constraints in place to ensure that assistant managers didn’t accidentally assign a different player resulting in 2 goalkeepers. In this scenario, we could model the IsGoalKeeper property on the Team, OutFieldPlayers or Game aggregate. This way, maintaining the cardinality becomes trivial.

You seems to be on the right way, the only stuff I didn't get is what your UserService.register does.
It should take all the values to register a user as input, validate them (using the repository to check the existence of the email) and, if the input is valid store the new User.
Problems can arise when the validation involve complex queries. In that case maybe you need to create a secondary store with special indexes suited for queries that you can't do with your domain model, so you will have to manage two different stores that can be out of sync (a user exists in one but it isn't replicated in the other one, yet).
This kind of problem happens when you store your aggregates in something like a key-value store where you can search just with the id of the aggregate, but if you are using something like a sql database that permits to search using your entities fields, you can do a lot of stuff with simple queries.
The only thing you need to take care is avoid to mix query logic and commands logic, in your example the lookup you need to do is easy, is just one field and the result is a boolean, sometimes it can be harder like time operations, or query spanning multiple tables aggregating results, in these cases it is better to make your (command) service use a (query) service, that offers a simple api to do the calculation like:
interface UserReportingService {
ComplexResult aComplexQuery(AComplexInput input);
}
That you can implement with a class that use your repositories, or an implementation that executes directly the query on your database (sql, or whatever).
The difference is that if you use the repositories you "think" in terms of your domain object, if you write directly the query you think in terms of your db abstractions (tables/sets in case of sql, documents in case of mongo, etc..). One or the other depends on the query you need to do.

It is fine to inject repository into domain.
Repository should have simple inteface, so that domain objects could use it as simple collection or storage. Repositories' main idea is to hide data access under simple and clear interface.
I don't see any problems in calling domain services from usecase. Usecase is suppossed to be archestrator. And domain services are actions. It is fine (and even unavoidable) to trigger domain actions by usecase.
To decide, you should analyze Where is this restriction come from?
Is it business rule? Or maybe user shouldn't be a part of model at all?
Usualy "User" means authorization and authentification i.e behaviour, that for my mind should placed in usecase. I prefare to create separate entity for domain (e.g. buyer) and relate it with usecase's user. So when new user is registered it possible to trigger creation of new buyer.

Related

How to handle hard aggregate-wide constraints in DDD/CQRS?

I'm new to DDD and I'm trying to model and implement a simple CRM system based on DDD, CQRS and event sourcing to get a feel for the paradigm. I have, however, run in to some difficulties that I'm not sure how to handle. I'm not sure if my difficulties stem from me not having modeled the domain properly or that I'm missing something else.
For a basic illustration of my problems, consider that my CRM system has the aggregate CustomerAggregate (which seems reasonble to me). The purpose of this aggregate is to make sure each customer is consistent and that its invarints hold up (name is required, social security number must be on the correkct format, etc.). So far, all is well.
When the system receives a command to create a new customer, however, it needs to make sure that the social security number of the new customer doesn't already exist (i.e. it must be unique across the system). This is, of cource, not an invariant that can be enforced by the CustomerAggregate aggregate since customers don't have any information regarding other customers.
One suggestion I've seen is to handle this kind of constraint in its own aggregate, e.g. SocialSecurityNumberUniqueAggregate. If the social security number is not already registered in the system, the SocialSecurityNumberUniqueAggregate publishes an event (e.g. SocialSecurityNumberOfNewCustomerWasUniqueEvent) which the CustomerAggregate subscribes to and publishes its own event in response to this (e.g. CustomerCreatedEvent). Does this make sense? How would the CustomerAggregate respond to, for example, a missing name or another hard constraint when responding to the SocialSecurityNumberOfNewCustomerWasUniqueEvent?
The search term you are looking for is set-validation.
Relational databases are really good at domain agnostic set validation, if you can fit the entire set into a single database.
But, that comes with a cost; designing your model that way restricts your options on what sorts of data storage you can use as your book of record, and it splits your "domain logic" into two different pieces.
Another common choice is to ignore the conflicts when you are running your domain logic (after all, what is the business value of this constraint?) but to instead monitor the persisted data looking for potential conflicts and escalate to a human being if there seems to be a problem.
You can combine the two (ex: check for possible duplicates via query when running the domain logic, and monitor the results later to mitigate against data races).
But if you need to maintain an invariant over a set, and you need that to be part of your write model (rather than separated out into your persistence layer), then you need to lock the entire set when making changes.
That could mean having a "registry of SSN assignments" that is an aggregate unto itself, and you have to start thinking about how much other customer data needs to be part of this aggregate, vs how much lives in a different aggregate accessible via a common identifier, with all of the possible complications that arise when your data set is controlled via different locks.
There's no rule that says all of the customer data needs to belong to a single "aggregate"; see Mauro Servienti's talk All Our Aggregates are Wrong. Trade offs abound.
One thing you want to be very cautious about in your modeling, is the risk of confusing data entry validation with domain logic. Unless you are writing domain models for the Social Security Administration, SSN assignments are not under your control. What your model has is a cached copy, and in this case potentially a corrupted copy.
Consider, for example, a data set that claims:
000-00-0000 is assigned to Alice
000-00-0000 is assigned to Bob
Clearly there's a conflict: both of those claims can't be true if the social security administration is maintaining unique assignments. But all else being equal, you can't tell which of these claims is correct. In particular, the suggestion that "the claim you happened to write down first must be the correct one" doesn't have a lot of logical support.
In cases like these, it often makes sense to hold off on an automated judgment, and instead kick the problem to a human being to deal with.
Although they are mechanically similar in a lot of ways, there are important differences between "the set of our identifier assignments should have no conflicts" and "the set of known third party identifier assignments should have no conflicts".
Do you also need to verify that the social security number (SSN) is really valid? Or are you just interested in verifying that no other customer aggregate with the same SSN can be created in your CRM system?
If the latter is the case I would suggest to have some CustomerService domain service which performs the whole SSN check by looking up the database (e.g. via a repository) and then creates the new customer aggregate (which again checks it's own invariants as you already mentioned). This whole process - the lookup of existing SSN and customer creation - needs to happen within one transaction to to ensure consistency. As I consider this domain logic a domain service is the perfect place for it. It does not hold data by itself but orchestrates the workflow which relates to business requirements - that no to customers with the same SSN must be created in our CRM.
If you also need to verify that the social security number is real you would also need to perform some call the another service I guess or keep some cached data of SSNs in your CRM. In this case you could additonally have some SocialSecurityNumberService domain service which is injected into the CustomerService. This would just be an interface in the domain layer but the implementation of this SocialSecurityNumberService interface would then reside in the infrastructure layer where the access to whatever resource required is implemented (be it a local cache you build in the background or some API call to another service).
Either way all your logic of creating the new customer would be in one place, the CustomerService domain service. Additional checks that go beyond the Customer aggregate boundaries would also be placed in this CustomerService.
Update
To also adhere to the nature of eventual consistency:
I guess as you go with event sourcing you and your business already accepted the eventual consistency nature. This also means entries with the same SSN could happen. I think you could have some background job which continually checks for duplicate entries and depending on the complexity of your business logic you might either be able to automatically correct the duplicates or you need human intervention to do it. It really depends how often this could really happen.
If a hard constraint is that this must NEVER happen maybe event sourcing is not the right way, at least for this part of your system...
Note: I also assume that command de-duplication is not the issue here but that you really have to deal with potentially different commands using the same SSN.

How can I design a bridge from a legacy CRUD oriented app to a CQRS and Event sourcing system?

I was asked to implement CQRS/Event sourcing patterns into a legacy web application, in order to prepare to migrate it from a monolithic/state oriented model to a distributed, service oriented app.
I have some questions on how I can design a Domain oriented code bundle that would connect the legacy entities strongly coupled to database, with a new Event sourced model.
The first things I did were:
writing a small "framework" for CQRS/ES, with classes like AggregateRoot, DomainEvent, Command, Handlers, Messaging, Eventstore, AggregateIds, etc.
trying to group and "migrate" the legacy Entities into some Aggregates to reconstruct all the history and states of the app into EventSoourced Aggregates
plug some Commands dispatching in the old controllers in order to let the app work as is, but also to feed the new CQRS/ES system on the side.
The context:
The legacy app contains several entities, mapped to database, that hold the model layer. (Our domain is Human resources (manpower).
Let's say we have those existing entities:
Worker, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
competences 1-N
Society, in which worker works, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
hours
Contract, with various fields and related entities (OneToOne, OneToMany), like
address 1-1
Worker 1-1
Society 1-1
documents 1-N
days 1-N
hours
etc.
From this legacy model, I designed a MissionAggregate that holds:
A db independent ID, like UUID
some Value objects: address, days (they were an entity in the legacy model, they became VOs here)
I also designed a WorkerAggregate and a SocietyAggregate, with fields and UUIDS, and in the MissionAggregate I added:
a reference to WorkerAggregate's UUID
a reference to SocietyAggregate's UUID
As I said earlier, my aim is to leave the legacy app as is, but just introduce in the CRUD controller's methods some calls to dispatch Commands to the new CQRS system.
For example:
After flushing newly created Contract in bdd, I want to dispatch a "CreateMissionCommand" to the new command bus.
It targets the appropriate Command Handler, that handles all the command's data, passes it to a newly created Aggregate with a new UUID and stores "MissionCreatedDomainEvent" in the EventStore.
The DomainEvent is indexed with an AggregateId, a playhead, and has a payload which contains the fields necessary to be applied to and build the MissionAggregate.
The newly Contract created in the app has now its former lifecycle, as usual, with all the updates that the legacy app does on it. But I also need to reflects all those changes to the corresponding EventSourcedAggregate, so every time there is a flush in database in the app, I dispatch a Command that translates the "crud like operations" of the legacy app into a Domain oriented /Command oriented pattern.
To sum up the workflow is:
A Crud legacy operation occurs and flushes some changes on the Contract Entity
In just a row of code in the controller, I dispatch a command built with necessary fields (AggregateId of the MissionAggregate... that I need to have stored somewhere... see next problems) to the Domain command bus, so that the impact on the existing code base is very low.
The bus passes the command to the corresponding command handler
The handler loads the aggregate and applies the changes it by calling the appropriate Aggregate method
then after some validation, the aggregate raises and stores the appropriate event
My problems and questions (some of them at least) are:
I feel like I am rewriting all big portions of the legacy app, with the same kind of relations between the Aggregates that I have between the Entities, and with the same type of validations, checks etc.
Having references, to both WorkerAggregate and SocietyAggregate UUID in MissionAggregate implies that I have to build those aggregate also (hence to dispatch commands from legacy app when the Worker and Society entities are flushed). Can't I have only references to Worker's entity id and Society's entity id?
How can I avoid having a eternally growing MissionAggregate? The Contract Entity is quite huge, it has a lot of fields that are constantly updated (hours, days, documents, etc.) If I want to store all those events, I need to have a large MissionAggregate to reflect all those changes; and so I need to have a tons of CommandHandlers that react to all the Commands of add, update, etc. that I am going to dispatch from the legacy app.
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when I want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation? In the Entity itself, in a AggregateId field? in the Aggregate, should I have a ContractId field? Or should I have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events on all the historical data?
Thanks in advance for your time.
You have a huge task ahead of you, let's try to break it down.
It's best to build this new part of the system in isolation from the legacy codebase, otherwise you're going to have your hands tied in every turn of the way.
Create a separate layer in your project for these new requirements. We're going to call it "bubble" from now on. This bubble will be like a greenfield project, with its own structure, dependencies, etc. There will be no direct communication between the bubble and the legacy; communication will happen through another dedicated translation layer, which we'll call "Anti-Corruption Layer" (ACL).
ACL
It is like an API between two systems.
It translates calls from the bubble to the legacy and vice-versa. Its purpose is to prevent one system from corrupting or influencing the other. This way you can keep building/maintaining each system independently from each other.
At the same time, the ACL allows one system to consume the other, and reuse logic, validations, rules, etc.
To answer your questions directly:
I feel like i am rewriting all big portions of the legacy app, with the same kind of relations between the Aggregates that i have between the Entities, and with the same type of validations, checks etc.
With the ACL, you can resort to calling validations and reuse implementations from the legacy code. This will allow you time to rewrite things as needed or as possible.
You may not need to rewrite the entire system, though. If your goal is to implement CQRS and Event Sourcing and you can achieve this goal by keeping most or part of the legacy system, I would say you do it. Unless, of course, one of the goals is to completely replace the old system. Otherwise, keep it; write as less code as possible.
Suggested workflow:
Keep the CQRS and Event Sourcing system in the bubble
Do not bring these new frameworks into legacy
Make the lagacy Controller issue method calls to the ACL
The ACL will convert these calls into Commands and dispatch them
Any events will be caught by your Event Sourcing framework
Results will be persisted to the bubble's database
The bubble's database can be a different schema in the same database or can be a different database altogether. But you'll have to think about synchronization, and that's a topic of its own. To reduce complexity, I recommend a different schema in the same database.
Having references, to both WorkerAggregate and SocietyAggregate UUID in MissionAggregate implies that i have to build those aggregate also (hence to dispatch commands from legacy app when the Worker and Society entities are flushed). Can't i have only references to Worker's entity id and Society's entity id?
How can i avoid having a eternally growing MissionAggregate ? The Contract Entity is quite huge, it has a looot of fields that are constantly updated (hours, days, documents, etc.) If i want to store all those events, i need to have a large MissionAggregate to reflect all those changes; and so i need to have a tons of CommandHandlers that react to all the Commands of add, update, etc that i am going to dispatch from the legacy app.
You should aim for small aggregates. Huge aggregates are likely to degrade performance and cause concurrency problems.
If you anticipate having a huge aggregate, it is best to rethink it and try to break it down. Ask what fields/properties change together - these are possibly a different aggregate.
Also, when you speak about CQRS, you generally lean towards a task-based way of doing things in your system.
Think of a traditional web application, where you have a huge page with lots of fields that are all sent to the server in one batch when the user saves.
Now, contrast it with a modern web app where the user changes small portions of data at each step. If you think about your system this way you'll find those smaller aggregates.
PS. you don't need to rebuild your interfaces for this. If your legacy system has those huge pages, you could have logic in the controllers to detect which fields were changed and issue the appropriate commands.
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when i want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation ? In the Entity itself, in a AggregateId field ? in the Aggregate, should i have a ContratId field ? Or should i have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
Aggregates represent a conceptual whole. They are like atoms, indivisible things. You should always refer to an aggregate by its Root Entity Id, and never to a Child Entity Id: looking from the outside, there are no children.
An aggregate should be loaded as a whole and persisted as a whole. One more reason to have small aggregates.
An aggregate can be comprised of a single entity. Or it can have more entities and value objects, forming a graph, but one entity will be elected as the Root and will hold references to its children. Child entities and value objects should not hold references to their parents. The dependency is not bi-directional.
If Contract is an entity inside the Mission aggregate, the Contract should not have a reference to its parent.
But, if your Contract and Mission are different aggregates, then they can reference each other by their Ids.
What to do with the past? Should i migrate all the existing datas through a script that generates Aggregates and events on all the historical data?
That's a question for the business experts. Do they need it? If they don't, then don't implement it just for the sake of doing so. Every decision you make should be geared towards satisfying a business need and generating real value for it, considering the costs and tradeoffs.
Some people say that code is a liability, not an asset, and I aggre to some extent: every line of code you write needs to be tested and supported. Don't write any code that is not really necessary.
Also, have a look at this article about the Strangler Pattern, which shows how to migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.
If you have a chance, watch this course at Pluralsight (paid): Domain-Driven Design: Working with Legacy Projects. The author presents practical approaches for dealing with this kind of task.
I hope this has given you some insight.
I don't want to spoil your game. Everybody knows how cool it is to rewrite something from scratch. It's a challenge, it's fun, it's exciting. However...
migrate it from a monolithic/state oriented model to a distributed, service oriented app
CQRS/Event Sourcing won't solve any of your problems and it won't help you distribute the app in any reasonable way. If you just generate events on the CRUD operations you'll have a large tangled mess of dependencies between each part. Every part that needs data will have to call a couple of "services" (i.e. tables) to get it, than push data elsewhere, generate events1 that some other parts will react to. It will be a mess. Usually this is called a distributed monolith.
This is also the reason you already see problems with it. These problems won't go away, because you are essentially building the same system in the same way, but this time it'll be more complex.
Where to go from here
The very first thing is always: have a clear goal. You want a service oriented architecture you said. Why? Are there parts that need different scaling, different resources? Are they managed by different teams with different life-cycles? Etc.? Maybe you already have all this, I don't know, but if not, that's your first task.
Then. The parts you do want to pull out can't be just CRUD things. Those will not be independent, so whether your goal (see point above!) is scaling or different team, you won't reach your goal! To be independent you'll have to pull out the behavior with the data, and in a way that the service can operate on its own.
You can't just throw buzzwords at it and hope for the best. I'd suggest to just ignore all the hype and buzzwords and think about the goal you want to reach.
For example: I need a million workers to log their time in under 10 minutes total. So that means I need a "service" to enable worker to log their time with a web interface. So let's create that as a complete independent piece with its own database so it can be scaled to a 100 nodes when it needs to be. Export data to billing automatically every hour or so.

What are consequences of using repository inside of aggregate vs inside of domain service

We all heard that injecting repository into aggregate is a bad idea, but almost no one tells why.
I will try to write here all disadvantages of doing this, so we can measure rightness of this statement.
First thing that comes into my head is Single Responsibility Principle.
It's true that by injecting repository into AR we are violating SRP, because retrieving and persisting of aggregate is not responsibility of aggregate itself. But it says only about "aggregate itself", not about other aggregates. So does it apply for retrieving from repository aggregates referenced by id? And what about storing them?
I used to think that aggregate shouldn't even know that there is some sort of persistence in system, because it doesn't have to exist. Aggregates can be created just for one procedure call and then get rid of.
Now when I think of it, it's not right, because aggregate root is an entity, and entity has sense only if it has some unique identity. So why would we need unique identity if not for persisting? Even if it's just a persistence in a memory. Maybe for comparing, but in my opinion it's not a main reason behind the identity.
Ok, let's assume that we retrieve and store OTHER aggregates from inside of our aggregate using injected repositories. What are other consequences beside SRP violation?
For sure there is a problem with having no control over persisting of aggregates and retrieving is some kind of lazy loading, which is bad for the same reason (no control).
Because of no control we can come into situation when we persist the same aggregate few times, where it could be persisted only once, or the same aggregate is loaded one hundred times where it could be loaded once, hence performance is worse. Also there might be problem with stale data.
These reasons practically disqualifies ability to inject repository into aggregate.
Here comes my main question - why can we inject repositories into domain service then?
Not the same reasons applies here? It's just like moving logic out of aggregate into separate function and pretend it to be something different.
To be honest, when I stared to write this SO question, I had no good answer for that. But after hours of investigating this problem and writing of this question I came to solution. Rubber duck debugging.
I'll post this question anyway for others having the same problems. Of course with my answer below.
Here are the places where I'd recommend to fetch aggregates (i.e. call Repository.Get...()), in preference order :
Application Service
Domain Service
Aggregate
We don't want Aggregates to fetch other Aggregates most of the time, because this blurs the lines, giving them orchestration powers which normally belong to the Application layer. You also raise the risk of the Aggregate trespassing its jurisdiction by modifying other Aggregates, which can result in contention and performance problems, not to mention that transactions become more difficult to analyze and the code base to reason about.
Domain Services are IMO a good place to fetch Aggregates when determining which aggregates to modify is domain logic per se. In your game example (which might not be the ideal context for DDD by the way), which units are affected by another unit's attack might be considered domain logic, thus you may not want to place it at the Application Service level. This rarely happens in my experience though.
Finally, Application Services are the default place where I call Repository.Get(...) for uniformity's sake and because this is the natural place to get a hold of the actors of the use case (usually only one Aggregate per transaction) and orchestrate calls to them.
That doesn't mean Aggregates should never be injected Repositories, there are exceptions, but other alternatives are almost always better.
So as I wrote in a question, I've found my answer already in the process of writing that question.
The best way to show this is by example:
When we have a simple (superficially) behavior like unit attacking other unit, we can write something like that.
unit.attack_unit(other_unit)
Problem is that, to attack an unit, we have to calculate damage and to do that we need another aggregates, like weapon and armor, which are referenced by id inside of unit. Since we cannot inject repository inside of aggregate, then we have to move that attack_unit logic into domain service, because we can inject repository there. Now where is the difference between injecting it into domain service, and not into unit aggregate.
Answer is - there is no difference. All consequences I described in question won't bite us. In both cases we will load both units once, attacking unit weapon once and armor of unit being attacked once. Also there won't be stale data, even if we mutate weapon object during process and store it, because that weapon is retrieved and stored in one place.
Problem shows up in different example.
Lets create an use case where unit can attack all other units in game in one process.
Problem lies in how we implement it. If we will use already defined unit.attack_unit and we will call it on all units in game (iterating over them), then weapon that is used to compute damage will be retrieved from unit aggregate, number of times equal to count of units in game! But it could be retrieved only once!
It doesn't matter if unit.attack_unit will be method of unit aggregate, or if it will be domain service unit_attack_unit. It will be still the same, weapon will be loaded too many times. To fix that we simply have to change implementation and with that probably interface too.
Now at least we have an answer to question "does moving logic from aggregate method to domain service (because we want to access repository there) fixes problem?". No, it does not change a thing.
Injecting repositories into domain service can be as dangerous as injecting it into aggregate if used wrong.
This answers my SO question, but we still don't have solution to real problem.
What can we do if we have two use cases: one where unit attacks one other unit, and second where unit attacks all other units, without duplicating domain logic.
One way is to put all needed aggregates as parameters to our aggregate method.
unit.attack_unit(unit, weapon, armor)
But what if we will need like five or more aggregates there? It's not a good way. Also application logic will have to know that all these aggregates are needed for an attack, which is knowledge leak. When attack_unit implementation will change we would also might to update interface of that method. What is the purpose of encapsulation then?
So, if we can't access repository to get needed aggregate, how can we smuggle it then?
We can get rid of idea with referencing aggregates by ids, or pass all needed aggregates from application layer (which means knowledge leak).
Or maybe reason of these problems is bad modelling?
Attacking of other unit is indeed an unit responsibility, but is damage calculation its responsibility? Of course not.
Maybe we need another object, like value object MeleeAttack(weapon, armor), yet when we add more properties that can change result of an attack, like enchantments on unit, it gets more complicated.
Also I think that we are now creating objects based on performance, not our on domain.
So from domain driven design, we get performance driven design. Is that what we want? I don't think so.
"So why would we need unique identity if not for persisting?" - think of an account scenario, where several John Smiths exist in your system. Imagine John Smith and John Smith Jr (who didn't enter the Jr in signup) both live at the same address. How do you tell them apart? Imagine I'm trying to write a recommendation engine based upon their past purchases . . . .
Identity is a quality of equality in DDD. If you don't have an identity unique from your fields, then you're a ValueObject.
What are consequences of using repository inside of aggregate vs inside of domain service?
There's a reasonably strong argument that you shouldn't do either.
Riddle: when does an aggregate need to see the state of another aggregate?
The responsibility of an aggregate is to control change. Any command that would change the state of the domain model is dispatched to the aggregate root responsible for the integrity of the state in question. By definition, all of the state required to ensure that the command is currently permitted is contained within the aggregate boundary.
So there is never any need to peek at the data outside of the aggregate when making a change to the model.
In which case, you don't ever need to load another aggregate, which makes the "where" question moot.
Two clarifications:
Queries will often combine the state of multiple aggregates, and will often need to follow a reference from one aggregate to another. The principle above is satisfied because queries treat the domain model as read-only. You need the state to answer the query, but you don't need the invariant enforcement because you aren't changing anything.
Another case is when you need state from another aggregate to process a command properly, but small latency in the data is an acceptable risk to the data. In that case, you query the "other" aggregate to get state. If you were to run that query within the domain model itself, the right way to do so would be via a domain service.
In most cases, though, you'll be equally well served to run the query when generating the command (ie, in the client), or when handling the command (in the application, outside the domain). It would be very unusual for a business to consider domain service latency to be acceptable but client latency to be unacceptable.
(Disconnected clients are one case where that can be especially problematic; when the command is generated and then queued for a long period of time before being dispatched to the server).

CQRS or App Service?

So I like the concepts of CQRS in our application, mainly because we already support event sourcing (conceptually, not following any prescriptions that you see out there). However, it really seems like CQRS is geared toward Big Data, Eventual consistency, that kind of thing. We are always going to be a Relational DB app, so I am not sure if it fits.
I also have concerns because I think I need to do some special things in my app layer. When doing a read, I need to enforce security and filter data, things that are traditionally implemented in the application layer.
My first question is, does my app fit (a traditional MVC / Relational DB app)? Or does it make more sense to have a traditional app layer and use a DTO Mapper?
My second question is, does it make sense to issue commands to your domain model out of a traditional application layer? I like the idea of commands / command handlers and eventing.
Let me clarify my question. I have concerns around data filtering that are tied to authorization. When a user requests data, there has to be a filter that restricts access to certain data elements by removing them all together (So they are not returned to the caller), hiding the values, or applying masks to the data. In a contrived example, for a Social Security Number, the user making the request may only be able to see the last 4 numbers, so the result would appear like ###-##-1234.
My assertion is that this responsibility goes in the Application layer. I consider this an aspect, where all responses to queries or commands have to go through this filter mechanism. Here is where my CQRS naivity shines through, perhaps it is that commands never return data, just pointers to data that are looked up through the read model?
Thanks!
First and foremost: CQRS and Relational Databases don't exclude each other. In advanced scenarios it may make sense to replace the SQL-based DB with other means of storage, but CQRS as a concept doesn't care about the persistence mechanism.
In the case of multiple views that depend on roles and/or users, the Thin Read Layer should probably provide multiple result sets:
One containing the full SSN for users that are authorized to access that information.
A different one for users that are not authorized to see that information
...
These can be stored in a separate data store, but they can also be provided through SQL views if you work with a single SQL-based database.
In CQRS the Application Service still exists in the form of Command Handlers. These can be nested, i.e. handling authorization first, then posting the command to a contained command handler.
public class AuthorizationHandler {
public CrmAuthorizationService(CrmCommandHandler handler) {
_next = handler;
}
public void Handle(SomeCommand c) {
if (authorized) _next.Handle(c);
}
}
// Usage:
var handler = new CrmAuthorizationService(new CrmCommandHandler());
bus.Register<SomeCommand>(handler.Handle);
This way you can nest multiple handlers, e.g. as a REST envelope, for logging, transactions, etc.
To answer your questions:
First: Is CQRS a good fit for your app? No-one can tell without really digging into the specific requirements. Just because you use MVC and a relational DB doesn't mean anything when it comes to the pros and cons of CQRS.
Second: Yes, in some cases it can make sense to let your application layer interact with the client in a classical way and handle things like authentication, authorization, etc., and then issue commands internally. This can be useful when putting an MVC-based UI or a REST API on top of your application.
Update in response to comment:
In an ideal, puristic CQRS scenario Sally would have her own denormalized data for every view, e.g. a couple of documents in a NoSQL DB called CustomerListForSally, CustomerDetailsForSally, etc. These are populated with what she's allowed to see.
Once she gets promoted - which would be an important domain event - all her denormalized data would automatically be overwritten, and extended to contain what she's allowed to see now.
Of course we must stay reasonable and pragmatic, but this ideal should be the general direction that we're heading for.
In reality you probably have some kind of user/role or user/group based system. To be able to view sensitive information, you'd have to be member of a particular role or group. Each of these could have their defined set of views and commands. This doesn't need t be denoalized data It can be as simpe as a cope of SQL-Views:
CustomerDetailsForSupportStaff
CustomerDetailsForSupportExecutive with unmasked SSN
CustomerListForSupportStaff
CustomerListForSupportExecutive with customer revenue totals
etc.

Implementing Udi's Fetching Strategy - How do I search?

Background
Udi Dahan suggests a fetching strategy as a useful pattern to use for data access. I agree.
The concept is to make roles explicit. For example I have an Aggregate Root - Customer. I want customer in several parts of my application - a list of customers to select from, a view of the customer's details, and I want a button to deactivate a customer.
It seems Udi would suggest an interface for each of these roles. So I have ICustomerInList with very basic details, ICustomerDetail which includes the latest 10 products purchased, and IDeactivateCustomer which has a method to deactivate the customer. Each interface exposes just enough of my Customer Aggregate Root to get the job done in each situation. My Customer Aggregate Root implements all these interfaces.
Now I want to implement a fetching strategy for each of these roles. Each strategy can load a different amount of data into my Aggregate Root because it will be behind an interface exposing only the bits of information needed.
The general method to implement this part is to ask a Service Locator or some other style of dependency injection. This code will take the interface you are wanting, for example ICustomerInList, and find a fetching strategy to load it (IStrategyForFetching<ICustomerInList>). This strategy is implemented by a class that knows to only load a Customer with the bits of information needed for the ICustomerInList interface.
So far so good.
Question
What you pass to the Service Locator, or the IStrategyForFetching<ICustomerInList>. All of the examples I see are only selecting one object by a known id. This case is easy, the calling code passes this id through and will get back the specific interface.
What if I want to search? Or I want page 2 of the list of customers? Now I want to pass in more terms that the Fetching Strategy needs.
Possible solutions
Some of the examples I've seen use a predicate - an expression that returns true or false if a particular Aggregate Root should be part of the result set. This works fine for conditions but what about getting back the first n customers and no more? Or getting page 2 of the search results? Or how the results are sorted?
My first reaction is to start adding generic parameters to my IStrategyForFetching<ICustomerInList> It now becomes IStrategyForFetching<TAggregateRoot, TStrategyForSelecting, TStrategyForOrdering>. This quickly becomes complex and ugly. It's further complicated by different repositories. Some repositories only supply data when using a particular strategy for selecting, some only certain types of ordering. I would like to have the flexibility to implement general repositories that can take sorting functions along with specialised repositories that only return Aggregate Roots sorted in a particular fashion.
It sounds like I should apply the same pattern used at the start - How do I make roles explicit? Should I implement a strategy for fetching X (Aggregate Root) using the payload Y (search / ordering parameters)?
Edit (2012-03-05)
This is all still valid if I'm not returning the Aggregate Root each time. If each interface is implemented by a different DTO I can still use IStrategyForFetching. This is why this pattern is powerful - what does the fetching and what is returned doesn't have to map in any way to the aggregate root.
I've ended up using IStrategyForFetching<TEntity, TSpecification>. TEntity is the thing I want to get, TSpecification is how I want to get it.
Have you come across CQRS? Udi is a big proponent of it, and its purpose is to solve this exact issue.
The concept in its most basic form is to separate the domain model from querying. This means that the domain model only comes into play when you want to execute a command / commit a transaction. You don't use data from your aggregates & entities to display information on the screen. Instead, you create a separate data access service (or bunch of them) that contain methods that provide the exact data required for each screen. These methods can accept criteria objects as parameters and therefore do searching with whatever criteria you desire.
A quick sequence of how this works:
A screen shows a list of customers that have made orders in the last week.
The UI calls the CustomerQueryService passing a date as criteria.
The CustomerQueryService executes a query that returns only the fields required for this screen, including the aggregate id of each customer.
The user chooses a customer in the list, and chooses perform the 'Make Important Customer' action /command.
The UI sends a MakeImportantCommand to the Command Service (or Application Service in DDD terms) containing the ID of the customer.
The command service fetches the Customer aggregate from the repository using the ID passed in the command, calls the necessary methods and updates the database.
Building your app using the CQRS architecture opens you up to lot of possibilities regarding performance and scalability. You can take this simple example further by creating separate query databases that contain denormalised tables for every view, eventual consistency & event sourcing. There is a lot of videos/examples/blogs about CQRS that I think would really interest you.
I know your question was regarding 'fetching strategy' but I notice that he wrote this article in 2007, and it's likely that he considers CQRS its sucessor.
To summarise my answer:
Don't try and project cut down DTO's from your domain aggregates. Instead, just create separate query services that give you a tailored query for your needs.
Read up on CQRS (if you haven't already).
To add to the response by David Masters, I think all the fetching strategy interfaces are adding needless complexity. Having the Customer AR implement the various interfaces which are modeled after a UI is a needless constraint on the AR class and you will spend far to much effort trying to enforce it. Moreover, it is a brittle solution. What if a view requires data that while related to Customer, does not belong on the customer class? Does one then coerce the customer class and the corresponding ORM mappings to contain that data? Why not just have a separate set of classes for query purposes and be done with it? This allows you to deal with fetching strategies at the place where they belong - in the repository. Furthermore, what value does the fetching strategy interface abstraction really add? It may be an appropriate model of what is happening in the application, it doesn't help in implementing it.

Resources