Domen driven architecture and user typos/errors

Domen driven architecture and user typos/errors - domain-driven-design

DDD teaches us to build our classes like their real-world prototypes.
So instead of using setters
job = new Job
job.person = person
job.since = time.Now()
job.title = title
we define well-named methods in our aggregation root
job = person.promote(title, /** since=time.Now() **/)
Now the tricky part
Assume we have an UI for an HR where he/she enters a new title via the HTML form and makes a typo like "prgrammer" (Of course in real application there'd be a select list, but here we have a text input), or selects a wrong date (like default today)
Now we have a problem. There are no typos in real world. Our John Doe is definitely a "programmer" and never a "prgrammer"
How do we fix this typo in our domain model?
Our Person has just promote, demote, fire, etc. methods, which reflect the HR domain model.
We could cheat a little bit and change the Job record directly, but now we have a Job.setTitle method, that doesn't reflect our domain model and also, setters are evil, you know.
That may look a little "academic", but that really bugs me when I try to build a good domain model for a complex application

Another side of DDD is invariants - "always valid" entity. And when you try to break this invariant (some rule) you must stop execution and say "loudly" adout this (throw exception). So, you need to have a list of valid titles and when you try to change title (does not matter how) to invalid state, you must throw some usefull exception.
To "fix" typo situations you must separate operations in your domain promote is one operation (it may check something, sent contratulation email :) and so on). And edit operation - just to edit some properties. So, the differenece is in logic of operations. You can't call promote without some preconditions (for example, required experience of worker), but you can call edit and fix worker's name because of type.
And usually this operations are separated between different users: only HR's can promote but a worker can edit his name, if it's wrong.
This solution is very complicated for such example, but it's always with DDD.
The main concept - separate operations. Each one with their own conditions, permissions, rules.
A question about invariants (rules).

If a client is purely entering data, then the underlying domain in this (bounded) context is not very deep. In these cases, it's fine to use a CRUD style application and allow titles to be changed (setTitle()).
Just make sure dependent BCs (e.g., billing, vacation planning, ...) where no such thing as "invalid data" exists, can react to changes in your CRUD context appropriately.

The application should enforce input correctness before it reaches the domain layer, no garbage input. If that means using a dropdown for the job titles then so be it. You can validate the title against existing titles.
In my company of 18 thousand employees, typo happens all the time. You are going to have to be pragmatic about this and accept that there will be setters in your code (one way or another)
Pragmatic thinking is very much at the core of the domain driven design, and is what keep things simple.
"Purity is good in theory, but in practice it can be very difficult to achieve, and sometimes you must choose the pragmatic approach" - Patterns, Principles, and Practices of Domain-Driven Design (2015)

"There are no typos in real world", I get what you mean, but that's not true, there are human mistakes in real world scenarios and they should be accounted for in your domain if they are frequent.
If data entry errors aren't frequent it may not be worth the extra modeling efforts and those could perhaps just get fixed directly in the DB. It also depends if the business wishes to learn something about those mistakes or not.
However, if data entry errors are frequent, it might be an indicator that the system is perhaps not offering enough guidance and the business may wish to learn more about those errors in order to make processes more efficient and less error-prone.
You may wish to implement an operation such as job.correctTitle(...), perhaps in a BC dedicated to data corrections? Also, it's probably very rare that each and every piece of information will be erroneous so corrective operations can be segregated. That means you probably do not need a job.correctAllInformation(...) kind of operation.
This whole scenario is very fictive since job titles would usually be managed in a separate BC from where they are used and they would probably get picked from a list, therefore typos would be less frequent, but you will always have to deal with data entry errors. Choosing the appropriate solution is not always easy and will vary from case to case, but try to stay pragmatic and not strive for the perfect model in every sphere of your domain.

Related

Should Multiple Actors share the same Goals in Actor-Goal list

Craig Larman states that creating an Actor[/User]-Goal list in form of some table/grid is a good technique for finding Use Cases during Requirements Analysis. (Applying UML and Patterns - P. 69 ff)
Some simple two-column table should be enough to provide a good Overview for this example; imagine following Actor-Goal List:
Actor Goal
Admin Create User
" Read User
" .. (full CRUD)
" CRUD Entry
" Assign Entry (to User)
" ..
User Create Entry
" .. (full CRUD)
" CRUD himself?
" ..
Admin can do what User can + more like managing the Users of the System under Development or assigning Entries to them.
Admin and User clearly are sharing some Goals(Can we use the term Use Case yet?).
Im not really sure where to go from here in terms of refining this Actor-Goal list.
My brain tells me that i can spare time and effort by reusing/abstraction here, so I will most likely end up with one common superclass implementing the CRUD Entry behaviour, where Admin is extending the functionality by the Manage Goals(CRUD User, assign, etc.).
But I know that this is rather a question of Design than Analysis.
I also know that I can write the Use Case for this in isolation: I dont have to state who exactly uses it, I just need to know that it's some entity which is adhering to the given contract[/interface].
When is the time to start thinking about abstraction?
Am I overcomplicating things by doing so right now?
Should We leave the Actor-Goal list like above and check it off as a "complete" Artifact?
Since the classical purpose of an Actor-Goal list is to provide some quick overview for our next Artifact - the Use Case Diagram - could we begin the transition right here?:
The Use Case Diagram makes the whole reusing part much more visible (at least to me). Would it be advisable to adopt the redundancy right now and take care of it during later stages (e.g. design)?
Appreciate your input!
EDIT: Also I'm not quite sure about the User CRUDing himself.. But lets keep things simple and stick to the main question.

It's an excellent idea to identify the candidate use cases in an actor-goal list, as you have described.
The issues when identifying use cases
In the real life, you'll quickly encounter consistency issues when elaborating the list during requirement elicitation :
some interviewed users/business experts will describe very detailed step by step goals (corresponding to system functions), whereas other will describe rather high level goals user goals. So you'll need a third column to identify the goal level of each use case.
terminology and user goals will not always be expressed in an homogeneous way. So cross checking, and renaming of use cases might be required. For example, I had a system where:
an admin claimed to manage authorizations, and a business user claimed to manage authorization. Was it the same use case ? No: it appeared that the first maintained assignment of authorizations in the system and the second was empowered to decide on authorizations assignment and request them from the admin.
a purchaser explained that for a purchase order someone has to register a good receipt before the invoice is paid. The warehouse clerk explained that at warehouse they manage stock movement. Latter, though, it appeared, that those movements where good receipts, good issues, and stock transfers.
So a fourth column for comments about main variants could help to keep the overview, and to spot hidden sharing potential.
There is generally a great deal of cross checking and harmonization to be done before sharing/reusing use cases. Reusing too quickly might end up loosing more time than expected.
The use case diagramming
I will now be very provocative: once you have the nice and consistent table with all the actors and use cases, what will be the benefit that you expect from a use case diagram ?
It is generally recommended not to abuse use case diagrams for functional decomposition (see also here). So the use case diagram will add little more to what you already have in the list.
In addition, <<Include>> , <<Extend>> and generalization relations should be used scarcely because they tend to quickly make the diagram difficult to understand.
Finally, does abstraction and reuse really happen at the level of the use case ? Is this relevant to the actors outside of the system ? If not, it's more about design and implementation details. So I'd suggest to consider these more in the class model that you will create (or derive) to implement the use cases.

UML Domain model - how to model multiple roles of association between two entities?

Suppose there is a scenario of Users having Tasks. Each User can either be a Watcher or Worker of a Task.
Furthermore, a Worker can file the hours he has worked on a given Task.
Would the following diagram be correct? I have looked around at domain models and I have not seen one with the two associations (works on, watches). Is it acceptable?
EDIT: What about this scenario? An User can make an Offer to another user. A possible way of modelling it is shown on the following diagram.
However, in that diagram it would seem possible for a user to make the offer to himself. Is it possible to model some constraints in, or is this handled further down the development line?

It is in principle correct, this is how you model multiple relationships between two classes.
As for the constraints, UML makes use of OCL (Object Constraint Language), so you can say that the associations are exclusive (xor - exclusive or).
Also note that it is generally a good idea to name the end roles of the associations.
Regarding one of the comments saying
There are no UML police to flag you down for being "unacceptable".
It is like saying: there's no code police to flag you down for writing shitty code.
You create diagrams to convey information (for school project anyway), if you diverge from standards or best practices you make it harder for other people to understand your diagrams.
And just like there are linters (jslint, ...) that check your code for common problems, for models there are model validations that do the same thing.
Also models, just like code, aren't set in stone so don't be afraid to modify them when you find a better way to express your domain.
Update
As Jim aptly pointed out, you usually do stuff not as a User (or Person), but as a Role. E.g. when you are student and you are filling a form, nobody cares that you are human, but that you are a Student. Typically a Person will also have several different roles (you could be a Student and a TA, a Professor, etc.)
Separating it in this way makes the domain much clearer as you are concerned only with the Roles, and not the people implementing them.

On the first model, there's not much to add after Peter's excellent and very interesting answer.
However your second diagram (in the edit section) does seem to give a true and fair view of the narative. From the strict 1 to 1 relationships, I'd understand that every user makes exactly one offer to one other user, and every user receives exactly one offer from another user:
"A user can make an offer to another user" implies a cardinality of 0..1 or *, not 1.
From this we understand implicitly that not all users need to receive an offer, i.e. also a cardinality of 0..1 or *
This can be debated, but "an offer" doesn't in my opinion mean at "at most one offer", so the upper bounds of cardinality shouldn't be * and not 1 to show that every user may make several offers and receive several offers.
As you can see, you can add constraints to the schema to increase expressivity. This is done in an annotation between { } . But you can choose the most suitable syntax. Here an example with the full expressivity of natural language (as suggested by Martin Fowler in his "UML distilled") but you can of course use the more formal OCL, making it {self.offerer<>self.offeree}.

I see #Peter updated his answer before I could post this answer, but I will post this anyway to show you some other tricks.
In general, it is perfectly valid to have multiple associations between the same two classes. However, I don't think it's a good idea here.
You say you want to build a [problem] domain model. I'm glad to hear that! A problem domain model is very important, as I explain in the last paragraph here. One thing to point out is that you want to build a durable model that transcends a system you can conceive. In the problem domain, there is no "User". There are, however, Roles that People play. For example, you mentioned Watcher and Worker. Hopefully these are concepts your customer already has in the "meat world".
In the model you posted, you have no place to hang the hours worked or progress made. You might try an exercise. If you had no computer, how would your customer track these things? They often have (or had) a way, and it was probably pretty optimal for a manual system.
Here is how I would model my understanding of your problem domain:
Some things to note:
A Person plays any number of Roles.
A Role is with respect to exactly one Task and one Person. A Watcher watches exactly one Task, a Worker is assigned to exactly one Task, and any kind of Role is played by exactly one Person.
A Role is abstract, and its subclasses are {complete}, meaning there can be no valid instance of a Role without also being an instance of a subclass.
A Task is watched by any number of Watchers and may be assigned to one Worker at a time. There is a constraint saying {person is not a watcher}. (You can show the OCL for that, but few people will understand it.)
I have added a Progress concept as a way of logging progress on a Task. A Worker makes Progress on one Task. Each bit of Progress has a Description and a Duration. Note that I have not committed to any computational way of representing a Description or a Duration. The designer of a system for this will be free to derive a Duration from start and end times, or ask the user to self-report.

What are consequences of using repository inside of aggregate vs inside of domain service

We all heard that injecting repository into aggregate is a bad idea, but almost no one tells why.
I will try to write here all disadvantages of doing this, so we can measure rightness of this statement.
First thing that comes into my head is Single Responsibility Principle.
It's true that by injecting repository into AR we are violating SRP, because retrieving and persisting of aggregate is not responsibility of aggregate itself. But it says only about "aggregate itself", not about other aggregates. So does it apply for retrieving from repository aggregates referenced by id? And what about storing them?
I used to think that aggregate shouldn't even know that there is some sort of persistence in system, because it doesn't have to exist. Aggregates can be created just for one procedure call and then get rid of.
Now when I think of it, it's not right, because aggregate root is an entity, and entity has sense only if it has some unique identity. So why would we need unique identity if not for persisting? Even if it's just a persistence in a memory. Maybe for comparing, but in my opinion it's not a main reason behind the identity.
Ok, let's assume that we retrieve and store OTHER aggregates from inside of our aggregate using injected repositories. What are other consequences beside SRP violation?
For sure there is a problem with having no control over persisting of aggregates and retrieving is some kind of lazy loading, which is bad for the same reason (no control).
Because of no control we can come into situation when we persist the same aggregate few times, where it could be persisted only once, or the same aggregate is loaded one hundred times where it could be loaded once, hence performance is worse. Also there might be problem with stale data.
These reasons practically disqualifies ability to inject repository into aggregate.
Here comes my main question - why can we inject repositories into domain service then?
Not the same reasons applies here? It's just like moving logic out of aggregate into separate function and pretend it to be something different.
To be honest, when I stared to write this SO question, I had no good answer for that. But after hours of investigating this problem and writing of this question I came to solution. Rubber duck debugging.
I'll post this question anyway for others having the same problems. Of course with my answer below.

Here are the places where I'd recommend to fetch aggregates (i.e. call Repository.Get...()), in preference order :
Application Service
Domain Service
Aggregate
We don't want Aggregates to fetch other Aggregates most of the time, because this blurs the lines, giving them orchestration powers which normally belong to the Application layer. You also raise the risk of the Aggregate trespassing its jurisdiction by modifying other Aggregates, which can result in contention and performance problems, not to mention that transactions become more difficult to analyze and the code base to reason about.
Domain Services are IMO a good place to fetch Aggregates when determining which aggregates to modify is domain logic per se. In your game example (which might not be the ideal context for DDD by the way), which units are affected by another unit's attack might be considered domain logic, thus you may not want to place it at the Application Service level. This rarely happens in my experience though.
Finally, Application Services are the default place where I call Repository.Get(...) for uniformity's sake and because this is the natural place to get a hold of the actors of the use case (usually only one Aggregate per transaction) and orchestrate calls to them.
That doesn't mean Aggregates should never be injected Repositories, there are exceptions, but other alternatives are almost always better.

So as I wrote in a question, I've found my answer already in the process of writing that question.
The best way to show this is by example:
When we have a simple (superficially) behavior like unit attacking other unit, we can write something like that.
unit.attack_unit(other_unit)
Problem is that, to attack an unit, we have to calculate damage and to do that we need another aggregates, like weapon and armor, which are referenced by id inside of unit. Since we cannot inject repository inside of aggregate, then we have to move that attack_unit logic into domain service, because we can inject repository there. Now where is the difference between injecting it into domain service, and not into unit aggregate.
Answer is - there is no difference. All consequences I described in question won't bite us. In both cases we will load both units once, attacking unit weapon once and armor of unit being attacked once. Also there won't be stale data, even if we mutate weapon object during process and store it, because that weapon is retrieved and stored in one place.
Problem shows up in different example.
Lets create an use case where unit can attack all other units in game in one process.
Problem lies in how we implement it. If we will use already defined unit.attack_unit and we will call it on all units in game (iterating over them), then weapon that is used to compute damage will be retrieved from unit aggregate, number of times equal to count of units in game! But it could be retrieved only once!
It doesn't matter if unit.attack_unit will be method of unit aggregate, or if it will be domain service unit_attack_unit. It will be still the same, weapon will be loaded too many times. To fix that we simply have to change implementation and with that probably interface too.
Now at least we have an answer to question "does moving logic from aggregate method to domain service (because we want to access repository there) fixes problem?". No, it does not change a thing.
Injecting repositories into domain service can be as dangerous as injecting it into aggregate if used wrong.
This answers my SO question, but we still don't have solution to real problem.
What can we do if we have two use cases: one where unit attacks one other unit, and second where unit attacks all other units, without duplicating domain logic.
One way is to put all needed aggregates as parameters to our aggregate method.
unit.attack_unit(unit, weapon, armor)
But what if we will need like five or more aggregates there? It's not a good way. Also application logic will have to know that all these aggregates are needed for an attack, which is knowledge leak. When attack_unit implementation will change we would also might to update interface of that method. What is the purpose of encapsulation then?
So, if we can't access repository to get needed aggregate, how can we smuggle it then?
We can get rid of idea with referencing aggregates by ids, or pass all needed aggregates from application layer (which means knowledge leak).
Or maybe reason of these problems is bad modelling?
Attacking of other unit is indeed an unit responsibility, but is damage calculation its responsibility? Of course not.
Maybe we need another object, like value object MeleeAttack(weapon, armor), yet when we add more properties that can change result of an attack, like enchantments on unit, it gets more complicated.
Also I think that we are now creating objects based on performance, not our on domain.
So from domain driven design, we get performance driven design. Is that what we want? I don't think so.

"So why would we need unique identity if not for persisting?" - think of an account scenario, where several John Smiths exist in your system. Imagine John Smith and John Smith Jr (who didn't enter the Jr in signup) both live at the same address. How do you tell them apart? Imagine I'm trying to write a recommendation engine based upon their past purchases . . . .
Identity is a quality of equality in DDD. If you don't have an identity unique from your fields, then you're a ValueObject.

What are consequences of using repository inside of aggregate vs inside of domain service?
There's a reasonably strong argument that you shouldn't do either.
Riddle: when does an aggregate need to see the state of another aggregate?
The responsibility of an aggregate is to control change. Any command that would change the state of the domain model is dispatched to the aggregate root responsible for the integrity of the state in question. By definition, all of the state required to ensure that the command is currently permitted is contained within the aggregate boundary.
So there is never any need to peek at the data outside of the aggregate when making a change to the model.
In which case, you don't ever need to load another aggregate, which makes the "where" question moot.
Two clarifications:
Queries will often combine the state of multiple aggregates, and will often need to follow a reference from one aggregate to another. The principle above is satisfied because queries treat the domain model as read-only. You need the state to answer the query, but you don't need the invariant enforcement because you aren't changing anything.
Another case is when you need state from another aggregate to process a command properly, but small latency in the data is an acceptable risk to the data. In that case, you query the "other" aggregate to get state. If you were to run that query within the domain model itself, the right way to do so would be via a domain service.
In most cases, though, you'll be equally well served to run the query when generating the command (ie, in the client), or when handling the command (in the application, outside the domain). It would be very unusual for a business to consider domain service latency to be acceptable but client latency to be unacceptable.
(Disconnected clients are one case where that can be especially problematic; when the command is generated and then queued for a long period of time before being dispatched to the server).

In DDD and CQRS, should I just put the required presentation logic directly into each Read (Finder) query?

I'm trying to decide the best place to take care of presentation logic. I've separated out my Read queries (CQRS) with each method querying and generating a DTO for my View. But my Views are simply templates with variables scattered about that will come from the DTO. They don't have any logic in them.
Say I want to do some things like reformatting how the date looks, and turning flags into actual descriptive words, or adding little conditions on what is displayed depending on what is queried from the database, and so on. I'm thinking to put this logic in with each query, and to not worry about being too DRY (I find that in some cases if you DRY too much then you could be making things hard to change in that you have to check each dependency or hope your unit tests hold up). I may use some "helpers" here and there to do formatting that I find I keep doing, but I don't see the need to add a whole other "presentation layer". So presentation logic would reside with each query and go into the returned DTO, to be dropped right into a View. This would keep the Read side of CQRS super thin, and makes sense in that each View corresponds to a Read query. But I'm also concerned in that some of this presentation logic would be very specific to the domain. A new developer coming on board would need to look at other queries and repeat the same formatting techniques, as opposed to just throwing the data out there straight from a raw query.
Is this the sound approach, or is there another approach used in DDD/CQRS? I'm having trouble finding any guidance from CQRS research I've done. Note: I happen to be using PHP/MySQL, but I imagine this question is language agnostic.

I think the most important part to understand about CQRS is that it doesn't have to be complicated. In fact, for the read side of things go for the simplest solution that will work and be maintainable. If all you need is a SELECT statement from a view to bind to a grid, why make a bunch of layers, DTO's, and web services? Is that adding any value to the business? However if there is a legitimate reason to add a layer to the equation then you may do so, and usually DTOs are a good way to communicate between those layers.
Your system may call for different query strategies depending on the use case at hand, so this doesn't have to be a one size fits all approach. Performance should always be one of your first concerns, so get the data as close to the consuming presentation code as possible and only add complexity when truly needed.
Some might say this is not loosely coupled if the presentation layer is reading directly from the database. However, just because you have many layers between 2 things, doesn't make them loosely coupled. In fact, it may be the same amount of coupling, but now you've added a maintenance headache since you have to touch 10 places every time a field is added.
Focus more on your command side, and do whatever feels practical for the read side.

Declarative Domain Model possible (DDD)?

I'm looking for insight/ papers/ articles, etc. whether a fully declarative Domain Model (as per DDD) is possible.
For example:
Validation can be declarative (lot's of ORMs do this)
business flow logic can be declarative: having a DSL for defining a workflow / Finite State Machine / process manager / DDD Saga (whatever you want to call it) on Crud-operations, through ddd-repositories most likely
decision logic can be declarative. I.e: most of the time this boils down to simple conditionals
derived / calculated fields could be done declaratively but is as bit tricky, especially when this cascades. I.e: you'd have to keep a dependency graph on the calculated fields, etc. Still it can be done.
Any links to people having actually tried that, or some convincing couter-arguments why this can't be done?
p.s.: Please don't answer with "Yes it can be done, since a FSM is Turing-complete with enough memory bla bla"

"Everything is a nail if you hold a hammer"
Instead of asking if it is possible - ask:
What are the reasons I want to do this particular thing declaratively?
Data validation is a nice thing to do declaratively. You have a DTO, you add some attibutes, everything is clear and readable.
Business flow done declaratively... Reminds me of a great failure of Windows Workflow Foundation. Is anyone actually using it?
What is the benefit of making a behaviour-centric components in a declarative way? Imperative way seems to be well-fitted to this.
Decision logic... maybe. But again - why?
"derived / calculated fields could be done declaratively but is as bit tricky"
Why do you want to do tricky things when there is a way of achieving the same result with straightforward things?
So why?
Do you need to change the behaviour of the app in runtime by editing some config file? OK, go for it.
Do you have a generic domain that will be used by many clients and you need some reconfiguration to fit them? Ok, but you need to clearly distinguish the unchangeable core of your domain and the varying stuff. Don't try to make all the things configurable - you'll end up with Inner Platform syndrome.
Do you believe you can make a declarative language your client could use to change his domain without a need for a programmer? No. You will fail. I know a language which was supposed to be like this. An easy, declarative language that ordinary accountants would use to explore their data. It's called SQL :D

I fully concur with Bartłomiej Szypelow's remark. Regardless, I will try to answer your question.
A declarative language always depends on an imperative language for it to work. Take something as simple as arithmetic for example. When we ask for the outcome of 1+1 we first need to know how the 1 should be understood and how in that context the + operator can be calculated before the statement as a whole can be interpreted. This is one of the ways we are able to overcome complexity; you don't need to know how a plus operation is done to be able to use one.
From http://en.wikipedia.org/wiki/Imperative_programming:
Procedural programming could be considered a step towards declarative
programming. A programmer can often tell, simply by looking at the
names, arguments and return types of procedures (and related
comments), what a particular procedure is supposed to do, without
necessarily looking at the details of how it achieves its result. At
the same time, a complete program is still imperative since it 'fixes'
the statements to be executed and their order of execution to a large
extent.
In any domain the same applies. If you ask the flight attendant to reschedule your flight she knows what a flight and a passenger is and how to reschedule one. Creating a fully declarative flight scheduling domain model without describing how scheduling works is therefore impossible. Since all domain models contain operations/behavior it is therefore equally impossible to create a domain model in a declarative language unless your problem domain is not unique, for example when you have three flight companies that operate in similar fashion.
Back to why... you said:
Actually, the hypothetical reason why is actually to allow non-coders
to create a (trivial) backend by configuration
Declarative programming is not always simpler than imperative programming. Most of the time people tend to think in outcomes, but sometimes the outcomes are so varied that it is (much) simpler to think in steps. This is the reason why these different styles exist and continue to exist in the first place. The main difference between a coder and a non-coder is not that a non-coder tends to think in terms of outcomes instead of processes, but that the non-coder simply does not want to be bothered with the workings of a computer. The non-coder wants to focus on the problem. Depending on the domain a non-coder may be helped best with an imperative DSL, instead of an declarative DSL.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string