What are aggregates and how are they used in CQRS (Command-Query-Responsibility-Segregation) and ES (Event-Sourcing)? I'm new to this kind of architecture, and I'd be really happy if someone could please explain this to me. Thanks!
First I'd like to quote Martin Fowler's blog post on CQRS and note that Aggregates are rather related to Domain Driven Design then to CQRS.
CQRS naturally fits with some other architectural patterns.
As we move away from a single representation that we interact with via CRUD,
we can easily move to a task-based UI.
Interacting with the command-model naturally falls into commands or events,
which meshes well with Event Sourcing.
Having separate models raises questions about how hard to keep those models
consistent, which raises the likelihood of using eventual consistency.
For many domains, much of the logic is needed when you're updating,
so it may make sense to use EagerReadDerivation to simplify
your query-side models.
CQRS is suited to complex domains, the kind that also benefit from
Domain-Driven Design.
In terms of Domain-Driven Design Aggregate is a logical group of Entities and Value Objects that are treated as a single unit (OOP, Composition). Aggregate Root is a single one Entity that all others are bound to.
Related
My question here is quite straight as mentioned in the subject.
However, please allow me to give some brief explanation here about my innocent thoughts.
I've been using Axon for approximately 10 months now. I used to design my project structure based on the Hexagonal architecture with two top level packages respectively for domain and infrastructure.
Furthermore, domain package will contain different domain objects (as explained in the DDD concept) such as follow:
Aggregate (this will be an Axon aggregate class).
Repository (in my case, this will be a Spring Data Repository interface).
Entity (in my case, this contains any lookup entity that i used for set-based consistency validation as written here).
Service Port (collection of Input and Ouput port interfaces).
Commands (representing Axon Command object).
As for Events, I used to put them on a different module that I compiled as a jar file, so I can share it to other developers whom going to use the same event in their project.
I've noticed recently that all of my commands and events were basically anemic models (an anti pattern that we should avoid).
Is there any good practice on this ? Or, Is it something that intentionally used by design ?
I've been thinking to put my Command classes within my Aggregate class (as an inner classes). At least by using this approach I won't end-up with having so many anemic models scattered outside. Any thoughts ?
Commands are designed to be behavior and input structures mirroring the external world. They don't necessarily mirror an aggregate's structure.
They are not even connected clearly to one single aggregate, at times. Enclosing them within aggregates can be a code smell because you are then thinking in terms of resources and UI organization, instead of transaction boundaries and entity groups.
You are also violating the open-closed principle. Changes in volatile layers like user interface and request structures will make you edit the Aggregate class, and that is not good design.
On a more general note...
At times, this debate of anemic vs. non-anemic (or dry vs. non-dry) can push you in the direction of premature - and incorrect - optimization. Try avoiding this trap because you will end up optimising at the code level, but your domain will suffer.
DDD and CQRS guidelines align with principles that help you keep complexity at bay over the long term. Things kept distinct and separate help you achieve this.
First of all, in DDD, your domain had to be free of any frameworks, just use pure language library.
Then, mixing Commands and Aggregates cannot be a good solution. I think Commands belongs to Port while Aggregates belongs to the Hexagone.
Finally, DDD highlights the discovery of the domain thanks to the experts. Did you do that ? If not, if you're only using the Tacticts pattern, you'll miss one of the most important part of DDD.
I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Another approach would be to avoid this thinking completely and not allow/define a set of aggregates and not allow enumerating aggregates but only load/save on aggregate-id. Using this option if would be necessary to reference aggregates from other aggregates and by doing this build up an interconnected graph of aggregates.
The approaches are similar to having aggregates in a folder on disk or having an "internet" of aggregates where the references between them are defining the bigger set of aggregates. In any case I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much? I'm not sure I explain this very good but my question is if there are any definitions of the "set of aggregates" or if we should think of aggregates as totally isolated/on its own and with only a unique aggregate-id (UUID)?
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.
Aggregates are connected
In any application with sufficient complexity, Aggregates end up referencing one-another. And it is perfectly reasonable to use their unique identifiers as reference IDs to refer to each other.
But take care to load and persist aggregates outside the domain layer, typically in repositories. If you want to traverse links across aggregates and load them into memory, you will be doing that upfront before handing over control to the domain layer for the actual processing.
Traversing the graph to get all related aggregates is correct, but this rarely spans across too many aggregate boundaries. You rarely find a single change or rule to be applied throughout the application. If you do have such a transaction, it is probably a sign of the domain design needing improvement, simply because you are spreading one responsibility/change amongst many aggregates.
The connectivity is so usual that you should watch out for aggregates that have no linkages with the rest of the system. They are either standalone libraries, or they probably belong to a different bounded context.
Aggregates can morph into different forms
They are aggregates because they form a clear invariant boundary, with their primary responsibility being to enforce invariants across state changes for all the entities within themselves. But they can morph into different kinds of DDD objects based on the requirement.
A good example is of a single Currency note. In most applications, they are value objects. But for the federal bank, they are aggregates with clear cut invariant rules. They are aggregates when they are created and referenced, but in a transaction that ships printed notes to banks, they may become value objects.
So you may have to evaluate whether you are talking about a domain entity in its aggregate form, or as a value object when you consider each linkage.
Aggregates are invariant boundaries
It is wrong to validate domain rules across aggregates.
Your aggregate boundary is an invariant boundary, meaning all the domain rules within it should be satisfied at all time. By that logic, you are going to incorrectly build up a structure that will need to ensure that all domain rules across aggregates are valid at all time. Doing so will impose considerable performance burden, not to mention the complexity in business logic.
But this is not to say that there may be domain rules that span across aggregates. The correct way to accomplish this would be using eventual consistency and an Event-driven approach.
The primary changing aggregate would validate and persist the data, and bubble up an event containing the state change. Other aggregates would then act on the event and bring themselves up-to-date. If an aggregate's domain rules break because of the change, there is usually a supplementary mechanism that allows correction of the problem (a preferred mechanism) or a rollback of the first state change (happens very rarely).
Perhaps you can find a common denominator the agg sets have in common and use that to work with?
A simplified example; there is a set of Books and a set of Users that have nothing in common except you want to know whenever they were first registered? What might be an option is to have an interface FirstRegistration and then you can choose to either expand Books/Users or create a specific entity instead.
I'm struggling with how/if to define "a set of aggregates". Aggregates
are supposed to be stand alone and isolated but it's easy to think of
a bigger set of aggregates that belong together. But is this a trap?
I think you're struggling because indeed the idea of a set of aggregates (instances) is very generic, and the uses of such things are contextual and domain-specific. People don't talk specifically about it because of course you may have behaviors that operate on a collection of multiple aggregates, but that doesn't give such collections any particular common properties or requirements that would allow you, from a general DDD perspective, to characterize such collections more specifically than "a set of aggregates", "a list of distinct aggregates", or similar.
Using this "set of aggregates" it would be possible to for instance
enumerate and index aggregates on a unique property within the set and
have other domain rules that could be validated across all aggregates
in the set. It's tempting but also feels a bit wrong.
Tempting why? You've couched the question in very abstract terms, so it's pretty much impossible to contradict you about the "it would be possible", but just because something may be possible doesn't mean it would be useful. In practice, I think you'll find that rules or behaviors that operate on collections of aggregates most naturally belong not to collections of aggregates in an abstract sense, but rather to other aggregate types in your domain model, to domain repositories, or to domain services.
It is entirely plausible that your domain model might want to handle particular sets of aggregates characterized by some rule. For example, if you're an airline, then one of the aggregates in your domain model might a single seat on a flight, since that's the unit you sell. It makes sense in that case that there would be operations on all the seats on a particular flight, for example, but whatever rules and behaviors you might have about that are specifically about that kind of aggregate, selected in that particular way.
Another approach would be to avoid this thinking completely and not
allow/define a set of aggregates and not allow enumerating aggregates
but only load/save on aggregate-id.
It's surely counterproductive to forbid working with sets of aggregates. Just don't attribute more significance to it than is warranted. There is nothing particularly special about sets of aggregates in general.
Using this option if would be
necessary to reference aggregates from other aggregates and by doing
this build up an interconnected graph of aggregates.
I don't follow that. One certainly must be able to retrieve and store individual aggregates from persistence, as that's more or less the defining property of aggregates -- they are the unit of persistence. But that doesn't mean that you must reject the ability to work with collections of aggregates. However, sets of aggregates do not have identity in the same way that individual aggregates do, so yes, relationships between aggregates need to be modeled in terms of individual aggregates. Nevertheless, that does not inherently preclude 1:m or n:m relationships among aggregates.
I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much?
You'll find all sorts of uses of various sets of aggregates in applications built and maintained based on DDD ideas, but there's not much to talk about at the level of abstraction of your question, and what there is is already summed up in the words "set" and "aggregate".
The set of aggregates could for instance be the database being used
under the surface. But what I'm wondering is if this database as in
the information about what aggregates it contains has any definition
in DDD
Not to my knowledge. I suspect most DDD practitioners would just call it "the data", or something similar.
or if we should think about a set of aggregates as an
interconnected graph where only traversal of this graph can be used to
enumerate all "associated" aggregates.
I'm still not seeing why you set that up as a thing. Sure, depending on the domain model, you might be able to traverse all or substantial chunks of the data by traversing associations between aggregates, and that might be appropriate for some purposes, but DDD doesn't have to give a special name or special rules for sets of aggregates for you to work with them.
Like any useful methodology, DDD exists to solve problems. Its bread & butter is complex applications with complex data and evolving requirements. It is not to be interpreted as a straight jacket preventing designers and developers from (thoughtfully) writing designs and code that incorporate aspects of other design approaches, much less designs and code that provide for the application's idiosyncratic needs.
I have seen ORM use a unit of work to commit multiple repositories in a single step.
I have also seen DDD and the use of aggregate roots saved via repositories, when using event stores persistence conceptually becomes quite clear to understand.
I always need to write data access code and whilst I am familiar with ORM, I am new to domain driven design and event sourcing - event sourcing is great, but does come with a lot of infrastructure.
Ultimately I would like to some rules to help decide at what point (code size, number of database entities) when DDD+ES becomes worth the extra effort over CRUD systems.
To help decide my questions are as follows:
I haven't seen aggregate roots combined in to a single unit of work, is this avoided? If so what problems can this cause?
In DDD a customer entity may have addresses and phones embedded within it (value objects), whereas in ORM there is a unit of work with customer, phone and address repositories. What is the best way to explain and understand these different approaches?
Can ORM use multiple different unit of works (each referencing relevant and related repositories/tables) to represent an aggregate root?
What are the pain/warning signs to look out for with impedance mismatch from my domain to ORM, at which point we may consider switching to an event store?
An aggregate defines a consistency boundary. In NoSQL databases, it is usually not possible to commit multiple entities per transaction. Therefore, in DDD with NoSQL, it is desirable to only have a single aggregate in a unit of work while updates to entities external to the aggregate at hand are delivered in an eventually consistent manner.
If addresses and phones are value objects then they shouldn't have repositories. In the ORM, they would be mapped as components of a parent entity not a separate mapping.
I'm not sure what you'd achieve this way?
One pain point that naturally leads to event sourcing is the need to preserve all state changes in an aggregate. Furthermore, event sourcing and the concept of domain events in general provide a different domain modelling methodology focused on behavior rather than state. I'd consider ES when there is potential business value in preserving all state changes. If you are willing to make the initial infrastructure investment, ES can in many ways be simpler by avoiding ORM madness. Think of CRUD as event sourcing with only 4 event types, or even 2 (read, update). Beyond the most basic domains, it is desirable to have more context beyond changes to data which leads you to ES.
One common reaction that I see for a lot of questions asked here and other forums are like "You don't need to do DDD for that. Its a simple CRUD application, DDD is an over-engineering".
Well I am new to DDD and I feel there are a lot of elements in DDD that has universal appeal and can be used across the board, irrespective of the fact whether your application is complex enuf to mandate DDD. For example, layering of application, differnent artifacts that DDD recognises etc. May be start with the basics and admittedly anemic models and then work/refactor towards as much purity as one can get.
Does this approach sound good?
Or would you say that there is a fundamental choice in design of every application in terms of whether to go DDD way or not , kind of "all-or-none" choice?
UPDATE (to provide more context, in response to hugh's comment below)
I am building a webapp around an existing RuleEngine kind of application, basically CRUD and some validations, invariants and then a process of deployment. The rule-authoring and semantic check is done by a standalone piece of code that i call as part of the CRUD and none of that semantic specific logic is there in my code. I am trying to use DDD for this application, but i see it might not be complicated enough to fit into the DDD paradigm. There is no ubiquitous language defined for the domain i.e the language is not specialized enough beyond naming the set of entities involved. I hear my domain expert speaking in terms of creating, editing, deleting entities.
DDD is not all-or-nothing. Also, many of the patterns described in DDD are not new and can be found all over the place. Eric Evans (the author of the DDD book) just assembled them, formalized them where needed, and set them in relation to each other. You are free to use what fits to your problem space.
What is often overlooked: DDD describes implementation patterns as well as analysis patterns. The analysis patterns may be overkill in many (if not most) applications, but the implementation patterns (i.e. Entities, Specifications, Services) can be of great use in less complex scenarios as well.
In short,
If it's just CRUD, I wouldn't bother.
On the other hand,
If it's got behavior, where the next state of something relies on the previous state, then DDD is something you probably want to consider.
DDD approach is based on two type of patterns: Strategics patterns and Tactical patterns. IMHO, be free to use tactical patterns in every case that you sure can be helpful. But using Strategic patterns can be overkill for some type of domains. If CRUD style thinking, does not have negative impact in modeling process, certainly you doesn't gonna need it.
I am quite new with DDD and would like to know about any pitfalls you might want to share. I will summarize it later for more newbies to read :)
Thanks
Summary so far:
Anemic domain model where your entities are primarily only data bearing and contain no business logic
Not using bounded contexts enough
Focusing too much on patterns
There is a good presentation on this topic as well here (video).
Probably the most important one: not grokking the central, fundamental principle of the Domain Model and its representation in Ubiquitous Language. With the plethora of technology options around, it's very easy for your head to fill up with ORMs, MVC frameworks, ajax, sql vs nosql, ... So much so there's little space left for the actual problem you're trying to solve.
And that's DDD's key message: don't. Instead, explicitly focus on the problem space first and foremost. Build a domain model shorn of architectural clutter that captures, exposes and communicates the domain.
Oh, and another one: thinking you need Domain Services for everything you can do in the domain model. No. You should always first try to put domain logic with the Entity/Value type it belongs to. You should only create domain services when you find functions that don't naturally belong with an E/V. Otherwise you end up with the anaemic domain model highlighted elsewhere.
hth.
One of the biggest pitfalls is that you end up with a so-called anemic model where your entities are primarily only data bearing and contain no business logic. This situation often arises when you build your domain model on top of an existing relational data model and just make each table in the database an entity in your domain model.
You might enjoy presentation of Greg Young about why DDD fails.
In short:
Lack of intent
Anemic Domain Model
DDD-Lite
Lack of isolation
Ubiquitous what?
Lack of refinement
Proxy Domain Expert (Business analyst)
Not using bounded contexts enough. It's toward the back of the the big blue book but Eric Evans has gone on record as saying that he believes that bounded contexts and ubiquitous language are THE most important concepts.
Similarly, people tend to focus too much on the patterns. Those aren't the meat of DDD.
Also, if you do not have a lot of access to domain experts you are probably not doing DDD, at best you are DDDish.
More concretely, if you end up with many-to-many relationships, you've probably designed something wrong and need to re-evaluate your aggregate roots/contexts
Only adding to what others have already said;
My personal experience is that people often end up with an anemic model and a single model instead of multiple context specific models.
Another problem is that many focus more on the infrastructure and patterns used in DDD.
Just because you have entities and repositoriesand are using (n)Hibernate it doesn't mean you are doing DDD.
It's not from my personal experience with subject, but it was mentioned for a couple of times in DDD books and it's what I've been thinking about recently: use Entities when you really need identity, in other cases use Value Object. I.e., Entity pattern often happens to be the default choice for any model noun, and it's not the way it should be.
Beware of the Big Ball of Mud.
One of the pitfalls of domain driven design is to introduce ambiguity into a model. As explained in the article Strategic Domain Driven Design with Context Mapping:
Ambiguity is the super-villain of our
Ubiquitous Language
This may happen when two distinct concepts share the same name, or when the same concept can have different uses. It may be necessary to
expose the domain structure in
terms of bounded contexts in a context
map
If a model is used in too many different ways, or has too many responsibilities, it may be a sign that it should be divided.