Test Axon aggregate's behaviour with non-aggregate events

I'm trying to ensure an aggregate field is unique on creation, and I'm handling the eventual consistency of the projections with the following flow:
1. Command creates a new aggregate
2. Aggregate issues creation event
3. Event handler attempts to materialize the projection and, if there is a uniqueness violation, emits a UniqueConstraintViolationEvent
4. Aggregate consumes this event and marks itself as deleted
I'm trying to use the AggregateTestFixture to test this, but it doesn't appear to allow you to perform any assertions without issuing a command (whenThenTimeElapses seems to allow assertions, but I get an NPE when asserting the aggregate has been deleted). Is there a way to use the test fixture for this case, or is it not designed to account for non-aggregate events? If not, is there another way to verify that the aggregate has been removed?
I'm not confident that this is actually the correct approach. I've also considered dispatching a command in (3) or using a saga to manage the interaction. If I dispatch a command instead of an event, it seems to simply force me to write more boilerplate to emit the UniqueConstraintViolationEvent from the aggregate. Likewise, if I use a saga to model this interaction, I'm unsure how to end the saga's lifecycle without having the projection materializer emit a success event that the saga consumes.

You are looking at a Set Based Validation scenario, which is a little more cumbersome when dealing with a CQRS system. This blog explains some of your options nicely, I feel, so I'd pick and choose from there.
As a form of guideline, note that an Aggregate guards its own consistency boundary. Extending that boundary by asserting uniqueness across an entire set of Aggregates is thus not desirable. Simply put, it is not a single Aggregate instance's problem.
Hence, I would perform this uniqueness check prior to issuing the command, by consulting a lightweight query model containing the desired constraint.
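As a minimal sketch of that idea, with all names assumed for illustration (UsernameLookup, CreateAccountCommand) apart from Axon's CommandGateway: the uniqueness check runs against a small projection-backed query model before the command is dispatched, so the aggregate never has to police the whole set. The check is still eventually consistent, which is exactly the trade-off the linked options discuss.

```java
import java.util.UUID;

import org.axonframework.commandhandling.gateway.CommandGateway;

public class AccountCreationService {

    // Small query model kept up to date by a projection; still eventually
    // consistent, so a duplicate can slip through between check and event.
    interface UsernameLookup {
        boolean isTaken(String username);
    }

    record CreateAccountCommand(String accountId, String username) {}

    private final UsernameLookup lookup;
    private final CommandGateway commandGateway;

    public AccountCreationService(UsernameLookup lookup, CommandGateway commandGateway) {
        this.lookup = lookup;
        this.commandGateway = commandGateway;
    }

    public void createAccount(String username) {
        // Reject up front instead of making the aggregate enforce set-wide uniqueness.
        if (lookup.isTaken(username)) {
            throw new IllegalArgumentException("Username already in use: " + username);
        }
        commandGateway.sendAndWait(new CreateAccountCommand(UUID.randomUUID().toString(), username));
    }
}
```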

Related

CQRS: Can the write model consume a read model?

When reading about CQRS it is often mentioned that the write model should not depend on any read model (assuming there is one write model and up to N read models). This makes a lot of sense, especially since read models usually only become eventually consistent with the write model. Also, we should be able to change or replace read models without breaking the write model.
However, read models might contain valuable information that is aggregated across many entities of the write model. These aggregations might even contain non-trivial business rules. One can easily imagine a business policy that evaluates a piece of information that a read model possesses, and in reaction to that changes one or many entities via the write model. But where should this policy be located/implemented? Isn't this critical business logic that tightly couples information coming from one particular read model with the write model?
When I want to implement said policy without coupling the write model to the read model, I can imagine the following strategy: Include a materialized view in the write model that gets updated synchronously whenever a relevant part of the involved entities changes (when using DDD, this could be done via domain events). However, this denormalizes the write model, and is effectively a special read model embedded in the write model itself.
I can imagine that DDD purists would say that such a policy should not exist, because it represents a business invariant/rule that encompasses multiple entities (a.k.a. aggregates). I could probably agree in theory, but in practice, I often encounter such requirements anyway.
Finally, my question is simply: How do you deal with requirements that change data in reaction to certain conditions whose evaluation requires a read model?
First, any write model which validates commands is a read model (because at some point validating a command requires a read), albeit one that is optimized for the purpose of validating commands. So I'm not sure where you're seeing that a write model shouldn't depend on a read model.
Second, a domain event is implicitly a command to the consumers of the event: "process/consider/incorporate this event", in which case a write model processor can subscribe to the events arising from a different write model: from the perspective of the subscribing write model, these are just commands.
Having read a lot about the topic and having thought hard about it myself, I attempt to answer my own question.
First, a clarification about the terms used. The write and read models themselves never have any dependency to one another. The corresponding command and query components might have instead. I will therefore call the entirety of the command component and its write model the command side, and the entirety of one particular query component and its read model a query side (of which there might be many).
So consider a command handler that is responsible for evaluating and executing a business policy. It takes a command DTO, validates it, loads part of the write model into memory, and applies changes to it in one atomic transaction. The question specifically was, whether this handler is allowed to query one of the query sides in order to inform its decision about what to do in the write model.
The answer would be a resounding NO. Here's why:
The command side would depend on one particular query side (it doesn't matter if you hide the dependency behind an interface – it is still there), so the query side cannot change independently.
Who actually guarantees that the command handler runs when it has to? The query side is certainly not the one responsible for it, and clients aren't either.
The command request is prolonged by performing a nested query request, which might be detrimental to the performance.
Instead, we can do the following:
Work with domain events raised by the write model, and register a domain event handler in the command side that evaluates the policy. This way it is guaranteed that the policy will be executed whenever it has to be (a sketch follows this list).
If the performance allows it, this domain event handler can simply load as much of the write model as it requires to evaluate the business condition. Don't prematurely optimize – maybe the entities are small and can easily be loaded into memory.
If the performance does not allow it, denormalize the write model and maintain the required statistics using domain events. No one says that the write model cannot itself contain query-oriented data. Being a write model simply says that it is a model designed to do writes, and this necessarily must include some means to read as well.
Finally, if applying the policy is not an integral part of the domain logic itself, but rather just a use case, consider putting the responsibility of calling it into a client or another microservice, where it is totally fine to first query one of our query sides, and afterwards calling our command side with the appropriate parameters.
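Here is a minimal sketch of that first option; every type is assumed for illustration (none come from the question). A handler inside the command side reacts to a domain event, reads only write-model data, and applies the policy by dispatching a follow-up command:

```java
public class CreditLimitPolicy {

    interface OrderRepository {                     // write-model repository
        long countOpenOrdersFor(String customerId);
    }

    interface CommandDispatcher {
        void dispatch(Object command);
    }

    record OrderPlaced(String customerId) {}        // domain event (assumed)
    record SuspendCustomer(String customerId) {}    // follow-up command (assumed)

    private final OrderRepository orders;
    private final CommandDispatcher commands;

    public CreditLimitPolicy(OrderRepository orders, CommandDispatcher commands) {
        this.orders = orders;
        this.commands = commands;
    }

    // Registered as a domain event handler inside the command side, so it is
    // guaranteed to run whenever the triggering event occurs.
    public void on(OrderPlaced event) {
        // Loads only write-model data; no query side is consulted.
        if (orders.countOpenOrdersFor(event.customerId()) > 10) {
            commands.dispatch(new SuspendCustomer(event.customerId()));
        }
    }
}
```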

Axon Framework: send command on aggregate load

We're building a microservices system with Axon Framework 4.1. In our domain, we have a label concept where we can attach labels to other entities. While labels are normally created and managed by the user, some of these labels are "special" and need to be hard-coded, but they need to be present in the event stream as well.
We have a bunch of aggregates that represent entities that can be labeled with these labels. Some of these aggregates will be used frequently, while others might be used infrequently or are even abandoned by the user.
Sometimes we come up with new special labels. We add them to the code, and then we also need to add them to the event stream. What is a good way to do that?
We can create a special command that we need to send when the updated service is started for the first time. It goes through all the labels and adds the ones that aren't in the event stream yet. This has two disadvantages. First, we need to actually send that command, which either requires us to not forget it, or to add some infrastructure for it outside of the code (e.g., in our build pipeline). Also, other services could have booted up faster with the new labels and started sending commands before we fired our special command. The other disadvantage is that this command will target all aggregates, including the abandoned ones, which could be wasteful of resources and be confusing to end users who might see activity in a document they thought was abandoned.
Ideally, we would like to be able to send the command when Axon has just loaded the aggregate. That way we would be certain that the labels are only introduced in aggregates that are actually used. Also, we could wire this up in code and it wouldn't require us to add infrastructure outside of the application and/or remember to do it manually.
Unfortunately, this feature doesn't seem to exist in Axon (yet) 😉.
Are there other (better) ways to achieve this?
I've got an idea which might help you out on this.
If I understand the use case correctly, the "Label" in your system, which users can introduce themselves but for which a couple of hard-coded versions also exist, is an Aggregate.
Based on that assumption, I suggest being smart with the Aggregate Identifier you are using.
The sole thing that Axon expects from you is that the Aggregate Identifier is (or can be made into) a String. Typically a UUID is used for the Aggregate Identifiers, which is a reasonable first start.
You can however wrap this UUID in a typed-id object. Taking your "Label" Aggregate, that would call for a LabelId.
That said, let's first go back to verifying whether a given "Label" Aggregate exists within the Event Stream.
The concern you have is rather valid, I think; reading the entire Event Stream to figure out whether a given Aggregate instance exists is too big of a hassle.
However, the EventStore can be queried through two mechanisms:
The Event Stream from a given point in time (e.g. what the TrackingToken mechanism does).
The Event Stream for a given Aggregate instance, based on the Aggregate Identifier.
It's the second option which is far better suited to your scenario.
Just query the EventStore for a given "Label" Aggregate's Identifier. If you receive a non-empty Event Stream, you know it already exists.
Vice versa, if no Events are found, you are certain it's a new "Label" that needs to be introduced.
The crux here is in knowing the "Label's" Aggregate Identifier up front, which circles back to the String storage approach for the Aggregate Identifiers using a typed LabelId. What you could do is differentiate in the LabelId object between a custom "Label" (I'd opt for a UUID here) and a hard-coded "Label".
For the latter, you could for example use the label name, plus a UUID/counter if desired.
Doing so will ensure that all the Events published by a hard-coded "Label" will have an Aggregate Identifier you can anticipate during start-up.
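A sketch of what that typed id could look like (names assumed); the existence check relies on Axon's EventStore.readEvents(String) yielding an empty stream for an identifier that has never published events:

```java
import java.util.UUID;

import org.axonframework.eventsourcing.eventstore.EventStore;

public final class LabelId {

    private final String value;

    private LabelId(String value) {
        this.value = value;
    }

    // User-created labels keep the usual random UUID.
    public static LabelId forUserLabel() {
        return new LabelId(UUID.randomUUID().toString());
    }

    // Hard-coded labels derive a predictable identifier from their name.
    public static LabelId forSpecialLabel(String name) {
        return new LabelId("special-label-" + name);
    }

    // True when the aggregate has published events before, i.e. it exists.
    public static boolean exists(EventStore eventStore, LabelId id) {
        return eventStore.readEvents(id.value).hasNext();
    }

    @Override
    public String toString() {
        return value;
    }
}
```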
Hope this is clear and all, if not, please comment on my response below.

Stream aggregate relationship in an event sourced system

So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store. If we divide the events into streams, what exactly would a stream represent? In the context of a hypothetical inventory management system that tracks a collection of items, each with an ID, product code, and location, I'm having trouble visualizing the layout of the system.
From what I could gather on the internet, it could be described succinctly as "one stream per aggregate." So I would have an Inventory aggregate, a single stream with ItemAdded, ItemPulled, ItemRestocked, etc. events, each with serialized data containing the Item ID, quantity changed, location, etc. The aggregate root would contain a collection of InventoryItem objects (each with their respective quantity, product codes, location, etc.). That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems to be very inefficient with a large number of items.
Another method would be to have one stream per InventoryItem, tracking all events pertaining to only that item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique, or that you're not putting multiple items into the same location? It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and queries separate? It just feels wrong.
So my question is 'which is correct?' Partially both? Neither? Like most things, the more I learn, the more I learn that I don't know...
In a typical event store, each event stream is an isolated transaction boundary. Any time you change the model you lock the stream, append new events, and release the lock. (In designs that use optimistic concurrency, the boundaries are the same, but the "locking" mechanism is slightly different).
You will almost certainly want to ensure that any aggregate is enclosed within a single stream -- sharing an aggregate between two streams is analogous to sharing an aggregate across two databases.
A single stream can be dedicated to a single aggregate, to a collection of aggregates, or even to the entire model. Aggregates that are part of the same stream can be changed in the same transaction -- huzzah! -- at the cost of some contention and a bit of extra work to do when loading an aggregate from the stream.
The most commonly discussed design assigns each logical stream to a single aggregate.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems to be very inefficient with a large number of items.
There are a couple of possibilities; in some models, especially those with a strong temporal component, it makes sense to model some "entities" as a time series of aggregates. For example, in a scheduling system, rather than Bob's Calendar you might instead have Bob's March Calendar, Bob's April Calendar, and so on. Chopping the life cycle into smaller installments can keep the event count in check.
Another possibility is snapshots, with an additional trick to it: each snapshot is annotated with metadata that describes where in the stream the snapshot was made, and you simply read the stream forward from that point.
This, of course, depends on having an implementation of an event stream that supports random access, or an implementation of a stream that allows you to read last in, first out.
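A sketch of that snapshot-plus-replay trick, with all types assumed: the snapshot carries the stream position it was taken at, so rehydration only folds the events recorded after that point.

```java
interface Event {}

interface InventoryState {
    InventoryState apply(Event event);          // fold one event into the state
}

// The snapshot remembers where in the stream it was taken.
record Snapshot(long position, InventoryState state) {}

interface EventStream {
    Iterable<Event> readFrom(long position);    // needs random-access reads
}

final class Rehydrator {
    static InventoryState rehydrate(EventStream stream, Snapshot snapshot, InventoryState empty) {
        InventoryState state = (snapshot == null) ? empty : snapshot.state();
        long from = (snapshot == null) ? 0L : snapshot.position() + 1;
        for (Event event : stream.readFrom(from)) {   // only the tail of the stream
            state = state.apply(event);
        }
        return state;
    }
}
```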
Keep in mind that both of these are really performance optimizations, and the first rule of optimization is... don't.
So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store
The event store in a DDD project is designed around event-sourced Aggregates:
it provides the efficient loading of all events previously emitted by an Aggregate root instance (having a given, specified ID)
those events must be retrieved in the order they were emitted
it must not permit concurrent appends of events for the same Aggregate root instance
all events emitted as the result of a single command must all be appended atomically; this means that they should all succeed or all fail
The 4th point could be implemented using transactions, but this is not a necessity. In fact, for scalability reasons, if you can, you should choose a persistence mechanism that provides atomicity without the use of transactions. For example, you could store the events in a MongoDB document, as MongoDB guarantees document-level atomicity.
The 3rd point can be implemented using optimistic locking, using a version column with a unique index per (version x AggregateType x AggregateId).
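As an illustration (schema and helper types are assumed), an append guarded by such a unique index could look like this; a concurrent writer that loaded the same expected version violates the index and fails:

```java
import java.util.List;

final class OptimisticAppender {

    interface Event {}

    // Backed by e.g. INSERT INTO events(aggregate_type, aggregate_id, version, payload);
    // a unique index on (aggregate_type, aggregate_id, version) makes the
    // insert throw when a concurrent writer got there first.
    interface EventTable {
        void insert(String aggregateType, String aggregateId, long version, byte[] payload);
    }

    interface Serializer {
        byte[] serialize(Event event);
    }

    private final EventTable table;
    private final Serializer serializer;

    OptimisticAppender(EventTable table, Serializer serializer) {
        this.table = table;
        this.serializer = serializer;
    }

    // expectedVersion is the version the caller loaded the aggregate at; any
    // concurrent append starting from the same value is rejected by the index.
    void append(String type, String id, long expectedVersion, List<Event> events) {
        long version = expectedVersion;
        for (Event event : events) {
            table.insert(type, id, ++version, serializer.serialize(event));
        }
    }
}
```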
At the same time, there is a DDD rule regarding Aggregates: don't mutate more than one Aggregate per transaction. This rule helps you A LOT in designing a scalable system. Break it only if you don't need one.
So, the solution to all these requirements is something called an Event stream, which contains all the events previously emitted by an Aggregate instance.
So I would have an Inventory aggregate
DDD has higher precedence than the Event store. So, if you have business rules that force you to decide that you must have a (big) Inventory aggregate, then yes, it would load ALL the previous events generated by itself. The InventoryItem would then be a nested entity that cannot emit events by itself.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems to be very inefficient with a large number of items.
Yes, indeed. The simplest thing would be to have a single Aggregate, with a single instance; then the consistency would be the strongest possible. But this is not efficient, so you need to think harder about the real business requirements.
Another method would be to have one stream per InventoryItem, tracking all events pertaining to only that item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique, or that you're not putting multiple items into the same location?
There is another possibility: you could model the assignment of product codes as a Business Process. For this you could use a Saga/Process manager that orchestrates the entire process. This Saga could use a collection with a unique index on the product code column in order to ensure that only one product uses a given product code.
You could design the Saga to permit the allocation of an already-taken code to a product and compensate later, or to reject the invalid allocation in the first place.
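A sketch of such a Saga, with all names assumed; the reservation table's unique index does the actual set-based enforcement:

```java
final class ProductCodeAssignmentSaga {

    // Backed by a collection/table with a unique index on the code column;
    // returns false when the code is already reserved by another item.
    interface CodeReservations {
        boolean tryReserve(String productCode, String itemId);
    }

    interface CommandDispatcher {
        void dispatch(Object command);
    }

    record ProductCodeRequested(String itemId, String productCode) {}
    record ConfirmProductCode(String itemId, String productCode) {}
    record RejectProductCode(String itemId, String productCode) {}

    private final CodeReservations reservations;
    private final CommandDispatcher commands;

    ProductCodeAssignmentSaga(CodeReservations reservations, CommandDispatcher commands) {
        this.reservations = reservations;
        this.commands = commands;
    }

    void on(ProductCodeRequested event) {
        if (reservations.tryReserve(event.productCode(), event.itemId())) {
            commands.dispatch(new ConfirmProductCode(event.itemId(), event.productCode()));
        } else {
            // Alternatively: confirm optimistically and compensate later.
            commands.dispatch(new RejectProductCode(event.itemId(), event.productCode()));
        }
    }
}
```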
It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and queries separate? It just feels wrong.
The Saga does indeed use private state, maintained from the domain events in an eventually consistent manner, just like a Read model, but this does not feel wrong to me. It may use whatever it needs in order to (eventually) bring the system as a whole to a consistent state. It complements the Aggregates, whose purpose is to not let the building blocks of the system get into an invalid state.

Check command for validity with data from other aggregate

I am currently working on my first bigger DDD application. For now it works pretty well, but we are stuck with an issue since the early days that I cannot stop thinking about:
In some of our aggregates we keep references to another aggregate root that is pretty essential for the whole application (based on their IDs, so there are no hard references; deletion is likewise based on events/eventual consistency). Now when we create a new entity "Entity1", we send a new CreateEntity1Command that contains the ID of the referenced aggregate root.
Now how can I check whether this referenced ID is a valid one? Right now we check it by reading from the other aggregate (without modifying anything there), but this approach somehow feels dirty. I would like to just "trust" the commands, because the ID cannot be entered manually but must be selected. The problem is that our application is a web application, and it is not really safe to trust the user input you get there (even though it is not accessible to the public).
Did I overlook any possible solutions to this problem, or should I just ignore the feeling that there needs to be a better solution?
Verifying that another referenced Aggregate exists is not the responsibility of an Aggregate; that would break the Single Responsibility Principle. When the CreateEntity1Command arrives at the Aggregate, it should be assumed that the other referenced Aggregate is in a valid state, i.e. it exists.
Being outside the Aggregate's boundary, this checking is eventually consistent. This means that, even if it initially passes, it could become invalid after that (i.e. it is deleted, unpublished or any other invalid domain state). You need to ensure that:
the command is rejected if the referenced Aggregate does not yet exist. You do this check in the Application service that is responsible for the Use case, before dispatching the command to the Aggregate, using a Domain service (see the sketch after this list).
if the referenced Aggregate enters an invalid state afterwards, the correct actions are taken. You should do this inside a Saga/Process manager. If CQRS is used, you subscribe to the relevant events; if not, you use a cron job. What the correct action is depends on your domain, but the main idea is that it should be modeled as a process.
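A sketch of the first point, with assumed names; how existence is actually determined stays hidden behind the Domain service interface:

```java
final class CreateEntity1Service {

    // Domain service: the lookup strategy (query model, event store
    // check, ...) is an implementation detail behind this interface.
    interface AggregateRootLookup {
        boolean exists(String aggregateRootId);
    }

    interface CommandDispatcher {
        void dispatch(Object command);
    }

    record CreateEntity1Command(String entity1Id, String referencedAggregateRootId) {}

    private final AggregateRootLookup lookup;
    private final CommandDispatcher commands;

    CreateEntity1Service(AggregateRootLookup lookup, CommandDispatcher commands) {
        this.lookup = lookup;
        this.commands = commands;
    }

    void handle(CreateEntity1Command command) {
        if (!lookup.exists(command.referencedAggregateRootId())) {
            throw new IllegalArgumentException("Referenced aggregate root does not exist");
        }
        // The Aggregate can now treat the reference as valid; later
        // invalidation is the Saga/Process manager's concern.
        commands.dispatch(command);
    }
}
```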
So, long story short, the responsibility of an Aggregate does not extend beyond its consistency boundary.
P.S. Resist the temptation to inject services (Domain or not) into Aggregates (through constructor or method arguments).
Direct Aggregate-to-Aggregate interaction is an anti-pattern in DDD. An aggregate A should not directly send a command or query to an aggregate B. Aggregates are strict consistency boundaries.
I can think of two solutions to your problem. Let's say you have two aggregate roots (ARs), A and B. Each AR has a bunch of command handlers, where each command raises one or more events. Your command handler in A depends on some data in B.
You can subscribe to the events raised by B and maintain the state of B in A. You can subscribe only to the events which dictate the validity.
You can have a completely independent service (S) coordinating between A and B. Instead of directly sending your request to A, send your request to S which would be responsible for a query from B (to check for validity of referenced ID) and then forward request to A. This is sometimes called a Process Manager (PM).
For example, in your case, when you are creating a new entity "Entity1", send this request to a PM whose job is to validate that the data in your request is valid and then route your request to the aggregate responsible for creating "Entity1". Send the new CreateEntity1Command that contains the ID of the referenced aggregate root to this PM, which uses the ID of the referenced AR to make sure it's valid; only then will it pass your request forward.
Useful Links: http://microservices.io/patterns/data/saga.html
Did I overlook any possible solutions to this problem
You did. "Domain Services" give you a possible loophole to play in.
Aggregates are consistency boundaries; their behaviors are constrained by
The current state of the aggregate
The arguments that they are passed.
If an aggregate needs to interact with something outside of its boundary, then you pass to the aggregate root a domain service to encapsulate that interaction. The aggregate, at its own discretion, can invoke methods provided by the domain service to achieve work.
Often, the domain service is just a wrapper around an application or infrastructure service. For instance, if the aggregate needed to know if some external data were available, then you could pass in a domain service that would support that query, checking against some cache of data.
But - here's the trick: you need to stay aware of the fact that data from outside of the aggregate boundary is necessarily stale. There might be another process changing the data even as you are querying a stale copy.
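A sketch of that pattern, with all names assumed: the aggregate receives the domain service as a method argument and invokes it at its own discretion, treating whatever it reads as a stale copy.

```java
final class Shipment {

    // Domain service, often just a wrapper around an application or
    // infrastructure service (e.g. a cache of external data).
    interface ExternalDataAvailability {
        boolean isAvailable(String dataId);
    }

    private boolean released;

    // Whatever the service answers is a stale snapshot of state outside the
    // aggregate's consistency boundary; another process may change it meanwhile.
    void release(String customsDataId, ExternalDataAvailability availability) {
        if (!availability.isAvailable(customsDataId)) {
            throw new IllegalStateException("Customs data not yet available");
        }
        this.released = true;
    }
}
```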
The problem is that our application is a web application, and it is not really safe to trust the user input you get there (even though it is not accessible to the public).
That's true, but it's not typically a domain problem. For instance, we might specify that an endpoint in our API requires a JSON representation of some command message -- but that doesn't mean that the domain model is responsible for taking a raw byte array and creating a DOM for it. The application layer would have that responsibility; the aggregate's responsibility is the domain concerns.
It can take some careful thinking to distinguish where the boundary between the different concerns is. Is this sequence of bytes a valid identifier for an aggregate? is clearly an application concern. Is the other aggregate in a state that permits some behavior? is clearly a domain concern. Does the aggregate exist at all...? could go either way.

Read model for aggregate in DDD CQRS ES

In CQRS + ES and DDD, is it a good thing to have a small read model in an aggregate to get data from another aggregate or bounded context?
For example, in order validation (in the Order aggregate), there is a business rule which validates the order only if the customer is not flagged. The flag information is put in a read model (specific to the aggregate) via synchronous domain events.
What do you think about this?
is it a good thing to have a small read model in an aggregate to get data from another aggregate or bounded context?
It's not ideal. Aggregates, due to their nature, are not good at enforcing consistency that involves state outside of themselves.
What this usually means is that the business is going to need some way to respond when two aggregates produce an unacceptable state.
You also have the option of checking for the flag before you run the placeOrder command on the aggregate. That check for the flag could be done in the command handler, or in the client -- basically, you have ways of "validating" that the command should succeed before passing it to the aggregate.
That said, if it were critical to try to consult the read model while processing the command, a way to do it would be to use a "domain service"; you pass a service provider to the aggregate as part of the command, and let the interface abstract away the fact that running the query requires looking outside of the aggregate.
That gives you some of the decoupling you need to keep the aggregate testable.
It's doable, but not in the form of a read model; rather, as a Value Object in the Aggregate (since we're on the Write side).
If you already have a CustomerId in Order, you just have to compose a VO with it and a Flagged member.
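A sketch of that approach, with assumed names; the VO composes the CustomerId with the flag and is kept in sync from Customer events:

```java
final class Order {

    // The Value Object composing the customer reference with its flag.
    record CustomerRef(String customerId, boolean flagged) {}

    // Domain event published by the Customer aggregate (assumed name).
    record CustomerFlagged(String customerId) {}

    private CustomerRef customer;

    Order(String customerId) {
        this.customer = new CustomerRef(customerId, false);
    }

    // Kept in sync via (synchronous) domain events from Customer.
    void on(CustomerFlagged event) {
        if (event.customerId().equals(customer.customerId())) {
            customer = new CustomerRef(customer.customerId(), true);
        }
    }

    void validate() {
        if (customer.flagged()) {
            throw new IllegalStateException("Cannot validate order: customer is flagged");
        }
        // ...proceed with order validation...
    }
}
```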
Of course, this remains prone to all the problems of cross-aggregate communication since the data originates from Customer. Order has to be kept in sync with the flagged status of its Customer, which can require quite a bit of work.
In any case, you should probably first determine with your domain expert whether immediate consistency is an absolute requirement (in which case you have to somehow wrap Customer + Order in a transaction) or if you can afford a small delay in Flagged freshness when enforcing that invariant.
If the latter, you can choose between duplicating Flagged in the Order aggregate or the first option given by @VoiceOfUnreason - the main difference being probably that if the data is in the aggregate, you'll get it for free at the Domain level should you need it on multiple occasions, instead of duplicating the check in multiple use cases/command handlers at the application level.
