I'm doing DDD analysis using event storming and have run into this question:
Can we or should we define distinct events (e.g.: RouteCreatedByUser and RouteCreatedFromImport) based on how it was created?
Will "it depends" be the answer again? (e.g.: depends, if the way it was created will affect subsequent process / how the aggregate will be treated).
Or will the answer be flat out "no", just make one event (RouteCreated)?
You should always consider whether your events have distinct business significance. If they do, they should be separate. This allows them to evolve independently.
Also, see how many things are common and how much is different. If you find some things which are different but not really generalisable, you should split the events. No use in having unused fields with lots of conditions when something is applicable and when it isn't.
If you fail to catch this distinction, you end up with low cohesion, lots of unused fields, and so on. Remember that event sourcing is primarily about having an immutable event log. That means that while you can modify events for simple schema migrations (in general you shouldn't, but sometimes it just makes more sense), you can never insert events in between existing ones or delete events.
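As a minimal sketch of the split described above, the two creation paths could be modelled as separate immutable event types. The field names (user_id, source_file, import_batch_id) are assumptions made for illustration; they do not come from the question.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RouteCreatedByUser:
    route_id: str
    user_id: str            # hypothetical: only meaningful for manual creation
    occurred_at: datetime


@dataclass(frozen=True)
class RouteCreatedFromImport:
    route_id: str
    source_file: str        # hypothetical: only meaningful for imports
    import_batch_id: str    # hypothetical
    occurred_at: datetime


def describe(event) -> str:
    """Each event type is handled on its own; no 'is this field applicable?' checks."""
    if isinstance(event, RouteCreatedByUser):
        return f"route {event.route_id} created by user {event.user_id}"
    if isinstance(event, RouteCreatedFromImport):
        return f"route {event.route_id} imported from {event.source_file}"
    raise TypeError(f"unknown event: {type(event).__name__}")


now = datetime.now(timezone.utc)
by_user = RouteCreatedByUser("r-1", "u-42", now)
imported = RouteCreatedFromImport("r-2", "routes.gpx", "batch-7", now)
```

If the two types ended up with identical fields and identical handling, that would be a hint that a single RouteCreated event is enough.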
Related
I'm aware of the general rule that only a single aggregate should be modified per transaction, mostly for concurrency and transactional consistency issues, as far as I'm aware.
I have a use case where I want to create multiple aggregates in a single transaction: a RestaurantManager, a Restaurant, and a Menu. They seem like a single aggregate because their life-cycles begin and end together: it doesn't make sense within the domain to create a RestaurantManager without a Restaurant, or vice versa; the same goes for a Restaurant and a Menu. Further, if the Restaurant or the RestaurantManager is deleted (unregistered), they should all be deleted together.
However, I've split them into separate aggregates because, once created, they are updated separately, maintain their own invariants, and I don't want to load them all into memory just to update one property on the Restaurant, for example.
The only thing that ties them together is their life-cycle.
My question is whether this represents a case where it is okay to go against the "rule" that each transaction should only operate on a single aggregate.
I'd also like to know if I should enforce their shared life-cycle in the domain model by having each aggregate root hold the identifier of the aggregate root it depends on, i.e. by having Restaurant require a MenuId as a constructor parameter, and likewise for Menu and RestaurantId, so that neither can be created without the other. However, this still wouldn't enforce that they should be saved together by the application service anyway, since it could create them all in memory, then only save the Menu, for example.
Your requirement is a pretty normal use case in DDD, IMHO. There are always multiple aggregates working in tandem to support the application, and they are interlinked in their lifecycles. But the modeling concepts still stand true. Let me attempt to explain what your model would look like with the help of a few DDD rules:
Aggregates are transaction boundaries
Aggregates ensure that no business invariants are broken at any point. This means that if you have multiple aggregates strung together as part of one transaction, you have to load all of them into memory for the validation.
This is especially a problem when your application is data-rich and stores data in a partitioned, distributed database cluster (think Mongo or Elasticsearch). You would have to load data from potentially different clusters as part of a single transaction.
Aggregates are loaded in entirety
Aggregates and their associated data objects are loaded in entirety into memory. This means that unnecessary objects (say the restaurant's schedule for the upcoming month, for example) for the transaction may be loaded into memory. By itself, this is not a problem. But when multiple aggregates get together, the amount of data loaded into memory needs to be considered.
Aggregates refer to each other by their unique identifiers
This one is straightforward and means that each aggregate stores its referenced aggregates by their identifiers instead of enclosing the other aggregate's data within it.
State changes across Aggregates are handled through Domain Events
In cases where you want a state change in one aggregate to have side-effects on other aggregates, you publish a domain event, and a subscriber handles the change on other aggregates in the background. This is how you would want to handle your requirement for cascade deletes.
By following these rules, you are essentially zooming in on a single aggregate at a time and ensuring that the complexity remains low. When you string together multiple aggregates, it may be clear and understandable on day 1, but eventually the application tends towards becoming a big ball of mud as dependencies and invariants start crisscrossing each other.
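The last two rules can be sketched roughly as follows: aggregates refer to each other by identifier, and a domain event drives the cascade delete. The EventBus class, the dict-based repositories, and the RestaurantUnregistered event are all hypothetical stand-ins. Handlers run synchronously here for brevity, whereas the approach described above would run them in the background.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RestaurantUnregistered:
    restaurant_id: str


class EventBus:
    """Hypothetical in-process bus; a real system would likely use a broker or an outbox."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


# Each "repository" is a dict keyed by the aggregate's own id.
restaurants = {"r-1": {"name": "Chez Test"}}
menus = {"m-1": {"restaurant_id": "r-1"}}        # Menu holds only the Restaurant's id
managers = {"mgr-1": {"restaurant_id": "r-1"}}   # so does RestaurantManager


def remove_dependents(repo):
    """Build a handler that deletes every entry in repo tied to the unregistered restaurant."""
    def handler(event):
        doomed = [key for key, row in repo.items() if row["restaurant_id"] == event.restaurant_id]
        for key in doomed:
            del repo[key]
    return handler


bus = EventBus()
bus.subscribe(RestaurantUnregistered, remove_dependents(menus))
bus.subscribe(RestaurantUnregistered, remove_dependents(managers))


def unregister_restaurant(restaurant_id):
    del restaurants[restaurant_id]                       # change one aggregate...
    bus.publish(RestaurantUnregistered(restaurant_id))   # ...and let subscribers react


unregister_restaurant("r-1")
```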
"only a single aggregate should be modified per transaction"
Contention at creation doesn't matter as much. You can create many ARs in a single transaction without problem because the only other operation that could conflict is another duplicate creation process.
Another reason to avoid involving many ARs in a single transaction is coupling between modules, but you could always keep things loosely coupled using synchronously dispatched domain events.
As for the deletion, it's probably less problematic to make it eventually consistent. Does it really matter that Restaurant is closed while RestaurantManager remains registered for a short period of time?
The fact that you are asking this question suggests that your system is not distributed. If your system runs on a single DB server and is used by a few people, eventual consistency may just make things more complex in exchange for scalability you don't actually need.
Start simple and refactor as needed, but crossing AR boundaries is not something that should be done consistently or else your boundaries are clearly wrong.
Furthermore, if you want to communicate that a RestaurantManager can't be spawned from nowhere and associated with an invalid RestaurantId by mistake you may want to look at your ubiquitous language for guidance.
e.g.
"A RestaurantManager is registered for a given Restaurant": not sure it truly aligns with your UL, but it's just for the sake of the example.
RestaurantManager manager = restaurant.registerManager(...);
This obviously increases coupling and could affect performance, but it aligns well with the UL and makes it more difficult to misuse the model. Also note that with a single DB, you could enforce referential integrity, which takes care of these uninteresting referential constraints.
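For illustration only, the factory-method idea could look like this in Python. The field names and the use of uuid for identifiers are assumptions, not part of the original example.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RestaurantManager:
    manager_id: str
    restaurant_id: str   # reference to the Restaurant by identifier only
    name: str


@dataclass
class Restaurant:
    restaurant_id: str
    name: str

    def register_manager(self, name: str) -> RestaurantManager:
        # The manager is always created through an existing Restaurant,
        # so it cannot point at an id that was made up elsewhere.
        return RestaurantManager(str(uuid.uuid4()), self.restaurant_id, name)


restaurant = Restaurant("r-1", "Chez Test")
manager = restaurant.register_manager("Alice")
```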
As pointed out by @plalx, contention doesn't matter as much when creating aggregates in terms of transactions, since they don't yet exist so can't be involved in contention.
As for enforcing the mutual life cycle of multiple aggregates in the domain, I've come to think that this is the responsibility of the application layer (i.e. an application service, or use case).
Maybe my thinking is closer to Clean or Hexagonal architecture, but I don't think it's possible or even sensible to try and push every single business rule down into the "domain model". The point of the domain model for me is to partition the problem domain into small chunks (aggregates), which encapsulate common business data/operations that change together, but it's the application layer's responsibility to use these aggregates properly in order to achieve the business' end goal (which is the application as a whole), including mediating operations between the aggregates and controlling their life cycles.
As such, I think this stuff belongs in an application service. That being said, frequently updating multiple aggregates in each use case could be a sign of incorrect domain boundaries.
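A minimal sketch of such an application service, assuming a single relational database. The table layout and the register_restaurant function are hypothetical. It uses Python's built-in sqlite3 module, where using the connection as a context manager commits on success and rolls back if an exception is raised.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE managers (id TEXT PRIMARY KEY, restaurant_id TEXT NOT NULL, name TEXT NOT NULL);
    CREATE TABLE menus (id TEXT PRIMARY KEY, restaurant_id TEXT NOT NULL);
""")


def register_restaurant(conn, restaurant_name, manager_name):
    """Use case: create all three aggregates together, or none of them."""
    restaurant_id, manager_id, menu_id = (str(uuid.uuid4()) for _ in range(3))
    with conn:  # one transaction: commit on success, rollback on any exception
        conn.execute("INSERT INTO restaurants VALUES (?, ?)", (restaurant_id, restaurant_name))
        conn.execute("INSERT INTO managers VALUES (?, ?, ?)", (manager_id, restaurant_id, manager_name))
        conn.execute("INSERT INTO menus VALUES (?, ?)", (menu_id, restaurant_id))
    return restaurant_id


rid = register_restaurant(conn, "Chez Test", "Alice")
```

Because only the application service knows that the three must be saved together, a failure in any insert leaves no half-created restaurant behind.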
I'm reading the book PATTERNS, PRINCIPLES, AND PRACTICES OF DOMAIN-DRIVEN DESIGN, written by Scott Millett with Nick Tune. In chapter 19, Aggregates, he states:
Sometimes it is actually good practice to modify multiple aggregates within a transaction. But it’s
important to understand why the guidelines exist in the first place so that you can be aware of the
consequences of ignoring them.
When the cost of eventual consistency is too high, it’s acceptable to consider modifying two objects in the same transaction. Exceptional circumstances will usually be when the business tells you that the customer experience will be too unsatisfactory.
To summarize, saving one aggregate per transaction is the default approach. But you should
collaborate with the business, assess the technical complexity of each use case, and consciously ignore
the guideline if there is a worthwhile advantage, such as a better user experience.
I'm facing a case in my project where the user requests an operation that affects two aggregates, and there are rules that both aggregates must verify for the operation to succeed.
It is something like "Allocating a cell for a detainee":
1. the user makes the request
2. the Detainee (AR1) is fetched from the database and receives a command: detainee.AllocateTo(cellId);
3. the Cell (AR2) is fetched and receives a command: cell.Allocate(detaineeId);
Both steps 2 and 3 could throw an exception, depending on the detainee's status and the cell's capacity, but let's abstract that away.
Using eventual consistency, if step 2 succeeds and emits the DetaineeAllocated event but step 3 fails (it runs in another transaction, inside an event handler), the state of the aggregates will be inconsistent and, worse, the operation will have appeared successful to the user.
I know that there are cases like "when the user makes a purchase over $100, their type must be changed to VIP" that can be implemented using eventual consistency, but the case I mentioned does not seem to be one of them.
Do you think that this is a special case that the book mentions?
Each aggregate must not have an invalid state (internal state), but that does not imply aggregates have to be consistent with one another (external, or system state).
Given the context of your question, the answer could be either yes or no.
The Case for No
The external state can become eventually consistent, which may be acceptable to your product owner. In this case you design ways to detect the inconsistency and deal with it (e.g. by retrying operations, issuing compensating transactions, etc.)
The Case for Yes
In your orchestration layer, go ahead and update the aggregates in a transaction. You might choose to do this because it's "easy" and "right", or you might choose to do this because your product owner says the inconsistency can't be tolerated for whatever reason.
Another Case for No
There's another way out for saying this is not a special case, not a reason for more than one transaction. That way out requires a change to your model. Consider removing the mutual dependency between your detainee and the cell, and instead introducing another aggregate, CellAssignment, which represents a moment-interval (a temporal relationship) that can be constructed and saved in a single transaction. In this case, your detainee and the cell don't change.
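A rough sketch of what such a CellAssignment aggregate might look like. The field names and the half-open interval convention are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CellAssignment:
    """A moment-interval: which detainee occupies which cell, and over what period."""
    assignment_id: str
    detainee_id: str   # referenced by id; the Detainee aggregate is not modified
    cell_id: str       # referenced by id; the Cell aggregate is not modified
    starts_on: date
    ends_on: Optional[date] = None   # None means the assignment is still open

    def is_active(self, on: date) -> bool:
        # Half-open interval: active from starts_on up to, but excluding, ends_on.
        return self.starts_on <= on and (self.ends_on is None or on < self.ends_on)

    def end(self, on: date) -> None:
        if on < self.starts_on:
            raise ValueError("an assignment cannot end before it starts")
        self.ends_on = on


assignment = CellAssignment("a-1", "d-7", "c-3", starts_on=date(2024, 1, 10))
```

Creating or ending an assignment touches exactly one aggregate. Rules that span several assignments, such as cell capacity, are deliberately left out of this sketch.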
"the state of aggregates will be inconsistent"
Well, it shouldn't be inconsistent forever or that wouldn't be eventual consistency. You would normally discuss with business experts to establish an acceptable consistency timeframe.
Should something go wrong, an event will be raised, which should trigger compensating actions and perhaps a notification to a human stating that something went wrong after all.
Another approach could be to introduce a process manager, which is responsible for carrying out the business process by triggering commands and listening to events until completion or timeout. The ARs are often designed to allow small incremental steps towards consistency. For instance, there could be a command to reserve cell space first rather than directly allocating the detainee. The UI could always poll the state of the process to know when it's complete, if necessary.
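As a sketch only, a process manager for this case might reserve cell space first, then allocate the detainee, and release the reservation as a compensating action if the second step fails. The class names, states, and status values below are hypothetical. In a real system each step would run in its own transaction and be driven by commands and events rather than direct method calls.

```python
class Cell:
    def __init__(self, cell_id, capacity):
        self.cell_id = cell_id
        self.capacity = capacity
        self.reservations = set()

    def reserve_space(self, detainee_id):
        if len(self.reservations) >= self.capacity:
            raise ValueError("cell is full")
        self.reservations.add(detainee_id)

    def release_space(self, detainee_id):
        self.reservations.discard(detainee_id)


class Detainee:
    def __init__(self, detainee_id, status="awaiting_cell"):
        self.detainee_id = detainee_id
        self.status = status
        self.cell_id = None

    def allocate_to(self, cell_id):
        if self.status != "awaiting_cell":
            raise ValueError("detainee cannot be allocated in status " + self.status)
        self.cell_id = cell_id
        self.status = "allocated"


class CellAllocationProcess:
    """Hypothetical process manager: tracks one allocation until it completes or fails."""

    def __init__(self, detainee, cell):
        self.detainee = detainee
        self.cell = cell
        self.state = "started"   # the UI could poll this value

    def run(self):
        try:
            self.cell.reserve_space(self.detainee.detainee_id)    # step 1
        except ValueError:
            self.state = "failed"
            return self.state
        self.state = "space_reserved"
        try:
            self.detainee.allocate_to(self.cell.cell_id)          # step 2
        except ValueError:
            self.cell.release_space(self.detainee.detainee_id)    # compensating action
            self.state = "failed"
            return self.state
        self.state = "completed"
        return self.state
```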
Eventual consistency obviously comes at a cost. If you have a single DB in a monolith that doesn't need extreme scalability, you could very well choose to modify both ARs in a single transaction until that becomes a problem.
Eventual consistency is often sold as less costly than strong consistency, but I believe that's mostly for distributed systems where you'd have to deal with XA transactions.
"Do you think that this is a special case that the book mentions?"
No.
What I suspect you have here is a modeling error.
From your description, it sounds like you are dealing with something like a CellAssignment, and the invariant that you are trying to maintain is to ensure that there are no conflicts among active cell assignments.
That suggests to me that you are missing some sort of aggregate - something like a seating chart? - that keeps track of all of the active assignments and conflicts.
How can you know? One way is to graph your aggregates; create a node for each piece of information you need to save, and join nodes with lines if there is a rule that requires locking both nodes. If you find yourself with disconnected graphs, or two graphs that only connect at the root id, then it's a good bet that separating some information into a new graph will improve your modeling.
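The graphing exercise above can be approximated in code by finding connected groups: pieces of information are nodes, rules that require locking two pieces together are edges, and each group is a candidate aggregate. The node names and rules below are invented for illustration.

```python
from collections import defaultdict


def connected_groups(nodes, rules):
    """Return the sets of nodes that are linked, directly or indirectly, by a rule."""
    adjacency = defaultdict(set)
    for a, b in rules:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, groups = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:                      # depth-first walk from this node
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical pieces of information and the rules that tie them together.
nodes = ["detainee.name", "detainee.status", "cell.capacity",
         "assignment.cell", "assignment.detainee"]
rules = [("detainee.status", "assignment.detainee"),
         ("cell.capacity", "assignment.cell"),
         ("assignment.cell", "assignment.detainee")]

groups = connected_groups(nodes, rules)
```

With these made-up rules, detainee.name ends up on its own while everything involved in assignment conflicts forms a single group, which hints at where a boundary could be drawn.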
All Our Aggregates Are Wrong, by Mauro Servienti, would be a good talk to review.
We have microservices, each generating events that are stored by an event-sourcing repository. We use Cassandra to store the event data.
As you may know, the order of the events is important.
When we generate these events from different services running on different machines, how do we handle their clocks (timestamps) drifting out of sync, which results in an event order mismatch?
"As you may know, the order of the events is important."
In some cases - but you'll want to be careful not to confuse time, order, and correlation.
"When we generate these events from different services running on different machines, how do we handle their clocks (timestamps) drifting out of sync, which results in an event order mismatch?"
Give up the idea that there is an "order" to events that are happening in different places. There is no now.
Udi Dahan on race conditions in the business world:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If your microservice boundaries are correct, then events happening in two different services at about the same time are coincident -- there isn't one correct ordering of them, because (to stretch an analogy) they are in different light cones. The only ordering that is inherently real is the one within a single aggregate's event history.
What can make real sense is tracking causation; these changes in this book of record are a reaction to those changes in that book of record.
One simple form of this is to track happens-before, which is where ideas like vector clocks begin to appear.
In most discussions that I have seen, this information would be passed along as meta data of the recorded events.
This is typically done via vector clocks:
A vector clock is an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations.
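A minimal vector clock sketch, with clocks represented as plain dicts from service name to counter. The function names and the service names are assumptions for the example; this is not tied to any particular library.

```python
def increment(clock, node):
    """Return a copy of clock with node's own counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local, received, node):
    """On receiving a message: take the element-wise max, then tick the receiver's counter."""
    keys = local.keys() | received.keys()
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return increment(merged, node)


def happened_before(a, b):
    """True if a is less than or equal to b everywhere and strictly less somewhere."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def concurrent(a, b):
    """True if the clocks differ and neither one happened before the other."""
    keys = a.keys() | b.keys()
    differ = any(a.get(k, 0) != b.get(k, 0) for k in keys)
    return differ and not happened_before(a, b) and not happened_before(b, a)


orders = increment({}, "orders")      # an event recorded by the orders service
billing = increment({}, "billing")    # an unrelated event in the billing service
billing_after = merge(billing, orders, "billing")   # billing reacts to the orders event
```

Here orders and billing are concurrent, so no ordering between them is implied, while orders happened before billing_after because billing had seen the orders event when it recorded its reaction.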
If I understand your problem correctly, you're trying to guard writes, i.e. to make sure that a microservice instance is up to date with all the relevant events before making another write.
In that case, have a look at lightweight transactions, which can be used to implement optimistic locking in Cassandra.
This talk by Christopher Batey is a very good start.
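To illustrate the optimistic-locking idea without a running cluster, here is an in-memory stand-in for the conditional insert. It does not talk to Cassandra. The table and column names in the comment are hypothetical, and the store class only mimics the "applied or not applied" outcome that a lightweight transaction reports.

```python
class ConcurrentWriteError(Exception):
    pass


class InMemoryEventStore:
    """Stand-in for a table keyed by (aggregate_id, version)."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, aggregate_id, version, payload):
        # Roughly what a CQL lightweight transaction would express as:
        #   INSERT INTO events (aggregate_id, version, payload)
        #   VALUES (?, ?, ?) IF NOT EXISTS
        # (hypothetical table and columns)
        key = (aggregate_id, version)
        if key in self._rows:
            return False      # somebody else already wrote this version
        self._rows[key] = payload
        return True


def append(store, aggregate_id, expected_version, payload):
    """Write the next event only if nobody has written that version since we read."""
    next_version = expected_version + 1
    if not store.insert_if_not_exists(aggregate_id, next_version, payload):
        raise ConcurrentWriteError(
            f"{aggregate_id} already has version {next_version}; reload and retry")
    return next_version


store = InMemoryEventStore()
first = append(store, "route-1", 0, "RouteCreated")
```

A writer that is working from a stale version gets ConcurrentWriteError and has to reload the aggregate's events before trying again.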
I'm trying to decide the best place to take care of presentation logic. I've separated out my Read queries (CQRS) with each method querying and generating a DTO for my View. But my Views are simply templates with variables scattered about that will come from the DTO. They don't have any logic in them.
Say I want to do some things like reformatting how the date looks, and turning flags into actual descriptive words, or adding little conditions on what is displayed depending on what is queried from the database, and so on. I'm thinking to put this logic in with each query, and to not worry about being too DRY (I find that in some cases if you DRY too much then you could be making things hard to change in that you have to check each dependency or hope your unit tests hold up). I may use some "helpers" here and there to do formatting that I find I keep doing, but I don't see the need to add a whole other "presentation layer". So presentation logic would reside with each query and go into the returned DTO, to be dropped right into a View. This would keep the Read side of CQRS super thin, and makes sense in that each View corresponds to a Read query. But I'm also concerned in that some of this presentation logic would be very specific to the domain. A new developer coming on board would need to look at other queries and repeat the same formatting techniques, as opposed to just throwing the data out there straight from a raw query.
Is this a sound approach, or is there another approach used in DDD/CQRS? I'm having trouble finding any guidance in the CQRS research I've done. Note: I happen to be using PHP/MySQL, but I imagine this question is language agnostic.
I think the most important part to understand about CQRS is that it doesn't have to be complicated. In fact, for the read side of things go for the simplest solution that will work and be maintainable. If all you need is a SELECT statement from a view to bind to a grid, why make a bunch of layers, DTO's, and web services? Is that adding any value to the business? However if there is a legitimate reason to add a layer to the equation then you may do so, and usually DTOs are a good way to communicate between those layers.
Your system may call for different query strategies depending on the use case at hand, so this doesn't have to be a one size fits all approach. Performance should always be one of your first concerns, so get the data as close to the consuming presentation code as possible and only add complexity when truly needed.
Some might say this is not loosely coupled if the presentation layer is reading directly from the database. However, just because you have many layers between 2 things, doesn't make them loosely coupled. In fact, it may be the same amount of coupling, but now you've added a maintenance headache since you have to touch 10 places every time a field is added.
Focus more on your command side, and do whatever feels practical for the read side.
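As a sketch of the thin read side discussed in this thread, presentation logic can live next to the query and go straight into the DTO, with small helpers for formatting that keeps recurring. The DTO fields, flag values, and date format below are assumptions, and the rows are passed in as dicts to keep the example independent of any database.

```python
from dataclasses import dataclass
from datetime import datetime

STATUS_LABELS = {"A": "Active", "S": "Suspended"}   # hypothetical flag values


def format_date(value: datetime) -> str:
    """Shared helper for the one date format the views use."""
    return value.strftime("%d/%m/%Y")


@dataclass(frozen=True)
class CustomerRowDto:
    name: str
    status: str   # already a descriptive word, not a flag
    joined: str   # already formatted for display


def customer_list_query(rows):
    """Turn raw query rows into view-ready DTOs; the template only prints fields."""
    return [
        CustomerRowDto(
            name=row["name"],
            status=STATUS_LABELS.get(row["status"], "Unknown"),
            joined=format_date(row["joined_at"]),
        )
        for row in rows
    ]


raw_rows = [{"name": "Ada", "status": "A", "joined_at": datetime(2024, 1, 5)}]
dtos = customer_list_query(raw_rows)
```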
I'm just getting into event-driven architectures and would like to know what the convention is for naming commands and events. I know this much: Commands should be in the form DoSomething while events should be in the form SomethingHappened. What I need to clarify is if I need to append the word 'Command' to my commands and 'Event' to my events e.g. DoSomethingCommand as opposed to just DoSomething and SomethingHappenedEvent as opposed to just SomethingHappened. I would also like to know what the rationale is behind the community-preferred convention. Thanks!
The Command and Event suffixes are optional and are a matter of preference. I prefer to omit them and try to make the intent evident from the name alone. The most important aspect of naming commands and events is making sure they reflect the business domain more so than the technical domain. A lot of times terms like Create, Update, Add, Change are far too technical and have less meaning in the business domain. For example, instead of saying UpdateCustomerAddress you can say RelocateCustomer which could have a larger business context to it.
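A small sketch of that naming style: an imperative, business-flavoured command and a past-tense event, neither carrying a suffix. The fields and the handler function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelocateCustomer:       # command: imperative, business language
    customer_id: str
    new_address: str


@dataclass(frozen=True)
class CustomerRelocated:      # event: past tense, states what happened
    customer_id: str
    new_address: str


def handle_relocate_customer(command: RelocateCustomer) -> CustomerRelocated:
    """Hypothetical handler: a real one would load and update the Customer aggregate."""
    return CustomerRelocated(command.customer_id, command.new_address)


event = handle_relocate_customer(RelocateCustomer("c-1", "12 New Street"))
```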
My convention relies on namespaces, by which I mean that I never use the suffix Event or Command.
I also organise commands and events into separate namespaces based on the aggregate type they are meant to affect.
Example:
// Commands
MyApp.Messages.Commands.Customers.Create
MyApp.Messages.Commands.Orders.Create
MyApp.Messages.Commands.Orders.AddProduct
// Events
MyApp.Messages.Events.Customers.Created
MyApp.Messages.Events.Orders.Created
MyApp.Messages.Events.Orders.ProductAdded
Depending on your requirements you might want to place your events into a separate assembly. The reason for this would be if you need to distribute events to downstream systems. In that case you probably don't want downstream systems to have to bother about your commands (because they shouldn't).
The convention that I've seen a lot and use myself is that events should be in the past tense and describe what happened:
UserRegistered
AccountActivated
ReplyPosted
Commands are something that you would like to do, so create names that illustrate that:
CreateUser
UpgradeUserAccount
As for organization, I usually put them together with the root aggregate that they are for. It makes it a lot easier to see what you can do and what kind of events that are generated.
That is, I create a namespace for each root aggregate and put everything under it (repository definition, events, commands).
MyApp.Core.Users
MyApp.Core.Posts
etc.
Commands and Events form a language for your application...an API. Terms like 'command' and 'event' are perhaps useful for system-level definitions where technical terms are meaningfully mixed into an entity's purpose, but if you are dealing with definitions for Domain behaviors, then drop the system/technical terminology and favor the business-speak. It will make your code read more naturally and lessen typing.
I started with the 'Command'/'Event' appendages but realized it was a waste of time and drew me away from the Ubiquitous Language DDD popularized.
HTH,
Mike
Appending Command and Event would be redundant information if your commands/events are properly named. It would be noise that makes your code less readable. Remember Hungarian Notation? Most programmers (that I know of) don't use it anymore.