We're introducing event sourcing and refactoring to a rich domain model in a part of our application and we're a little bit confused about a part.
We have Tanks, which are the aggregate root and we receive TankGauge information on those Thanks every so often. Since the historical gauging information is not relevant for any business logic, the aggregate doesn't hold a collection of all the gauges, it only has a reference to the most recent gauge.
To display the historical gauges, we do have a projection set up.
Now we're getting the request: we want to update remarks on a historical gauge. The only relevancy this has is on the projection that holds the historical gauges. This causes a situation in which applying the RemarkSetOnHistoricalGauge event on the aggregate is basically a no-op and only the projection will be updated accordingly.
This works but it feels counter a bit counter-intuitive. Any suggestions on this approach?
I would consider whether you have two bounded contexts here:
the tank gauging context: consuming TankGauge commands, validating that they make sense (if you're not, then there's not really a point to having the Tank aggregate) and emitting a TankGaugeWas (TankLevelWas?) event
the tank history context: allowing remarks/annotations to be made regarding historical levels
Both of these have a concept of a tank, but not the same one. The history context consumes events from the gauging context (those events become commands, implicitly "incorporate this event into your worldview", from the perspective of the history context). They can have different aggregate logic: it might even be worth modeling a tank history aggregate as being time-windowed, e.g. that the tank tomorrow is not the same aggregate as the same tank today (this has the main benefit that you don't have to load the entire history of measures or scan through a history of snapshots).
This ability to actually cover the situation where different contexts are using a term to mean different things is one of the strengths of DDD (and the fact that the term takes on a different meaning is a very strong sign that there's a context boundary being crossed).
Capturing events as they happen in the external world is orthogonal to using the events to re-hydrate an aggregate. The fact that the aggregate does not map an event today is just happenstance, which could change over time. But capturing the event is a necessity.
As an aside, raising events that have no immediate relevance to the Aggregate but are valuable for projections is normal.
I have a similar example from my past. Our Identity domain listened to events published in the Email subdomain to raise EmailSent and EmailDelivered events within itself against emails sent earlier to a user. These events can sometimes arrive out of sequence if the user continues to use the website. The aggregate does not hold attributes to represent this information, but such events helped cross-check signups and conversions.
Related
I am trying to model a real-time collaboration application with DDD. A particular feature with some Hotspot events is CAD visualization.
Problem #1
Multiple participants join a 3D virtual environment and one of them is designated as a facilitator. Although all participants can change various preferences for themselves, the facilitator can change preferences for all users. The users can change them back on an individual level.
The problem I am facing is, single vs bulk operation. Do I submit a granular event for bulk operations or a single event? If an existing process listens to the granular event, it will miss the bulk event unless communicated explicitly which doesn't result in so clean boundary.
Problem #2
Interestingly enough this is a variation of problem #1 but a bit more severe. A CAD model comes with some meta-structure which is a DAG. Each leaf level structure is a group of triangles that are manipulated together. These groups of triangles are called Volume. A group of volumes forms another concept known as a Branch. A branch can contain other branches as a child. The branch+volume structure always forms a tree. Some disjoint tree branches form another concept called Group.
Now a participant can make a branch/group/volume visible and hidden. Do I publish a single branch-level event or create an event for *every branch/volume in the forward path?
I have thought about publishing bulk events for bulk operations and single events for single operations under the same topic. This doesn't feel good as I may introduce new bulk events and require another downstream context to break.
Alternatively, I thought about publishing both bulk and granular event with correlation_id. If a bulk event were understood, the downstream can ignore the following events with the same correlation id. Although this seems promising, Still doesn't feel good as the downstream may process events concurrently and later events could be processed earlier than the bulk event.
Can bulk operations be properly modeled using DDD? Is there a way to rethink the composite pattern which is more DDD friendly?
1.) bulk event, the id can be a query for all the matching ids at that moment or the explicit matching id list. you need it, because if you want to revert the event somehow, then you will have a problem if you lose the connection between individual events. it is an infot which must be stored too.
2.) looks like some sort of weird graph, it reminds me of the knowledge graph of sciences: math, physics, chemistry, biology, etc. where everything builds on math and they are interrelated, still people want to force them into a hierarchy. the problem that there are terms which are half way between two sciences so when you select the term of one science you cannot decide which they belong to. the same solution, selecting things with queries works for this too. I thought a lot about this problem too. having a shitload individual events will require a massive storage space after a certain size. better to use bulk with queries and compute them or save the id list as a query cache, but don't duplicate anything else. as of the semi-hierarchical structure, I have no idea how to model it properly. I would use a simple graph and tag everything and query based on the tags, but still there is a sort of hierarchy, which is hard to grasp from a pure graph perspective without any kind of weighting.
So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store. If we divide the events into streams, what exactly would a stream represent? In the context of a hypothetical inventory management system that tracks a collection of items, each with an ID, product code, and location, I'm having trouble visualizing the layout of the system.
From what I could gather on the internet, it could be described succinctly "one stream per aggregate." So I would have an Inventory aggregate, a single stream with ItemAdded, ItemPulled, ItemRestocked, etc. events each with serialized data containing the Item ID, quantity changed, location, etc. The aggregate root would contain a collection of InventoryItem objects (each with their respective quantity, product codes, location, etc.) That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
Another method would be to have one stream per InventoryItem tracking all events pertaining to only item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or you're not putting multiple items into the same location? It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and query's seperate? It just feels wrong.
So my question is 'which is correct?' Partially both? Neither? Like most things, the more I learn, the more I learn that I don't know...
In a typical event store, each event stream is an isolated transaction boundary. Any time you change the model you lock the stream, append new events, and release the lock. (In designs that use optimistic concurrency, the boundaries are the same, but the "locking" mechanism is slightly different).
You will almost certainly want to ensure that any aggregate is enclosed within a single stream -- sharing an aggregate between two streams is analogous to sharing an aggregate across two databases.
A single stream can be dedicated to a single aggregate, to a collection of aggregates, or even to the entire model. Aggregates that are part of the same stream can be changed in the same transaction -- huzzah! -- at the cost of some contention and a bit of extra work to do when loading an aggregate from the stream.
The most commonly discussed design assigns each logical stream to a single aggregate.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
There are a couple of possibilities; in some models, especially those with a strong temporal component, it makes sense to model some "entities" as a time series of aggregates. For example, in a scheduling system, rather than Bobs Calendar you might instead have Bobs March Calendar, Bobs April Calendar and so on. Chopping the life cycle into smaller installments can keep the event count in check.
Another possibility is snapshots, with an additional trick to it: each snapshot is annotated with metadata that describes where in the stream the snapshot was made, and you simply read the stream forward from that point.
This, of course, depends on having an implementation of an event stream that supports random access, or an implementation of stream that allows you to read last in first out.
Keep in mind that both of these are really performance optimizations, and the first rule of optimization is... don't.
So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store
The event store in a DDD project is designed around event-sourced Aggregates:
it provides the efficient loading of all events previously emitted by an Aggregate root instance (having a given, specified ID)
those events must be retrieved in the order they where emitted
it must not permit appending events at the same time for the same Aggregate root instance
all events emitted as result of a single command must be all appended atomically; this means that they should all succeed or all fail
The 4th point could be implemented using transactions but this is not a necessity. In fact, for scalability reasons, if you can then you should choose a persistence that provides you atomicity without the use of transactions. For example, you could store the events in a MongoDB document, as MongoDB guaranties document-level atomicity.
The 3rd point can be implemented using optimistic locking, using a version column with an unique index per (version x AggregateType x AggregateId).
At the same time, there is a DDD rule regarding the Aggregates: don't mutate more than one Aggregate per transaction. This rule helps you A LOT to design a scalable system. Break it if you don't need one.
So, the solution to all these requirements is something that is called an Event-stream, that contains all the previous emitted events by an Aggregate instance.
So I would have an Inventory aggregate
The DDD has higher precedence than the Event-store. So, if you have some business rules that force you to decide that you must have a (big) Inventory aggregate, then yes, it would load ALL the previous events generated by itself. Then the InventoryItem would be a nested entity that cannot emit events by itself.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
Yes, indeed. The simplest thing would be for us to all have a single Aggregate, with a single instance. Then the consistency would be the strongest possible. But this is not efficient so you need to better think about the real business requirements.
Another method would be to have one stream per InventoryItem tracking all events pertaining to only item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or you're not putting multiple items into the same location?
There is another possibility. You should model the assigning of product codes as a Business Process. For this you could use a Saga/Process manager that would orchestrate the entire process. This Saga could use a collection with an unique index added to the product code column in order to ensure that only one product uses a given product code.
You could design the Saga to permit the allocation of an already-taken code to a product and to compensate later or to reject the invalid allocation in the first place.
It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and query's seperate? It just feels wrong.
The Saga uses indeed a private state maintained from the domain events in an eventual consistent state, just like a Read-model but this does not feel wrong for me. It may use whatever it needs in order to bring (eventually) the system as a hole to a consistent state. It complements the Aggregates, whose purpose is to not allow the building-blocks of the system to get into an invalid state.
I am just starting out with ES/DDD and I have a question how one is supposed to do reporting in this architecture. Lets take a typical example, where you have a Customer Aggregate, Order Aggregate, and Product Aggregate all independent.
Now if i want to run a query across all 3 aggregates and/or services, but that data is each in a separate DB, maybe one is SQL, one is a MongoDB, and one something else. How is one supposed to design or be able to run a query that would require a join across these aggregates ?
You should design the Reporting as a simple read-model/projection, possible in its own bounded context (BC), that just listen to the relevant events from the other bounded contexts (Customer BC, Ordering BC and Inventory BC) and builds the needed reports with full data denormalization (i.e. at query time you won't need to query the original sources).
Because of events you won't need any joins as you could maintain a private local state attached to the Reporting read-model in which you can store temporary external models and query those temporary read-models as needed thus avoiding external additional queries to the other BCs.
An anti-corruption layer would not be necessary in this case as there would be no write-model involved in the Reporting BC.
Things are really as simple as that because you already have an event-driven architecture (you use Event sourcing).
UPDATE:
This particular solution is very handy in creating new reports that you haven't thought ahead of time. Every time you thing about a new report you just create a new Read-model (as in you write its source code) then you replay all the relevant events on it. Read-models are side-effect free, you can replay all the events (from the beggining of time) any time and as many time you want.
Read-model rebuilding is done in two situations:
you create a new Read-model
you modify an existing one by listening to a new event or the algorithm differs too much from the initial version
You can read more here:
DDD/CQRS specialized forum - Grey Young is there!
Event sourcing applied – the read model
Writing an Event-Sourced CQRS Read Model
A post in first group describing Read Model rebuilding
Or you can search about this using this text: event sourcing projection rebuilding
Domain-Driven Design is more concerned with the command side of things. You should not attempt to query your domain as that leads to pain and suffering.
Each bounded context may have its own data store and that data store may be a different technology as you have stated.
For reporting you would use a reporting store. How you get data into that store would either require each bounded context to publish events that the reporting BC would pick up and use to update the reporting store or you could make use of event sourcing where the reporting store would project the events into the relevant reporting structures.
There are known practices to solve this.
One might be having a reporting context, which, as Eben has pointed out, will listen to domain events from other contexts and update its store. This of course will lead to issues, since this reporting context will be coupled to all services it reports from. Some might say this is a necessary evil but this is not always the case.
Another technique is to aggregate on-demand. This is not very complex and can be done on different layers/levels. Consider aggregation on the web API level or even on the front-end level, if your reporting is on the screen (not sent by mail as PDF, for example).
This is well known as UI composition and Udi Dahan has wrote an article about this, which is worth reading: UI Composition Techniques for Correct Service Boundires. Also, Mauro Servienti has wrote a blog post about this recently: The secret of better UI composition.
Mauro mentions two types of composition, which I mentioned above. The API/server-side composition is called ViewModel Composition in his post, and front-end (JavaScript) composition is mentioned in the Client side composition process section. Server-side composition is illustrated by this picture:
DDD strategic modeling tools says:
Design two different models 1. Write Models (Handles Command Side) 2.Read Models (POCOs/POJOs) whatever u call them.
We have microservices, each generating events that are being stored by a event-sourcing repository. We use Cassandra to store the event data.
As you may know, the order of the events is important.
When we generate these events from different services running in different machines, how to manage the time (timestamp) going out of sync across these thereby resulting in an event order mismatch.
As you may know, the order of the events is important.
In some cases - but you'll want to be careful not to confuse time, order, and correlation.
When we generate these events from different services running in different machines, how to manage the time (timestamp) going out of sync across these thereby resulting in an event order mismatch.
Give up the idea that there is an "order" to events that are happening in different places. There is no now.
Udi Dahan on race conditions in the business world:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If your micro service boundaries are correct, then events happening in two difference services at about the same time are coincident -- there isn't one correct ordering of them, because (to stretch an analogy) they are in different light cones. The only ordering that is inherently real is that within a single aggregate event history.
What can make real sense is tracking causation; these changes in this book of record are a reaction to those changes in that book of record.
One simple form of this is to track happens-before, which is where ideas like vector clocks begin to appear.
In most discussions that I have seen, this information would be passed along as meta data of the recorded events.
This is typically done via vector clocks:
A vector clock is an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations.
If I understand your problem correctly, you're trying to guard writes, i.e. to make sure that a microservice instance is up to date with all the relevant events before making another write.
In that case, have a look at lightweight transactions, which can be used to implement optimistic locking in Cassandra.
This talk by Christopher Batey is a very good start.
I am new to DDD. Now I was looking at the domain event. I am not sure if I understand this domain event correctly, but I am just thinking what will happen if domain event published failed?
I have a case here. When a buyer order something from my website, firstly we will create a object, Order with line of items. The domain event, OrderWasMade, will be published to deduct the stock in Inventory. So here is the case, what if when the event was handled, the item quantity will be deducted, but what if when the system try to deduct the stock, it found out that there is no stock remaining for the item (amount = 0). So, the item amount can't be deducted but the order had already being committed.
Will this kind of scenario happen?
Sorry to have squeeze in 2 other questions here.
It seems like each event will be in its own transaction scope, which means the system requires to open multiple connection to database at once. So if I am using IIS Server, I must enable DTC, am I correct?
Is there any relationship between domain-events and domain-services?
A domain event never fails because it's a notification of things that happened (note the past tense). But the operation which will generate that event might fail and the event won't be generated.
The scenario you told us shows that you're not really doing DDD, you're doing CRUD using DDD words. Yes, I know you're new to it, don't worry, everybody misunderstood DDD until they got it (but it might take some time and plenty of practice).
DDD is about identifying the domain model abstraction, which is not code. Code is when you're implementing that abstraction. It's very obvious you haven't done the proper modelling, because the domain expert should tell you what happens if products are out of stock.
Next, there's no db/acid transactions at this level. Those are an implementation detail. The way DDD works is identifying where the business needs things to be consistent together and that's called an aggregate.
The order was submitted and this where that use case stops. When you publish the OrderWasMadeevent, another use case (deducting the inventory or whatever) is triggered. This is a different business scenario related but not part of "submit order". If there isn't enough stock then another event is published NotEnoughInventory and another use case will be triggered. We follow the business here and we identify each step that the business does in order to fulfill the order.
The art of DDD consists in understanding and identifying granular business functionality, the involved aggregates, business behaviour which makes decisions etc and this has nothing to do the database or transactions.
In DDD the aggregate is the only place where a unit of work needs to be used.
To answer your questions:
It seems like each event will be in its own transaction scope, which means the system requires to open multiple connection to database at once. So if I am using IIS Server, I must enable DTC, am I correct?
No, transactions,events and distributed transactions are different things. IIS is a web server, I think you want to say SqlServer. You're always opening multiple connections to the db in a web app, DTC has nothing to do with it. Actually, the question tells me that you need to read a lot more about DDD and not just Evans' book. To be honest, from a DDD pov it doesn't make much sense what you're asking.. You know one of principles of DD: the db (as in persistence details) doesn't exist.
Is there any relationship between domain-events and domain-services
They're both part of the domain but they have different roles:
Domain events tell the world that something changed in the domain
Domain services encapsulate domain behaviour which doesn't have its own persisted state (like Calculate Tax)
Usually an application service (which acts as a host for a business use case) will use a domain service to verify constraints or to gather data required to change an aggregate which in turn will generate one or more events. Aggregates are the ones persisted and always, an aggregate is persisted in an atomic manner i.e db transaction / unit of work.
what will happen if domain event published failed?
MikeSW already described this - publishing the event (which is to say, making it part of the history) is a separate concern from consuming the event.
what if when the system try to deduct the stock, it found out that there is no stock remaining for the item (amount = 0). So, the item amount can't be deducted but the order had already being committed.
Will this kind of scenario happen?
So the DDD answer is: ask your domain experts!
If you sit down with your domain experts, and explore the ubiquitous language, you are likely to discover that this is a well understood exception to the happy path for ordering, with an understood mitigation ("we mark the status of the order as pending, and we check to see if we've already ordered more inventory from the supplier..."). This is basically a requirements discovery exercise.
And when you understand these requirements, you go do it.
Go do it typically means a "saga" (a somewhat misleading and overloaded use of the term); a business process/workflow/state machine implementation that keeps track of what is going on.
Using your example: OrderWasMade triggers an OrderFulfillment process, which tracks the "state" of the order. There might be an "AwaitingInventory" state where OrderFulfillment parks until the next delivery from the supplier, for example.
Recommended reading:
http://udidahan.com/2010/08/31/race-conditions-dont-exist/
http://udidahan.com/2009/04/20/saga-persistence-and-event-driven-architectures/
http://joshkodroff.com/blog/2015/08/21/an-elegant-abandoned-cart-email-using-nservicebus/
If you need the stock to be immediately consistent at all times, a common way of handling this in event sourced systems (can also in non-event based systems, this is orthogonal really) is to rely on optimistic locking at the event store level.
Events basically have a revision number that they expect the stream of events to be at to take effect. Once the event hits the persistent store, its revision number is checked against the real stream number and if they don't match, a conflict exception is raised and the transaction is aborted.
Now as #MikeSW pointed out, depending on your business requirements, stock checking can be an out-of-band process that handles the problem in an eventually consistent way. Eventually can range from milliseconds if another part of the process takes over immediately, to hours if an email is sent with human action needing to be taken.
In other words, if your domain requires it, you can choose to trade this sequence of events
(OrderAbortedOutOfStock)
for
(OrderMade, <-- Some amount of time --> OrderAbortedOutOfStock)
which amounts to the same aggregate state in the end