Event sourcing microservices: How to manage timestamp - cassandra

We have microservices, each generating events that are being stored by a event-sourcing repository. We use Cassandra to store the event data.
As you may know, the order of the events is important.
When we generate these events from different services running in different machines, how to manage the time (timestamp) going out of sync across these thereby resulting in an event order mismatch.

As you may know, the order of the events is important.
In some cases - but you'll want to be careful not to confuse time, order, and correlation.
When we generate these events from different services running in different machines, how to manage the time (timestamp) going out of sync across these thereby resulting in an event order mismatch.
Give up the idea that there is an "order" to events that are happening in different places. There is no now.
Udi Dahan on race conditions in the business world:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If your micro service boundaries are correct, then events happening in two difference services at about the same time are coincident -- there isn't one correct ordering of them, because (to stretch an analogy) they are in different light cones. The only ordering that is inherently real is that within a single aggregate event history.
What can make real sense is tracking causation; these changes in this book of record are a reaction to those changes in that book of record.
One simple form of this is to track happens-before, which is where ideas like vector clocks begin to appear.
In most discussions that I have seen, this information would be passed along as meta data of the recorded events.

This is typically done via vector clocks:
A vector clock is an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations.

If I understand your problem correctly, you're trying to guard writes, i.e. to make sure that a microservice instance is up to date with all the relevant events before making another write.
In that case, have a look at lightweight transactions, which can be used to implement optimistic locking in Cassandra.
This talk by Christopher Batey is a very good start.

Related

Domain events with composite pattern

I am trying to model a real-time collaboration application with DDD. A particular feature with some Hotspot events is CAD visualization.
Problem #1
Multiple participants join a 3D virtual environment and one of them is designated as a facilitator. Although all participants can change various preferences for themselves, the facilitator can change preferences for all users. The users can change them back on an individual level.
The problem I am facing is, single vs bulk operation. Do I submit a granular event for bulk operations or a single event? If an existing process listens to the granular event, it will miss the bulk event unless communicated explicitly which doesn't result in so clean boundary.
Problem #2
Interestingly enough this is a variation of problem #1 but a bit more severe. A CAD model comes with some meta-structure which is a DAG. Each leaf level structure is a group of triangles that are manipulated together. These groups of triangles are called Volume. A group of volumes forms another concept known as a Branch. A branch can contain other branches as a child. The branch+volume structure always forms a tree. Some disjoint tree branches form another concept called Group.
Now a participant can make a branch/group/volume visible and hidden. Do I publish a single branch-level event or create an event for *every branch/volume in the forward path?
I have thought about publishing bulk events for bulk operations and single events for single operations under the same topic. This doesn't feel good as I may introduce new bulk events and require another downstream context to break.
Alternatively, I thought about publishing both bulk and granular event with correlation_id. If a bulk event were understood, the downstream can ignore the following events with the same correlation id. Although this seems promising, Still doesn't feel good as the downstream may process events concurrently and later events could be processed earlier than the bulk event.
Can bulk operations be properly modeled using DDD? Is there a way to rethink the composite pattern which is more DDD friendly?
1.) bulk event, the id can be a query for all the matching ids at that moment or the explicit matching id list. you need it, because if you want to revert the event somehow, then you will have a problem if you lose the connection between individual events. it is an infot which must be stored too.
2.) looks like some sort of weird graph, it reminds me of the knowledge graph of sciences: math, physics, chemistry, biology, etc. where everything builds on math and they are interrelated, still people want to force them into a hierarchy. the problem that there are terms which are half way between two sciences so when you select the term of one science you cannot decide which they belong to. the same solution, selecting things with queries works for this too. I thought a lot about this problem too. having a shitload individual events will require a massive storage space after a certain size. better to use bulk with queries and compute them or save the id list as a query cache, but don't duplicate anything else. as of the semi-hierarchical structure, I have no idea how to model it properly. I would use a simple graph and tag everything and query based on the tags, but still there is a sort of hierarchy, which is hard to grasp from a pure graph perspective without any kind of weighting.

DDD: creating multiple aggregates with a shared life-cycle in a single transaction

I'm aware of the general rule that only a single aggregate should be modified per transaction, mostly for concurrency and transactional consistency issues, as far as I'm aware.
I have a use case where I want to create multiple aggregates in a single transaction: a RestaurantManager, a Restaurant, and a Menu. They seem like a single aggregate because their life-cycles begin and end together: it doesn't make sense within the domain to create a RestaurantManager without a Restaurant, or vice versa; the same goes for a Restaurant and a Menu. Further, if the Restaurant or the RestaurantManager is deleted (unregistered), they should all be deleted together.
However, I've split them into separate aggregates because, once created, they are updated separately, maintain their own invariants, and I don't want to load them all into memory just to update one property on the Restaurant, for example.
The only thing that ties them together is their life-cycle.
My question is whether this represents a case where it is okay to go against the "rule" that each transaction should only operate on a single aggregate.
I'd also like to know if I should enforce their shared life-cycle in the domain model by having each aggregate root hold the identifier of the aggregate root it depends on, i.e. by having Restaurant require a MenuId as a constructor parameter, and likewise for Menu and RestaurantId, so that neither can be created without the other. However, this still wouldn't enforce that they should be saved together by the application service anyway, since it could create them all in memory, then only save the Menu, for example.
Your requirement is a pretty normal use case in DDD, IMHO. There are always multiple aggregates working in tandem to support the application, and they are interlinked in their lifecycles. But the modeling concepts still stand true. Let me attempt to explain what your model would look like with the help of a few DDD rules:
Aggregates are transaction boundaries
Aggregates ensure that no business invariants are broken at any point. This means that if you have multiple aggregates strung together as part of one transaction, you have to load all of them into memory for the validation.
This is especially a problem when your application is data-rich and stores data in a database cluster - partitioned, distributed (think Mongo or Elasticsearch). You will have the problem of loaded up data from potentially different clusters as part of a single transaction.
Aggregates are loaded in entirety
Aggregates and their associated data objects are loaded in entirety into memory. This means that unnecessary objects (say the restaurant's schedule for the upcoming month, for example) for the transaction may be loaded into memory. By itself, this is not a problem. But when multiple aggregates get together, the amount of data loaded into memory needs to be considered.
Aggregates refer to each other by their unique identifiers
This one is straightforward and means that each aggregate stores its referenced aggregates by their identifiers instead of enclosing the other aggregate's data within it.
State changes across Aggregates are handled through Domain Events
In cases where you want a state change in one aggregate to have side-effects on other aggregates, you publish a domain event, and a subscriber handles the change on other aggregates in the background. This is how you would want to handle your requirement for cascade deletes.
By following these rules, you are essentially zooming in one single aggregate at a time and ensuring that the complexity remains low. When you string up multiple aggregates, though it is clear and understandable on day 1, eventually, the application tends towards becoming a big ball of mud, as dependencies and invariants start crisscrossing each other.
"only a single aggregate should be modified per transaction"
Contention at creation doesn't matter as much. You can create many ARs in a single transaction without problem because the only other operation that could conflict is another duplicate creation process.
Another reason to avoid involving many ARs in a single transaction is coupling between modules though, but you could always keep things loosely coupled using synchronously dispatched domain events.
As for the deletion, it's probably less problematic to make it eventually consistent. Does it really matter that Restaurant is closed while RestaurantManager remains registered for a short period of time?
The fact you are asking this question tells me your system is not distributed? If your system is running with a single DB server and used by a few people it may be that eventual consistency make things more complex for scalability you don't actually need.
Start simple and refactor as needed, but crossing AR boundaries is not something that should be done consistently or else your boundaries are clearly wrong.
Furthermore, if you want to communicate that a RestaurantManager can't be spawned from nowhere and associated with an invalid RestaurantId by mistake you may want to look at your ubiquitous language for guidance.
e.g.
"A RestaurantManager is registered for a given Restaurant": not sure it truly aligns with your UL, but it's just for the sake of the example.
RestaurantManager manager = restaurant.registerManager(...);
This obviously increases coupling and could affect performance, but it aligns well with the UL and makes it more difficult to misuse the model. Also note that with a single DB, you could enforce referential integrity which takes cares of these uninteresting referential constraints.
As pointed out by #plalx, contention doesn't matter as much when creating aggregates in terms of transactions, since they don't yet exist so can't be involved in contention.
As for enforcing the mutual life cycle of multiple aggregates in the domain, I've come to think that this is the responsibility of the application layer (i.e. an application service, or use case).
Maybe my thinking is closer to Clean or Hexagonal architecture, but I don't think it's possible or even sensible to try and push every single business rule down into the "domain model". The point of the domain model for me is to partition the problem domain into small chunks (aggregates), which encapsulate common business data/operations that change together, but it's the application layer's responsibility to use these aggregates properly in order to achieve the business' end goal (which is the application as a whole), including mediating operations between the aggregates and controlling their life cycles.
As such, I think this stuff belongs in an application service. That being said, frequently updating multiple aggregates in each use case could be a sign of incorrect domain boundaries.

Stream aggregate relationship in an event sourced system

So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store. If we divide the events into streams, what exactly would a stream represent? In the context of a hypothetical inventory management system that tracks a collection of items, each with an ID, product code, and location, I'm having trouble visualizing the layout of the system.
From what I could gather on the internet, it could be described succinctly "one stream per aggregate." So I would have an Inventory aggregate, a single stream with ItemAdded, ItemPulled, ItemRestocked, etc. events each with serialized data containing the Item ID, quantity changed, location, etc. The aggregate root would contain a collection of InventoryItem objects (each with their respective quantity, product codes, location, etc.) That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
Another method would be to have one stream per InventoryItem tracking all events pertaining to only item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or you're not putting multiple items into the same location? It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and query's seperate? It just feels wrong.
So my question is 'which is correct?' Partially both? Neither? Like most things, the more I learn, the more I learn that I don't know...
In a typical event store, each event stream is an isolated transaction boundary. Any time you change the model you lock the stream, append new events, and release the lock. (In designs that use optimistic concurrency, the boundaries are the same, but the "locking" mechanism is slightly different).
You will almost certainly want to ensure that any aggregate is enclosed within a single stream -- sharing an aggregate between two streams is analogous to sharing an aggregate across two databases.
A single stream can be dedicated to a single aggregate, to a collection of aggregates, or even to the entire model. Aggregates that are part of the same stream can be changed in the same transaction -- huzzah! -- at the cost of some contention and a bit of extra work to do when loading an aggregate from the stream.
The most commonly discussed design assigns each logical stream to a single aggregate.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
There are a couple of possibilities; in some models, especially those with a strong temporal component, it makes sense to model some "entities" as a time series of aggregates. For example, in a scheduling system, rather than Bobs Calendar you might instead have Bobs March Calendar, Bobs April Calendar and so on. Chopping the life cycle into smaller installments can keep the event count in check.
Another possibility is snapshots, with an additional trick to it: each snapshot is annotated with metadata that describes where in the stream the snapshot was made, and you simply read the stream forward from that point.
This, of course, depends on having an implementation of an event stream that supports random access, or an implementation of stream that allows you to read last in first out.
Keep in mind that both of these are really performance optimizations, and the first rule of optimization is... don't.
So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store
The event store in a DDD project is designed around event-sourced Aggregates:
it provides the efficient loading of all events previously emitted by an Aggregate root instance (having a given, specified ID)
those events must be retrieved in the order they where emitted
it must not permit appending events at the same time for the same Aggregate root instance
all events emitted as result of a single command must be all appended atomically; this means that they should all succeed or all fail
The 4th point could be implemented using transactions but this is not a necessity. In fact, for scalability reasons, if you can then you should choose a persistence that provides you atomicity without the use of transactions. For example, you could store the events in a MongoDB document, as MongoDB guaranties document-level atomicity.
The 3rd point can be implemented using optimistic locking, using a version column with an unique index per (version x AggregateType x AggregateId).
At the same time, there is a DDD rule regarding the Aggregates: don't mutate more than one Aggregate per transaction. This rule helps you A LOT to design a scalable system. Break it if you don't need one.
So, the solution to all these requirements is something that is called an Event-stream, that contains all the previous emitted events by an Aggregate instance.
So I would have an Inventory aggregate
The DDD has higher precedence than the Event-store. So, if you have some business rules that force you to decide that you must have a (big) Inventory aggregate, then yes, it would load ALL the previous events generated by itself. Then the InventoryItem would be a nested entity that cannot emit events by itself.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
Yes, indeed. The simplest thing would be for us to all have a single Aggregate, with a single instance. Then the consistency would be the strongest possible. But this is not efficient so you need to better think about the real business requirements.
Another method would be to have one stream per InventoryItem tracking all events pertaining to only item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or you're not putting multiple items into the same location?
There is another possibility. You should model the assigning of product codes as a Business Process. For this you could use a Saga/Process manager that would orchestrate the entire process. This Saga could use a collection with an unique index added to the product code column in order to ensure that only one product uses a given product code.
You could design the Saga to permit the allocation of an already-taken code to a product and to compensate later or to reject the invalid allocation in the first place.
It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and query's seperate? It just feels wrong.
The Saga uses indeed a private state maintained from the domain events in an eventual consistent state, just like a Read-model but this does not feel wrong for me. It may use whatever it needs in order to bring (eventually) the system as a hole to a consistent state. It complements the Aggregates, whose purpose is to not allow the building-blocks of the system to get into an invalid state.

What if domain event failed?

I am new to DDD. Now I was looking at the domain event. I am not sure if I understand this domain event correctly, but I am just thinking what will happen if domain event published failed?
I have a case here. When a buyer order something from my website, firstly we will create a object, Order with line of items. The domain event, OrderWasMade, will be published to deduct the stock in Inventory. So here is the case, what if when the event was handled, the item quantity will be deducted, but what if when the system try to deduct the stock, it found out that there is no stock remaining for the item (amount = 0). So, the item amount can't be deducted but the order had already being committed.
Will this kind of scenario happen?
Sorry to have squeeze in 2 other questions here.
It seems like each event will be in its own transaction scope, which means the system requires to open multiple connection to database at once. So if I am using IIS Server, I must enable DTC, am I correct?
Is there any relationship between domain-events and domain-services?
A domain event never fails because it's a notification of things that happened (note the past tense). But the operation which will generate that event might fail and the event won't be generated.
The scenario you told us shows that you're not really doing DDD, you're doing CRUD using DDD words. Yes, I know you're new to it, don't worry, everybody misunderstood DDD until they got it (but it might take some time and plenty of practice).
DDD is about identifying the domain model abstraction, which is not code. Code is when you're implementing that abstraction. It's very obvious you haven't done the proper modelling, because the domain expert should tell you what happens if products are out of stock.
Next, there's no db/acid transactions at this level. Those are an implementation detail. The way DDD works is identifying where the business needs things to be consistent together and that's called an aggregate.
The order was submitted and this where that use case stops. When you publish the OrderWasMadeevent, another use case (deducting the inventory or whatever) is triggered. This is a different business scenario related but not part of "submit order". If there isn't enough stock then another event is published NotEnoughInventory and another use case will be triggered. We follow the business here and we identify each step that the business does in order to fulfill the order.
The art of DDD consists in understanding and identifying granular business functionality, the involved aggregates, business behaviour which makes decisions etc and this has nothing to do the database or transactions.
In DDD the aggregate is the only place where a unit of work needs to be used.
To answer your questions:
It seems like each event will be in its own transaction scope, which means the system requires to open multiple connection to database at once. So if I am using IIS Server, I must enable DTC, am I correct?
No, transactions,events and distributed transactions are different things. IIS is a web server, I think you want to say SqlServer. You're always opening multiple connections to the db in a web app, DTC has nothing to do with it. Actually, the question tells me that you need to read a lot more about DDD and not just Evans' book. To be honest, from a DDD pov it doesn't make much sense what you're asking.. You know one of principles of DD: the db (as in persistence details) doesn't exist.
Is there any relationship between domain-events and domain-services
They're both part of the domain but they have different roles:
Domain events tell the world that something changed in the domain
Domain services encapsulate domain behaviour which doesn't have its own persisted state (like Calculate Tax)
Usually an application service (which acts as a host for a business use case) will use a domain service to verify constraints or to gather data required to change an aggregate which in turn will generate one or more events. Aggregates are the ones persisted and always, an aggregate is persisted in an atomic manner i.e db transaction / unit of work.
what will happen if domain event published failed?
MikeSW already described this - publishing the event (which is to say, making it part of the history) is a separate concern from consuming the event.
what if when the system try to deduct the stock, it found out that there is no stock remaining for the item (amount = 0). So, the item amount can't be deducted but the order had already being committed.
Will this kind of scenario happen?
So the DDD answer is: ask your domain experts!
If you sit down with your domain experts, and explore the ubiquitous language, you are likely to discover that this is a well understood exception to the happy path for ordering, with an understood mitigation ("we mark the status of the order as pending, and we check to see if we've already ordered more inventory from the supplier..."). This is basically a requirements discovery exercise.
And when you understand these requirements, you go do it.
Go do it typically means a "saga" (a somewhat misleading and overloaded use of the term); a business process/workflow/state machine implementation that keeps track of what is going on.
Using your example: OrderWasMade triggers an OrderFulfillment process, which tracks the "state" of the order. There might be an "AwaitingInventory" state where OrderFulfillment parks until the next delivery from the supplier, for example.
Recommended reading:
http://udidahan.com/2010/08/31/race-conditions-dont-exist/
http://udidahan.com/2009/04/20/saga-persistence-and-event-driven-architectures/
http://joshkodroff.com/blog/2015/08/21/an-elegant-abandoned-cart-email-using-nservicebus/
If you need the stock to be immediately consistent at all times, a common way of handling this in event sourced systems (can also in non-event based systems, this is orthogonal really) is to rely on optimistic locking at the event store level.
Events basically have a revision number that they expect the stream of events to be at to take effect. Once the event hits the persistent store, its revision number is checked against the real stream number and if they don't match, a conflict exception is raised and the transaction is aborted.
Now as #MikeSW pointed out, depending on your business requirements, stock checking can be an out-of-band process that handles the problem in an eventually consistent way. Eventually can range from milliseconds if another part of the process takes over immediately, to hours if an email is sent with human action needing to be taken.
In other words, if your domain requires it, you can choose to trade this sequence of events
(OrderAbortedOutOfStock)
for
(OrderMade, <-- Some amount of time --> OrderAbortedOutOfStock)
which amounts to the same aggregate state in the end

Synchronizing Query-side Data in CQRS - won't there still be contention?

I have a general question about the CQRS paradigm in general.
I understand that a CommandBus and EventBus will decouple the domain model from our Query-side datastore, the merits of eventual consistency, and being able to denormalize the storage on the Query side to optimize reads, etc. That all sounds great.
But I wonder as I begin to expand the number of the components on the Query side responsible for updating the Query datastore, if they wouldn't start to contend with one another to perform their updates?
In other words, if we tried to use a pub/sub model for the EventBus, and there were a lot of different subscribers for a particular event type, couldn't they start to contend with one another over updating various bits of denormalized data? Wouldn't this put us in the same boat as we were before CQRS?
As I've heard it explained, it sounds like CQRS is supposed to do away with this contention all together, but is this just an ideal, and in reality we're only really minimizing it? I feel like I could be missing something here, but can't put my finger on it.
it all depends on how you have designed the infrastructure. Strictly speaking, CQRS in itself doesn't say anything about how the Query models are updated. Using Events is just a one of the options you have. CQRS doesn't say anything about dealing with contention either. It's just an architectural pattern that leaves you with more options and choices to deal with things like concurrency. In "regular" architectures, such as the layered architecture, you often don't have these options at all.
If you have scaled your command processing component out on multiple machines, you can assume that they can produce more events than a single event handling component can handle. That doesn't have to be a bad thing. It may just mean that the Query models will be updated with a slightly bigger delay during peak times. If it is a problem for you, then you should consider scaling out the query models too.
The Event Handler component themselves will not be contending with each other. They can safely process events in parallel. However, if you design the system to make them all update the same data store, your data store could be the bottleneck. Setting up a cluster or dividing the query model over different data sources altogether could be a solution to your problem.
Be careful not to prematurely optimize, though. Don't scale out until you have the figures to prove that it will help in your specific case. CQRS based architectures allow you to make a lot of choices. All you need to do is make the right choice at the right time.
So far, in the application's I am involved with, I haven't come across situations where the Query model was a bottleneck. Some of these applications produce more than 100mln events per day.

Resources