DDD/CQRS Querying Events - domain-driven-design

I was looking at post's on querying in application designed with approach Event Sourcing/DDD/CQRS.
As I understand events are changes to the state of a domain object. The changes to state will be maintained as history/events in DB(any of sql/no sql).
If user wants to query to get current state for a particular aggregate root, it will involve fetching history of events.
When user will query especially business specific queries he/she will be interested in current state not the history of events.
How querying or 'Q' part in CQRS works with event sourcing?
Consider I have a domain object "Account" as aggregate root. The account AR will go through lots of changes i.e. credits debits. event store will have credit and debit events.
Consider user is required to get current balance of an account, how stream of history of events will suite here? How will user fetch current balance for given account?
I am unable to understand, How for business specific querying history of events will be useful?
-Prakhyat M M

I would recommend you to read more articles from Greg Young (He is like the father of CQRS and Event Sourcing), like this: CQRS, Task Based UIs, Event Sourcing... agh.
Sorry for my bad English, I am from Paraguay. But I really like DDD - CQRS - ES and I would like to try to make a point.
The use of "Projections" (also known as Materialized Views) and the concept of "Eventual Consistency" are the fundamentals that every practitioner of CQRS should understand very well. The Event Store is for query. Is in the Command side of CQRS, not the in the Query side. You may use a bus to send the events stored in the Event Store to the query side in order to process and generate a read model, or view models, from which you can query. In any case a eventstore per se is a query model.
Looks like you are a Java guy, but, still, you may want to check the CQRS Journey from Microsoft.
Hope this helps a little bit and motivates you to do more research on DDD / CQRS / ES, the New Trio of Line of Business Applications.

You'll use a projection of the event stream into the read model, that contains exactly those information that the Query-side (Q) needs. For example, you could have an "account balance" projection that follows all events that change the account balance, but possibly ignores other events in the account's stream (such as owner changes). The projection then saves that info in a way that it can be queried very quickly, e.g., in memory or in a small read-model database table (accountId, balance) with the accountId as the key (database can be a key-value store, for example).
I suggest further reading on the CQRS concept such as this one or this one.

Interesting enough, recently more people discover using event store as the read model, leaving projections and "proper" read models until absolutely necessary.
We all know that dealing with projections increases the complexity. At minimum you have to create new models, establish the DAL for the read model and create projections to translate event to the read model changes, and bind those projections to the stream of events from your store. It requires more code, more moving parts and some of them are not easy to test. Schema changes at the read side also require migrations.
It appears that for many scenarios reading all events (properly partitioned) might be enough to have your "read model". It takes not much time until the system really grows large so you need to read tens of thousands of events to create one UI screen. But before you reach this point, you can just read events. May be use the file system to store events although tools like EventStore are free and quite easy to use. May be add some indexing.
This approach let you stabilise the domain significantly, you get more knowledge about how the system works, tune the events and be really prepared to bring the "proper" read model into the system, but you might not have to.
Adam Dymitruk has wrote a blog post about it, you might find it worth reading even if you don't want to take this approach. Greg Young also gave a talk EventStore as read model back in 2012.

Related

CQRS to command or not to, that is the question

I am new to CQRS, but can see the value in this, so I am trying to apply this to a financial system that we are busy rebuilding.
Like I mentioned, this is a basic fin system with basic balance, withdraw, deposit like functionality.
I have a withdraw & deposit commands. But I am struggling with balance.
According to the domain experts, they want to handle balance as a transaction, with no financial implication (yet), on the clients behalf. So, when the client does a balance inq via the device, it creates a transaction, but also a balance query at the same time.
In the CQRS world, you distiguish between commands that mutate state & queries, that retrieve data in some way.
Apologies if my understanding here are flawed. Can someone point me in the correct direction?
EDIT:
Maybe let me put it this way. I was thinking of creating a CheckBalanceCommand that creates a transaction & insert a BalanceCheckedEvent into the store. But then I would also need to create a CheckBalanceQuery to retrieve the actual balance from the read db.
I would need to invoke both in order to satisfy the balance request.
This is an interesting issue. Your business case is valid: some commands don't mutate aggregate/entity states, still treating them and their resultant events are important (e.g. for audit trails).
In order to support these cases, I'd introduce a base event type named IdentityEvent (inspired by identity values for various mathematical operators and as a justification for the concept; operating them on a certain value doesn't change it). On issuing the corresponding command, derivatives of this event (e.g. BalanceCheckedEvent in your case) will be appended to the aggregate's event stream and view projection may construct views from them as usual; however, their mutate method will not perform any actual mutation while reconstructing entities from event stream.
The actual command processing takes place at the domain layer. Some of your application service, at the application layer, receives the query request, processes it as usual. Additionally, before or after the query operation, the same application service may issue the command to the domain layer, on the aggregate root itself. That doesn't violate any principle: your read and query model are still separate, application service just coordinating between the two.
This is not as rare as you would imagine. An additional valid business case is when a service provider runs a credit check on someone. Credit reporting companies actually store queries made against ones credit score, and use it to influence future credit scores. Of course, when I say that this isn't as rare as we imagine, I'm not attempting to normalize such practices (and we should push back to understand the real value something like this is offering to our product).
What I suggest though is to model this explicitly and not try to generalize this. This feature probably is driven by some business need, and you should model it as such. By this I mean that you should treat the service serving the reads as a separate service entirely, which can raise it's own events for things that have happened, and design the rest of the system in a reactive way (ie responding to events generated by another BC/service).
As an example, you could have the service which serves the query fire a BalanceChecked event, which either the same service or another one could store in a stream for subsequent processing.
I would not suggest a command, because if you'll be replying with the data it's not as if someone can reject the command; it has already happened, someone already has the data.

CQRS Read models in a NoSql (Mongo DB)

Hi its my fist time with DDD/CQRS. I've read multiple sources of knowledge and Im still confused a bit, maybe someone could help :)
Lets assume simple case that we have products and clients (possibly different bounded contexts).
A client can buy a product and he wants to see all products that he purchased.
In this case I realize I need a UserPurchasesView view model with:
purchaseId (which is a mongo primary key)
userId,
product: {id, name, image, shortDescription, [maybe some others]}
prize
timestamp
Now ... the problem is that My domain is producing an event like UserPurchasedProduct(userId, productId). I could enrich an event with a prize, product name or maybe something else but not all fields. Im getting to a point where enriching seems to be wrong.
In this point I realize I need something like ProductDetailsView:
productId (primary key)
prize
name
shortDescription
logo
This view is maintained by events like: ProductCreated, ProductRenamed, ProductImageChanged
And now we have 2 options ...
Look into the ProductDetailsView when UserPurchasedProduct event comes in, take all needed product details and save it in UserPurchasesView for faster reads. This solution looks not that bad but it introduces some extra coupling and it seems to me these views cannot be scaled well when needed. Also both views must be rebuilt together when replying all events from the event store (rebuilding is also more tricky in that case).
Keep only the productId in the UserPurchasesView and read multiple views when user queries his purchases. This is some extra processing that would have to be done somewhere. In the frontend, in the backend controller or in some read model high level API. UPDATE: I also realized that I would also need to keep at least the prize and maybe name of the product in the UserPurchasesView (in case it changes) but sometimes you need the value from the time of a purchase and sometimes you need the recent value. Scenario depends on a business but we could imagine both.
None of these solutions looks perfect to me. Am I wrong, am I missing something or is it just the way to do it? Thanks!
You understand well.
So you have to choose between coupling between the read models and coupling between UI and individual read models.
One of the main advantages of CQRS/ES is the posibility to create blazing fast read models (views if you like), without any joins, the perfect cache as I saw it called. I personally have chosen every time the first approach, with full data denormalisation. The views are very fast and models very clean and clear. This is the perfect solution if you want to optimize the read side of your application (and I think you should).
By listening to the right events you can keep these read models in sync with the rest of the application.
There is a 3rd option:
The projection responsible for the UserPurchasesView view not only listens to UserPurchasedProduct events, but also to ProductCreated, ProductRenamed, ProductImageChanged - any product related events that affect the UserPurchasesView. Now, as well as the UserPurchasesView collection for the read model that it is responsible for, it also needs a private collection to maintain the bits of products it is interested in: ({id, name, image, shortDescription, [maybe some others]}), so that when a new purchase event comes in, you have somewhere to get the initial state of those product fields from. Since your UserPurchasesView needs to listen to some of those product events anyway in order to keep up to date when a product changes, this isn't really much extra work, and avoids any dependency on another projection (ProductDetailsView). The cross-projection dependency also has a potential problem due to eventual consistency - what if the product isn't even in the product details view yet when the UserPurchasedProduct event comes through?
To avoid any concurrency issues, it's simplest to have each projection managed only by a single process and a single thread. That way, as long as the projection can receive events in-order across streams (so that it is guaranteed to see the product creation before the product purchase), you won't have issues with seeing a purchase before the product exists. If you introduce sharding or any other multi-threading to your projection, it gets more complicated.

DDD (Domain-Driven-Design) - large aggregates

I'm currently studying Eric Evans'es Domain-Driven-Design. The idea of aggregates is clear to me and I find it very interesting. Now I'm thinking of an example of aggregate like :
BankAccount (1) ----> (*) Transaction.
BankAccount
BigDecimal calculateTurnover();
BankAccount is an aggregate. To calculate turnover I should traverse all transactions and sum up all amounts. Evans assumes that I should use repositories to only load aggreagates. In the above case there could be a few tousands of transactions which I don't want load at once in memory.
In the context of the repository pattern, aggregate roots are the only objects > your client code loads from the repository.
The repository encapsulates access to child objects - from a caller's perspective it automatically loads them, either at the same time the root is loaded or when they're actually needed (as with lazy loading).
What would be your suggestion to implement calulcateTurnover in a DDD aggregate ?
As you have pointed out, to load 1000s of entities in an aggregate is not a scalable solution. Not only will you run into performance problems but you will likely also experience concurrency issues, as emphasised by Vaughn Vernon in his Effective Aggregate Design series.
Do you want every transaction to be available in the BankAccount aggregate or are you only concerned with turnover?
If it is only the turnover that you need, then you should establish this value when instantiating your BankAccount aggregate. This could likely be effectively calculated by your data store technology (indexed JOINs, for example, if you are using SQL). Perhaps you also need to consider having this this as a precalculated value in your data store (what happens when you start dealing with millions of transactions per bank account)?
But perhaps you still require the transactions available in your domain? Then you should consider having a separate Transaction repository.
I would highly recommend reading Vaughn Vernon's series on aggregate design, as linked above.
You have managed to pick a very interesting example :)
I actually use Account1->*Transaction when explaining event sourcing (ES) to anyone not familiar with it.
As a developer I was taught (way back) to use what we can now refer to as entity interaction. So we have a Customer record and it has a current state. We change the state of the record in some way (address, tax details, discount, etc.) and store the result. We never quite know what happened but we have the latest state and, since that is the current state of our business, it is just fine. Of course one of the first issues we needed to deal with was concurrency but we had ways of handling that and even though not fantastic it "worked".
For some reason the accounting discipline didn't quite buy into this. Why do we not simply have the latest state of an Account. We will load the related record, change the balance, and save the state. Oddly enough most people would probably cringe at the thought yet it seems to be OK for the rest of our data.
The accounting domain got around this by registering the change events as a series of Transaction entries. So should you lose you account record and the latest balance you can always run though all the transactions to obtain the latest balance. That is event sourcing.
In ES one typically loads an entire list of events for an aggregate root (AR) to obtain its latest state. There is also, typically, a mechanism to deal with a huge number of events when loading all would cause performance issues: snapshots. Usually only the latest snapshot is stored. The snapshot contains the full latest state of the aggregate and only event after the snapshot version are applied.
One of the huge advantages of ES is that one could come up with new queries and then simply apply all the events to the query handler and determine the outcome. Perhaps something like: "How many customer do I have that have moved twice in the last year". Quite arbitrary but using the "traditional" approach the answer would quite likely be that we'll start gathering that information from today and have it available next year as we have not been saving the CustomerMoved events. With ES we can search for the CustomerMoved events and get a result at any point.
So this brings me back to your example. You probably do not want to be loading all the transactions. Instead store the "Turnover" and calculate it on the go. Should the "Turnover" be a new requirement then a once off processing of all the ARs should get it up to speed. You can still have a calculateTurnover() method somewhere but that would be something you wouldn't run all too often. And in those cases you would need to load all the transactions for an AR.

CQRS design: nosql data view

This is a "language agnostic" question.
I started to study the CQRS pattern.
I've a simple question. I'm supposing to have 2 different storage layer: one relational for the commands(Mysql etc..) and one NoSql (mongo,cassandra.. etc) for the "query"?
Let me explain a little example:
1) As a user I want to insert a "Todo task"
Command: "Create Task" and will insert a new task into a database which have the User and the Todo tables.
2) As a user I'm able to see a list of created task
Query: "GetTasks" that will return a "view" with a collection of task taken from a non sql table named "UserTasks" which have a user and a list of created task.
Is the right approach? I'm sorry if the language is poor, it's just a little example.
If it seems a good approach (again, don't consider details) what is the best approach to keep updated the data stores?
I'm thinking to raise an event like "TaskCreated" and take the new task and insert those information in the nosql storage.
Thanks!
I can't really understand what you're looking for. but... typically, a command would be something that results in side effects. Queries don't cause side effects. GetTasks wouldn't really be a command, but a query.
Your "CreateTask" would be a command, which would result in the task added to the relevant data store(s). Your GetTasks query would retrieve that information from a datastore. It doesn't really matter if you're using a SQL or NoSQL store for this.
The "CommandStore" is typically the store that has just enough data to enforce invariants. In your case, what data is required for that? Is some information required to decide whether or not a task can be registered? For example, say, you have a requirement that a user can have at most 3 "todo"s. In this case, a table in the "Command Store" storing (UserId, Todo Count) is enough. You could also use (UserId, [TodoId]) - ie. store a list of todo ids so that you can gain idempotence. All other information about the user and tasks would be query data, and would be in the query store.
Hope that makes sense.
While there are times when you may wish to store commands, you generally don't. Rather a popular approach is to store the domain events that occur as a result of the commands.This is referred to as Event Sourcing. This would make 'STOREA' a store of events or to put it another way, an event stream. 'STOREB' is typically referred to as the Read Model. It has a de-normalised structure optimised for read speed. It is kept up to date via de-normalisers which respond to specific events. A key point to note here is that there is often a lag between the event being raised and the read model being updated. This in my opinion is a good thing but needs to be thought about when designing the UI.
For more info take a look at CQRS – A Step-by-Step Guide to the Flow of a typical Application
I hope that helps

Paging among multiple aggregate root

I'm new to DDD so please executes me if some term/understanding are bit off. But please correct me and any advice are appreciated.
Let's say I'm doing a social job board site, and I've identified my aggregate roots: Candidates, Jobs, and Companies. Very different things/contexts so each has own database table, repository, and service. But now I have to build a Pinterest style homepage where data blocks show data for either a Candidate, a Job, or a Company.
Now the tricky part is the data blocks have to be ordered by the last time something happened to the aggregate it represents (a company is liked/commented, or a job was update, etc), and paging occurs in form of infinite scrolling, again just like Pinterest. Since things occur to these aggregates independently I do not have a way to know how many of what aggregate is on any particular page. (but if I did btw, say a table that tracks aggregates' last update time, have I no choice but to promote this to be another aggregate root, with it's own repository?)
Where would I implement the paging logic? I read somewhere that there should be one service per repository per aggregate root, so should I sort and page in controller (I'm using MVC by the way)? Or should there be a independent Application Service that does cross boundary stuff like this? Either case I have to fetch ALL entities for ALL aggregates from db?
That's too many questions already but I'm basically asking:
Is paging presentation, business, or persistence logic? Which horizontal layer?
Where should cross boundary code reside in DDD? Which vertical stack?
Several things come to mind.
How fresh does this aggregated data need to be? I doubt realtime is going to add much value. Talk to a business person and bargain for some latency. This will allow you to build a simpler solution to the problem.
Why not have some process do the scanning, aggregation, sorting and store the result of that asynchronously? Doesn't even need to be in a database (Redis). The bargained latency could be the interval at which to run your process.
Paging is hardly a business decision concern in your example. You just need to provide infinite scrolling and some ajax calls that fetch the cached, aggregated, sorted information. This has little to do with DDD.
Your UI artifacts and the aggregation, sorting process seem to be very much a thing on their own, working together with the data or - better yet - a datacomponent of each context that provides the data in the desired format.

Resources