DDD (Domain-Driven Design) - large aggregates

I'm currently studying Eric Evans's Domain-Driven Design. The idea of aggregates is clear to me and I find it very interesting. Now I'm thinking of an example aggregate like this:
BankAccount (1) ----> (*) Transaction.
BankAccount
BigDecimal calculateTurnover();
BankAccount is an aggregate. To calculate turnover I would have to traverse all transactions and sum up all amounts. Evans assumes that I should use repositories to load only aggregates. In the above case there could be a few thousand transactions, which I don't want to load into memory at once.
In the context of the repository pattern, aggregate roots are the only objects your client code loads from the repository.
The repository encapsulates access to child objects - from a caller's perspective it automatically loads them, either at the same time the root is loaded or when they're actually needed (as with lazy loading).
What would be your suggestion for implementing calculateTurnover in a DDD aggregate?

As you have pointed out, to load 1000s of entities in an aggregate is not a scalable solution. Not only will you run into performance problems but you will likely also experience concurrency issues, as emphasised by Vaughn Vernon in his Effective Aggregate Design series.
Do you want every transaction to be available in the BankAccount aggregate or are you only concerned with turnover?
If it is only the turnover that you need, then you should establish this value when instantiating your BankAccount aggregate. This could likely be calculated efficiently by your data store technology (indexed JOINs, for example, if you are using SQL). Perhaps you also need to consider keeping this as a precalculated value in your data store (what happens when you start dealing with millions of transactions per bank account?).
But perhaps you still require the transactions available in your domain? Then you should consider having a separate Transaction repository.
I would highly recommend reading Vaughn Vernon's series on aggregate design, as linked above.
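As a rough illustration of the first suggestion (establish the turnover when the aggregate is instantiated), here is a minimal sketch. The class names, the in-memory row list and the `load` method are assumptions made for the example; the loop in the repository only stands in for whatever aggregation your data store would do, such as a SUM query.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical transaction row; in a real system this would live in the data store.
final class TransactionRow {
    final String accountId;
    final BigDecimal amount;

    TransactionRow(String accountId, BigDecimal amount) {
        this.accountId = accountId;
        this.amount = amount;
    }
}

// Aggregate root that carries the turnover as a value instead of a transaction list.
final class BankAccount {
    private final String id;
    private final BigDecimal turnover;

    BankAccount(String id, BigDecimal turnover) {
        this.id = id;
        this.turnover = turnover;
    }

    String id() { return id; }

    // No traversal needed: the value was established when the aggregate was loaded.
    BigDecimal calculateTurnover() { return turnover; }
}

// In-memory stand-in for a repository (an assumption for this sketch).
final class BankAccountRepository {
    private final List<TransactionRow> rows = new ArrayList<>();

    void record(String accountId, BigDecimal amount) {
        rows.add(new TransactionRow(accountId, amount));
    }

    // The loop plays the role of an aggregate query executed by the data store.
    BankAccount load(String accountId) {
        BigDecimal sum = BigDecimal.ZERO;
        for (TransactionRow row : rows) {
            if (row.accountId.equals(accountId)) {
                sum = sum.add(row.amount);
            }
        }
        return new BankAccount(accountId, sum);
    }
}
```

The point of the design is that the aggregate never holds the transaction collection, so loading a BankAccount costs the same whether it has ten transactions or ten million.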

You have managed to pick a very interesting example :)
I actually use Account (1) -> (*) Transaction when explaining event sourcing (ES) to anyone not familiar with it.
As a developer I was taught (way back) to use what we can now refer to as entity interaction. So we have a Customer record and it has a current state. We change the state of the record in some way (address, tax details, discount, etc.) and store the result. We never quite know what happened but we have the latest state and, since that is the current state of our business, it is just fine. Of course one of the first issues we needed to deal with was concurrency but we had ways of handling that and even though not fantastic it "worked".
For some reason the accounting discipline didn't quite buy into this. Why do we not simply keep the latest state of an Account? We would load the related record, change the balance, and save the state. Oddly enough, most people would probably cringe at the thought, yet it seems to be OK for the rest of our data.
The accounting domain got around this by registering the change events as a series of Transaction entries. So should you lose your account record and the latest balance, you can always run through all the transactions to obtain the latest balance. That is event sourcing.
In ES one typically loads the entire list of events for an aggregate root (AR) to obtain its latest state. There is also, typically, a mechanism to deal with a huge number of events when loading them all would cause performance issues: snapshots. Usually only the latest snapshot is stored. The snapshot contains the full latest state of the aggregate, and only events after the snapshot version are applied.
One of the huge advantages of ES is that one could come up with new queries and then simply apply all the events to the query handler and determine the outcome. Perhaps something like: "How many customers do I have that have moved twice in the last year?" Quite arbitrary, but using the "traditional" approach the answer would quite likely be that we'll start gathering that information from today and have it available next year, as we have not been saving the CustomerMoved events. With ES we can search for the CustomerMoved events and get a result at any point.
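To make the "new query over old events" idea concrete, here is a small sketch. The CustomerMoved fields and the query class are hypothetical, and "moved twice" is read as "at least two moves on or after a cut-off date", which is an assumption.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of a stored CustomerMoved event.
final class CustomerMoved {
    final String customerId;
    final LocalDate movedOn;

    CustomerMoved(String customerId, LocalDate movedOn) {
        this.customerId = customerId;
        this.movedOn = movedOn;
    }
}

final class MovedTwiceQuery {
    // Counts customers with at least two CustomerMoved events on or after 'since'.
    static long countMovedAtLeastTwice(List<CustomerMoved> events, LocalDate since) {
        Map<String, Integer> movesPerCustomer = new HashMap<>();
        for (CustomerMoved event : events) {
            if (!event.movedOn.isBefore(since)) {
                movesPerCustomer.merge(event.customerId, 1, Integer::sum);
            }
        }
        long count = 0;
        for (int moves : movesPerCustomer.values()) {
            if (moves >= 2) {
                count++;
            }
        }
        return count;
    }
}
```

Because the events were stored all along, this handler can be written today and still answer the question for the past year.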
So this brings me back to your example. You probably do not want to be loading all the transactions. Instead, store the "Turnover" and update it as you go. Should the "Turnover" be a new requirement, then a one-off processing of all the ARs should bring it up to date. You can still have a calculateTurnover() method somewhere, but that would be something you wouldn't run all too often. And in those cases you would need to load all the transactions for an AR.
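A minimal sketch of how a snapshot and a running "Turnover" could work together, assuming turnover is the plain sum of amounts as in the question. All names here (TransactionRecorded, AccountSnapshot, rehydrate) are made up for illustration and do not come from any particular ES library.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical event: one recorded transaction, numbered by its position in the stream.
final class TransactionRecorded {
    final long version;
    final BigDecimal amount;

    TransactionRecorded(long version, BigDecimal amount) {
        this.version = version;
        this.amount = amount;
    }
}

// Snapshot holding the full state of the aggregate at a given version.
final class AccountSnapshot {
    final long version;
    final BigDecimal turnover;

    AccountSnapshot(long version, BigDecimal turnover) {
        this.version = version;
        this.turnover = turnover;
    }
}

final class Account {
    private long version = 0;
    private BigDecimal turnover = BigDecimal.ZERO;

    // Starts from the snapshot (if any) and applies only events newer than it.
    static Account rehydrate(AccountSnapshot snapshot, List<TransactionRecorded> events) {
        Account account = new Account();
        if (snapshot != null) {
            account.version = snapshot.version;
            account.turnover = snapshot.turnover;
        }
        for (TransactionRecorded event : events) {
            if (event.version > account.version) {
                account.apply(event);
            }
        }
        return account;
    }

    // The turnover is kept up to date with every event instead of being recomputed.
    private void apply(TransactionRecorded event) {
        turnover = turnover.add(event.amount);
        version = event.version;
    }

    AccountSnapshot snapshot() { return new AccountSnapshot(version, turnover); }
    BigDecimal turnover() { return turnover; }
    long version() { return version; }
}
```

In practice you would pass only the events after the snapshot version to rehydrate; the version check is there so that passing the full list still gives the same result.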

Related

CQRS to command or not to, that is the question

I am new to CQRS, but can see the value in this, so I am trying to apply this to a financial system that we are busy rebuilding.
Like I mentioned, this is a basic fin system with basic balance, withdraw, deposit like functionality.
I have withdraw and deposit commands, but I am struggling with balance.
According to the domain experts, they want to handle a balance enquiry as a transaction, with no financial implication (yet), on the client's behalf. So, when the client does a balance enquiry via the device, it creates a transaction, but also a balance query at the same time.
In the CQRS world, you distinguish between commands, which mutate state, and queries, which retrieve data in some way.
Apologies if my understanding here is flawed. Can someone point me in the right direction?
EDIT:
Maybe let me put it this way. I was thinking of creating a CheckBalanceCommand that creates a transaction and inserts a BalanceCheckedEvent into the store. But then I would also need to create a CheckBalanceQuery to retrieve the actual balance from the read db.
I would need to invoke both in order to satisfy the balance request.

This is an interesting issue. Your business case is valid: some commands don't mutate aggregate/entity state; still, handling them and their resulting events is important (e.g. for audit trails).
In order to support these cases, I'd introduce a base event type named IdentityEvent (inspired by identity values for various mathematical operators, and as a justification for the concept: applying them to a value doesn't change it). On issuing the corresponding command, derivatives of this event (e.g. BalanceCheckedEvent in your case) will be appended to the aggregate's event stream, and view projections may construct views from them as usual; however, their mutate method will not perform any actual mutation while reconstructing entities from the event stream.
The actual command processing takes place at the domain layer. One of your application services, at the application layer, receives the query request and processes it as usual. Additionally, before or after the query operation, the same application service may issue the command to the domain layer, on the aggregate root itself. That doesn't violate any principle: your command and query models are still separate; the application service just coordinates between the two.
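A small sketch of the IdentityEvent idea. IdentityEvent and BalanceCheckedEvent are named in the answer; the AccountEvent interface, DepositedEvent and AccountAggregate are assumptions added to make the example complete.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event contract: each event knows how it changes the balance.
interface AccountEvent {
    BigDecimal mutate(BigDecimal balance);
}

// Base type for events that are recorded but never change state.
abstract class IdentityEvent implements AccountEvent {
    @Override
    public final BigDecimal mutate(BigDecimal balance) {
        return balance; // identity: the state passes through unchanged
    }
}

final class BalanceCheckedEvent extends IdentityEvent {
}

final class DepositedEvent implements AccountEvent {
    private final BigDecimal amount;

    DepositedEvent(BigDecimal amount) { this.amount = amount; }

    @Override
    public BigDecimal mutate(BigDecimal balance) { return balance.add(amount); }
}

final class AccountAggregate {
    private final List<AccountEvent> stream = new ArrayList<>();
    private BigDecimal balance = BigDecimal.ZERO;

    void deposit(BigDecimal amount) { raise(new DepositedEvent(amount)); }

    // The command is handled like any other, but its event leaves the state alone.
    void checkBalance() { raise(new BalanceCheckedEvent()); }

    private void raise(AccountEvent event) {
        stream.add(event);
        balance = event.mutate(balance);
    }

    // Rebuilds an aggregate by replaying a stored stream, identity events included.
    static AccountAggregate replay(List<AccountEvent> events) {
        AccountAggregate aggregate = new AccountAggregate();
        for (AccountEvent event : events) {
            aggregate.raise(event);
        }
        return aggregate;
    }

    BigDecimal balance() { return balance; }
    List<AccountEvent> stream() { return stream; }
}
```

The balance check ends up in the stream, so projections and audit trails can see it, yet replaying the stream gives the same balance as if it had never been there.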

This is not as rare as you would imagine. An additional valid business case is when a service provider runs a credit check on someone. Credit reporting companies actually store queries made against one's credit score, and use them to influence future credit scores. Of course, when I say that this isn't as rare as we imagine, I'm not attempting to normalize such practices (and we should push back to understand the real value something like this is offering to our product).
What I suggest, though, is to model this explicitly and not try to generalize it. This feature is probably driven by some business need, and you should model it as such. By this I mean that you should treat the service serving the reads as a separate service entirely, which can raise its own events for things that have happened, and design the rest of the system in a reactive way (i.e. responding to events generated by another BC/service).
As an example, you could have the service which serves the query fire a BalanceChecked event, which either the same service or another one could store in a stream for subsequent processing.
I would not suggest a command, because if you'll be replying with the data it's not as if someone can reject the command; it has already happened, someone already has the data.
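Roughly, the event-instead-of-command approach could look like the sketch below. The read model is a plain map and the publisher is a `Consumer`; both are stand-ins chosen for the example, as are the class names.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event: a fact that has already happened, so nothing can reject it.
final class BalanceChecked {
    final String accountId;
    final BigDecimal balanceSeen;

    BalanceChecked(String accountId, BigDecimal balanceSeen) {
        this.accountId = accountId;
        this.balanceSeen = balanceSeen;
    }
}

// Read-side service that answers the query and then announces that it did so.
final class BalanceQueryService {
    private final Map<String, BigDecimal> readModel;   // stand-in for the read db
    private final Consumer<BalanceChecked> publisher;  // stand-in for a bus or stream

    BalanceQueryService(Map<String, BigDecimal> readModel, Consumer<BalanceChecked> publisher) {
        this.readModel = readModel;
        this.publisher = publisher;
    }

    BigDecimal balanceOf(String accountId) {
        BigDecimal balance = readModel.get(accountId);
        if (balance == null) {
            throw new IllegalArgumentException("unknown account: " + accountId);
        }
        // The event is published only after the data has actually been read.
        publisher.accept(new BalanceChecked(accountId, balance));
        return balance;
    }
}
```

Whoever subscribes to the publisher (the same service or another one) can store the event in a stream for later processing.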

CQRS Read models in a NoSql (Mongo DB)

Hi, it's my first time with DDD/CQRS. I've read multiple sources and I'm still a bit confused, maybe someone could help :)
Let's assume a simple case where we have products and clients (possibly different bounded contexts).
A client can buy a product and he wants to see all products that he purchased.
In this case I realize I need a UserPurchasesView view model with:
purchaseId (which is a mongo primary key)
userId,
product: {id, name, image, shortDescription, [maybe some others]}
price
timestamp
Now ... the problem is that my domain is producing an event like UserPurchasedProduct(userId, productId). I could enrich the event with a price, product name or maybe something else, but not all fields. I'm getting to a point where enriching seems to be wrong.
At this point I realize I need something like ProductDetailsView:
productId (primary key)
price
name
shortDescription
logo
This view is maintained by events like: ProductCreated, ProductRenamed, ProductImageChanged
And now we have 2 options ...
Look into the ProductDetailsView when a UserPurchasedProduct event comes in, take all needed product details and save them in UserPurchasesView for faster reads. This solution doesn't look that bad, but it introduces some extra coupling, and it seems to me these views cannot be scaled well when needed. Also, both views must be rebuilt together when replaying all events from the event store (rebuilding is also more tricky in that case).
Keep only the productId in the UserPurchasesView and read multiple views when the user queries his purchases. This is some extra processing that would have to be done somewhere: in the frontend, in the backend controller or in some high-level read model API. UPDATE: I also realized that I would need to keep at least the price and maybe the name of the product in the UserPurchasesView (in case it changes), but sometimes you need the value from the time of a purchase and sometimes you need the recent value. The scenario depends on the business, but we could imagine both.
None of these solutions looks perfect to me. Am I wrong, am I missing something or is it just the way to do it? Thanks!

You understand it well.
So you have to choose between coupling between the read models and coupling between UI and individual read models.
One of the main advantages of CQRS/ES is the possibility of creating blazing fast read models (views, if you like) without any joins: the perfect cache, as I have seen it called. I personally have chosen the first approach every time, with full data denormalisation. The views are very fast and the models very clean and clear. This is the perfect solution if you want to optimize the read side of your application (and I think you should).
By listening to the right events you can keep these read models in sync with the rest of the application.

There is a 3rd option:
The projection responsible for the UserPurchasesView view not only listens to UserPurchasedProduct events, but also to ProductCreated, ProductRenamed, ProductImageChanged - any product related events that affect the UserPurchasesView. Now, as well as the UserPurchasesView collection for the read model that it is responsible for, it also needs a private collection to maintain the bits of products it is interested in: ({id, name, image, shortDescription, [maybe some others]}), so that when a new purchase event comes in, you have somewhere to get the initial state of those product fields from. Since your UserPurchasesView needs to listen to some of those product events anyway in order to keep up to date when a product changes, this isn't really much extra work, and avoids any dependency on another projection (ProductDetailsView). The cross-projection dependency also has a potential problem due to eventual consistency - what if the product isn't even in the product details view yet when the UserPurchasedProduct event comes through?
To avoid any concurrency issues, it's simplest to have each projection managed only by a single process and a single thread. That way, as long as the projection can receive events in-order across streams (so that it is guaranteed to see the product creation before the product purchase), you won't have issues with seeing a purchase before the product exists. If you introduce sharding or any other multi-threading to your projection, it gets more complicated.
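Here is a sketch of such a projection, kept to a single product field (the name) to stay short. The handler method names and the PurchaseView class are assumptions; the event names come from the question. It is meant to be driven by one thread that receives events in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One entry of the UserPurchasesView read model (reduced to a few fields).
final class PurchaseView {
    final String userId;
    final String productId;
    String productName;

    PurchaseView(String userId, String productId, String productName) {
        this.userId = userId;
        this.productId = productId;
        this.productName = productName;
    }
}

final class UserPurchasesProjection {
    // Private collection: only the product details this view cares about.
    private final Map<String, String> productNames = new HashMap<>();
    // The read model itself, keyed by user.
    private final Map<String, List<PurchaseView>> purchasesByUser = new HashMap<>();

    void onProductCreated(String productId, String name) {
        productNames.put(productId, name);
    }

    void onProductRenamed(String productId, String newName) {
        productNames.put(productId, newName);
        // Keep already stored purchases up to date with the current name.
        for (List<PurchaseView> purchases : purchasesByUser.values()) {
            for (PurchaseView purchase : purchases) {
                if (purchase.productId.equals(productId)) {
                    purchase.productName = newName;
                }
            }
        }
    }

    void onUserPurchasedProduct(String userId, String productId) {
        String name = productNames.get(productId);
        if (name == null) {
            // With in-order delivery this should not happen; fail loudly if it does.
            throw new IllegalStateException("purchase seen before product " + productId);
        }
        purchasesByUser.computeIfAbsent(userId, key -> new ArrayList<>())
                .add(new PurchaseView(userId, productId, name));
    }

    List<PurchaseView> purchasesOf(String userId) {
        return purchasesByUser.getOrDefault(userId, new ArrayList<>());
    }
}
```

If you need the name as it was at purchase time rather than the current one, you would simply skip the update loop in onProductRenamed.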

DDD/CQRS Querying Events

I was looking at posts on querying in applications designed with the Event Sourcing/DDD/CQRS approach.
As I understand it, events are changes to the state of a domain object. The changes to state are maintained as a history of events in a DB (either SQL or NoSQL).
If a user wants to query the current state of a particular aggregate root, it will involve fetching the history of events.
When a user runs queries, especially business-specific ones, he/she will be interested in the current state, not the history of events.
How does querying, the 'Q' part of CQRS, work with event sourcing?
Consider that I have a domain object "Account" as an aggregate root. The Account AR will go through lots of changes, i.e. credits and debits. The event store will have credit and debit events.
Suppose a user needs to get the current balance of an account. How will a stream of historical events suit this? How will the user fetch the current balance for a given account?
I am unable to understand how a history of events will be useful for business-specific querying.
-Prakhyat M M

I would recommend reading more articles from Greg Young (he is like the father of CQRS and Event Sourcing), like this: CQRS, Task Based UIs, Event Sourcing... agh.
Sorry for my bad English, I am from Paraguay. But I really like DDD - CQRS - ES and I would like to try to make a point.
The use of "Projections" (also known as Materialized Views) and the concept of "Eventual Consistency" are the fundamentals that every practitioner of CQRS should understand very well. The Event Store is not for querying. It is on the command side of CQRS, not on the query side. You may use a bus to send the events stored in the Event Store to the query side in order to process them and generate a read model, or view models, from which you can query. In no case is an event store per se a query model.
Looks like you are a Java guy, but, still, you may want to check the CQRS Journey from Microsoft.
Hope this helps a little bit and motivates you to do more research on DDD / CQRS / ES, the New Trio of Line of Business Applications.

You'll use a projection of the event stream into the read model, which contains exactly the information that the query side (Q) needs. For example, you could have an "account balance" projection that follows all events that change the account balance, but possibly ignores other events in the account's stream (such as owner changes). The projection then saves that info in a way that it can be queried very quickly, e.g., in memory or in a small read-model database table (accountId, balance) with the accountId as the key (the database can be a key-value store, for example).
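A minimal sketch of such an "account balance" projection, with an in-memory map standing in for the (accountId, balance) table. The event classes Credited, Debited and OwnerChanged are hypothetical names for the credit, debit and owner-change events mentioned above.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical events in an account's stream.
final class Credited {
    final String accountId;
    final BigDecimal amount;
    Credited(String accountId, BigDecimal amount) { this.accountId = accountId; this.amount = amount; }
}

final class Debited {
    final String accountId;
    final BigDecimal amount;
    Debited(String accountId, BigDecimal amount) { this.accountId = accountId; this.amount = amount; }
}

final class OwnerChanged {
    final String accountId;
    final String newOwner;
    OwnerChanged(String accountId, String newOwner) { this.accountId = accountId; this.newOwner = newOwner; }
}

final class AccountBalanceProjection {
    // Stand-in for a small read-model table keyed by accountId.
    private final Map<String, BigDecimal> balances = new HashMap<>();

    // Follows only the events that change the balance; everything else is ignored.
    void handle(Object event) {
        if (event instanceof Credited) {
            Credited credited = (Credited) event;
            balances.merge(credited.accountId, credited.amount, BigDecimal::add);
        } else if (event instanceof Debited) {
            Debited debited = (Debited) event;
            balances.merge(debited.accountId, debited.amount.negate(), BigDecimal::add);
        }
    }

    // The query side reads the current balance with a single key lookup.
    BigDecimal balanceOf(String accountId) {
        return balances.getOrDefault(accountId, BigDecimal.ZERO);
    }
}
```

The user asking for a balance never touches the event history; the projection has already folded it into one value per account.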
I suggest further reading on the CQRS concept such as this one or this one.

Interestingly enough, more people have recently discovered using the event store as the read model, leaving projections and "proper" read models until absolutely necessary.
We all know that dealing with projections increases complexity. At a minimum you have to create new models, establish the DAL for the read model, create projections to translate events into read-model changes, and bind those projections to the stream of events from your store. It requires more code and more moving parts, and some of them are not easy to test. Schema changes on the read side also require migrations.
It appears that for many scenarios reading all events (properly partitioned) might be enough to have your "read model". Reading them does not take much time until the system grows so large that you need to read tens of thousands of events to build one UI screen. Before you reach that point, you can just read events. Maybe use the file system to store events, although tools like EventStore are free and quite easy to use. Maybe add some indexing.
This approach lets you stabilise the domain significantly: you get more knowledge about how the system works, tune the events, and are really prepared to bring the "proper" read model into the system, but you might not have to.
Adam Dymitruk has written a blog post about it; you might find it worth reading even if you don't want to take this approach. Greg Young also gave a talk, EventStore as read model, back in 2012.
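For comparison with a stored projection, this is roughly what "just read the events" could look like. The EventLog class is an in-memory assumption, partitioned by stream id, and events are reduced to signed balance changes to keep the sketch short.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for an event store partitioned by stream (one stream per account).
final class EventLog {
    private final Map<String, List<BigDecimal>> streams = new HashMap<>();

    void append(String streamId, BigDecimal balanceChange) {
        streams.computeIfAbsent(streamId, key -> new ArrayList<>()).add(balanceChange);
    }

    List<BigDecimal> read(String streamId) {
        return streams.getOrDefault(streamId, Collections.<BigDecimal>emptyList());
    }
}

final class BalanceQuery {
    // No read model is stored: the answer is folded from the stream on every call.
    static BigDecimal currentBalance(EventLog log, String accountId) {
        BigDecimal balance = BigDecimal.ZERO;
        for (BigDecimal change : log.read(accountId)) {
            balance = balance.add(change);
        }
        return balance;
    }
}
```

The cost grows with the length of the stream, which is the trade-off described above: fine while streams are short, and the moment to introduce a projection once they are not.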

Paging among multiple aggregate root

I'm new to DDD, so please excuse me if some terms or my understanding are a bit off. Please correct me; any advice is appreciated.
Let's say I'm doing a social job board site, and I've identified my aggregate roots: Candidates, Jobs, and Companies. Very different things/contexts, so each has its own database table, repository, and service. But now I have to build a Pinterest-style homepage where data blocks show data for either a Candidate, a Job, or a Company.
Now the tricky part is that the data blocks have to be ordered by the last time something happened to the aggregate each represents (a company is liked/commented on, or a job was updated, etc.), and paging occurs in the form of infinite scrolling, again just like Pinterest. Since things happen to these aggregates independently, I do not have a way to know how many of which aggregate are on any particular page. (But if I did, by the way, say a table that tracks the aggregates' last update time, do I have no choice but to promote this to be another aggregate root, with its own repository?)
Where would I implement the paging logic? I read somewhere that there should be one service per repository per aggregate root, so should I sort and page in the controller (I'm using MVC, by the way)? Or should there be an independent Application Service that does cross-boundary stuff like this? In either case, do I have to fetch ALL entities for ALL aggregates from the db?
That's too many questions already but I'm basically asking:
Is paging presentation, business, or persistence logic? Which horizontal layer?
Where should cross boundary code reside in DDD? Which vertical stack?

Several things come to mind.
How fresh does this aggregated data need to be? I doubt realtime is going to add much value. Talk to a business person and bargain for some latency. This will allow you to build a simpler solution to the problem.
Why not have some process do the scanning, aggregation, sorting and store the result of that asynchronously? Doesn't even need to be in a database (Redis). The bargained latency could be the interval at which to run your process.
Paging is hardly a business concern in your example. You just need to provide infinite scrolling and some AJAX calls that fetch the cached, aggregated, sorted information. This has little to do with DDD.
Your UI artifacts and the aggregation and sorting process seem to be very much a thing of their own, working together with the data or, better yet, a data component of each context that provides the data in the desired format.
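A sketch of the "background process plus cached, sorted result" idea. FeedItem and HomepageFeed are hypothetical names, the cache is just a field in memory (it could equally be Redis or a table), and paging uses a plain offset and size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// One block on the homepage, as delivered by each context's data component.
final class FeedItem {
    final String kind;                   // "candidate", "job" or "company"
    final String id;
    final long lastActivityEpochSeconds; // when something last happened to it

    FeedItem(String kind, String id, long lastActivityEpochSeconds) {
        this.kind = kind;
        this.id = id;
        this.lastActivityEpochSeconds = lastActivityEpochSeconds;
    }
}

final class HomepageFeed {
    // Cached result of the last aggregation run.
    private volatile List<FeedItem> sorted = Collections.emptyList();

    // Meant to be called by a background job at the agreed interval.
    void rebuild(List<FeedItem> candidates, List<FeedItem> jobs, List<FeedItem> companies) {
        List<FeedItem> all = new ArrayList<>();
        all.addAll(candidates);
        all.addAll(jobs);
        all.addAll(companies);
        // Most recent activity first.
        all.sort(Comparator.comparingLong((FeedItem item) -> item.lastActivityEpochSeconds).reversed());
        sorted = Collections.unmodifiableList(all);
    }

    // What an AJAX endpoint for infinite scrolling would call.
    List<FeedItem> page(int offset, int size) {
        if (offset < 0 || size < 0) {
            throw new IllegalArgumentException("offset and size must not be negative");
        }
        List<FeedItem> current = sorted;
        if (offset >= current.size()) {
            return Collections.emptyList();
        }
        return current.subList(offset, Math.min(offset + size, current.size()));
    }
}
```

The request path never touches the three repositories; it only slices a list that was prepared earlier, which is why the agreed latency matters.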

Azure Table Storage Design for Web Application

I am evaluating the use of Azure Table Storage for an application I am building, and I would like to get some advice on...
whether or not this is a good idea for the application, or
if I should stick with SQL, and
if I do go with ATS, what would be a good approach to the design of the storage.
The application is a task-management web application, targeted to individual users. It is really a very simple application. It has the following entities...
Account (each user has an account.)
Task (users create tasks, obviously.)
TaskList (users can organize their tasks into lists.)
Folder (users can organize their lists into folders.)
Tag (users can assign tags to tasks.)
There are a few features / requirements that we will also be building which I need to account for...
We eventually will provide features for different accounts to share lists with each other.
Users need to be able to filter their tasks in a variety of ways. For example...
Tasks for a specific list
Tasks for a specific list which are tagged with "A" and "B"
Tasks that are due tomorrow.
Tasks that are tagged "A" across all lists.
Tasks that I have shared.
Tasks that contain "hello" in the note for the task.
Etc.
Our application is AJAX-heavy, with updates occurring for very small changes to a task. So, there are a lot of small requests and updates going on. For example...
Inline editing
Click to complete
Change due date
Etc...
Because of the heavy CRUD work, and the fact that we really have a list of simple entities, it would be feasible to go with ATS. But, I am concerned about the transaction cost for updates, and also whether or not the querying / filtering I described could be supported effectively.
We imagine numbers starting small (~hundreds of accounts, ~hundreds or thousands of tasks per account), but we obviously hope to grow our accounts.
If we do go with ATS, would it be better to have...
One table per entity (Accounts, Tasks, TaskLists, etc.)
Sets of tables per customer (JohnDoe_Tasks, JohnDoe_TaskLists, etc.)
Other thoughts?
I know this is a long post, but if anyone has any thoughts or ideas on the direction, I would greatly appreciate it!

Azure Table Storage is well suited to a task application. As long as you set up your partition keys and row keys well, you can expect fast and consistent performance with a huge number of simultaneous users.
For task sharing, ATS provides optimistic concurrency to support multiple users accessing the same data in parallel. You can use optimistic concurrency to warn users when more than one account is editing the same data at the same time, and prevent them from accidentally overwriting each other's changes.
As to the costs, you can estimate your transaction costs based on the number of accounts, and how active you expect those accounts to be. So, if you expect 300 accounts, and each account makes 100 edits a day, you'll have 30K transactions a day, which (at $.01 per 10K transactions) will cost about $.03 a day, or a little less than $1 a month. Even if this estimate is off by 10X, the transaction cost per month is still less than a hamburger at a decent restaurant.
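The estimate above is easy to redo with your own numbers. The small helper below is only a sketch; the class and method names are made up, the price of $0.01 per 10K transactions is the figure quoted above, and a 30-day month is assumed.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TransactionCostEstimate {
    // Daily cost = accounts * edits per account per day * price per 10K / 10,000.
    static BigDecimal dailyCost(long accounts, long editsPerAccountPerDay, BigDecimal pricePer10k) {
        BigDecimal transactions = BigDecimal.valueOf(accounts * editsPerAccountPerDay);
        return transactions.multiply(pricePer10k)
                .divide(BigDecimal.valueOf(10_000), 4, RoundingMode.HALF_UP);
    }

    // Monthly cost for a month of the given length in days.
    static BigDecimal monthlyCost(BigDecimal dailyCost, int daysInMonth) {
        return dailyCost.multiply(BigDecimal.valueOf(daysInMonth));
    }
}
```

With 300 accounts and 100 edits each, that is 30,000 transactions, so dailyCost comes to 0.03 and monthlyCost over 30 days to 0.90, matching the "a little less than $1 a month" figure. At ten times the activity it is still about $9.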
For the design, the main aspect to think about is how to key your tables. Before designing your application for ATS, I'd recommend reading the ATS white paper, particularly the section on partitioning. One reasonable design for the application would be to use one table per entity type (Accounts, Tasks, etc), then partition by the account name, and use some unique feature of the tasks for the row key. For both key types, be sure to consider the implications on future queries. For example, by grouping entities that are likely to be updated together into the same partition, you can use Entity Group Transactions to update up to 100 entities in a single transaction -- this not only increases speed, but saves on transaction costs as well. For another implication of your keys, if users will tend to be looking at a single folder at a time, you could use the row key to store the folder (e.g. rowkey="folder;unique task id"), and have very efficient queries on a folder at a time.
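To illustrate the key scheme just described (partition by account name, row key of the form "folder;unique task id"), here is a sketch that only builds and compares key strings. It calls no Azure SDK; the class and method names are hypothetical, and the folder range check assumes row keys are compared as plain ordinal strings and that folder names never contain ';'.

```java
final class TaskKeys {
    // One partition per account keeps a user's tasks together.
    static String partitionKey(String accountName) {
        return accountName;
    }

    // Row key in the "folder;unique task id" shape suggested above.
    static String rowKey(String folder, String taskId) {
        return folder + ";" + taskId;
    }

    // Inclusive lower bound for all row keys belonging to one folder.
    static String folderRangeStart(String folder) {
        return folder + ";";
    }

    // Exclusive upper bound: '<' is the character right after ';'.
    static String folderRangeEnd(String folder) {
        return folder + "<";
    }

    // True when the row key falls inside the folder's key range.
    static boolean isInFolder(String rowKey, String folder) {
        return rowKey.compareTo(folderRangeStart(folder)) >= 0
                && rowKey.compareTo(folderRangeEnd(folder)) < 0;
    }
}
```

A "tasks in this folder" query then becomes a range filter on the row key within one partition, which is the efficient query shape mentioned above. Filters that cut across folders (by tag or due date, say) would need a different key or a secondary table.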
Overall, ATS will support your task application well, and allow it to scale to a huge number of users. I think the main question is, do you need cloud magnitude of scaling? If you do, ATS is a great solution; if you don't, you may find that adjusting to a new paradigm costs more time in design and implementation than the benefits you receive.

What you are asking is a rather big question, so forgive me if I don't give you an exact answer. The short answer would be: Sure, go ahead with ATS :)
Your biggest concern in this scenario would be speed. As you've pointed out, you are expecting a lot of CRUD operations. Out of the box, ATS doesn't support transactions across partitions or tables, but you can architect yourself out of such a challenge by using the CQRS structure.
The big difference between SQL and ATS is the lack of relations and general query possibilities, since ATS is a "NoSQL" approach. This means you have to structure your tables in a way that supports your query operations, which is not a simple task.
If you are aware of this, I don't see any trouble doing what you're describing.
Would love to see the end result!

Resources