Is it a good idea to rely on a given aggregate's history with Event Sourcing?

I'm currently dealing with a situation in which I need to make a decision based on whether it's the first time my aggregate got into a situation (an Order was bought).
I can solve this problem in two ways:
1. Introduce in my aggregate a field stating whether an order has ever been bought (or maybe the number of bought orders);
2. Search the aggregate's history for any OrderWasBought event.
Is option 2 ever acceptable? For some reason I think option 1 is safer and cleaner in the general case, but I lack experience in these matters.
Thanks

IMHO both effectively do the same thing: the field stating that an order was bought needs to be hydrated somehow. That happens as part of the replay, which simply means that the field is set whenever an OrderWasBought event is applied.
So, as far as the result is concerned, it makes no difference whether you look at the field or check for the existence of the event.
Talking about efficiency, it may be the better idea to use a field, since this way the field gets hydrated as part of the replay, which needs to be run anyway. So, you don't have to search the list of events again, but you can simply look at the (cached) value in the field.
So, in the end, to cut a long story short: It doesn't matter. Use what feels better to you. If the history of an aggregate gets lengthy, you may be better off using the field approach in terms of performance.
PS: Of course, this depends on how aggregates are loaded: is the aggregate able to access its own event history at all? If not, setting a field while the aggregate is being replayed is your only option anyway. Please note that the aggregate does not (and should not!) have access to the underlying repository, so it cannot load its history on its own.
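To make the comparison concrete, here is a minimal sketch, assuming a hypothetical Customer aggregate that is rebuilt from a plain list of events. The class names, the apply method and the from_history helper are made up for illustration and are not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderWasBought:
    order_id: str


@dataclass(frozen=True)
class CustomerRenamed:
    new_name: str


@dataclass
class Customer:
    """Hypothetical aggregate that is rebuilt by replaying its events."""
    bought_orders: int = 0
    name: str = ""

    @classmethod
    def from_history(cls, events):
        customer = cls()
        for event in events:
            customer.apply(event)
        return customer

    def apply(self, event):
        # The field is hydrated during the replay, which has to run anyway.
        if isinstance(event, OrderWasBought):
            self.bought_orders += 1
        elif isinstance(event, CustomerRenamed):
            self.name = event.new_name

    @property
    def has_bought_before(self):
        return self.bought_orders > 0


history = [CustomerRenamed("Ann"), OrderWasBought("o-1")]
customer = Customer.from_history(history)

# Option 1: read the field that the replay already filled in.
via_field = customer.has_bought_before
# Option 2: walk the history a second time, looking for the event.
via_history = any(isinstance(e, OrderWasBought) for e in history)
print(via_field, via_history)  # prints "True True"
```

Both checks give the same answer; the field just avoids the second pass over the events.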

Option 2 is valid as long as the use case doesn't need the previous state of the aggregate. Replaying events only restores a read-only state; if the current command doesn't care about that state, searching for a certain event may be a valid, simple solution.
If you fear "breaking encapsulation", that concern may not apply here. Event sourcing and aggregates are mainly concepts; they don't impose a particular OO approach. The Event Store contains the business state expressed as a stream of events. You can read it and use it as an immutable collection at any time. I would replay events only if I needed some complex state restored. But in your case, the simpler 'has event' solution, encapsulated as a service, should work very well.
That being said, there's nothing wrong with always replaying events to restore state and having that field. It's mostly a matter of style: either keep one consistent way of doing things, or adapt and go for the simplest solution in each case.
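As a rough sketch of the 'has event' service, assuming a hypothetical in-memory event store that hands out each stream as an immutable tuple (InMemoryEventStore and has_event are illustrative names, not a real library API):

```python
class InMemoryEventStore:
    """Hypothetical stand-in for a real event store."""

    def __init__(self):
        self._streams = {}

    def append(self, stream_id, event_type, data=None):
        self._streams.setdefault(stream_id, []).append((event_type, data or {}))

    def read(self, stream_id):
        # A tuple, so callers can only read the stream, never modify it.
        return tuple(self._streams.get(stream_id, ()))


def has_event(store, stream_id, event_type):
    """Service-style check: has this kind of event ever been recorded?"""
    return any(name == event_type for name, _ in store.read(stream_id))


store = InMemoryEventStore()
store.append("customer-1", "CustomerRegistered")
first_purchase = not has_event(store, "customer-1", "OrderWasBought")

store.append("customer-1", "OrderWasBought", {"order_id": "o-1"})
first_purchase_again = not has_event(store, "customer-1", "OrderWasBought")
```

No aggregate state is restored here; the command handler only asks the stream a yes/no question.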

Related

CQRS to command or not to, that is the question

I am new to CQRS, but can see the value in this, so I am trying to apply this to a financial system that we are busy rebuilding.
As I mentioned, this is a basic financial system with basic balance, withdraw and deposit functionality.
I have withdraw and deposit commands, but I am struggling with balance.
According to the domain experts, they want to handle a balance inquiry as a transaction on the client's behalf, with no financial implication (yet). So, when the client does a balance inquiry via the device, it creates a transaction, but it is also a balance query at the same time.
In the CQRS world, you distinguish between commands, which mutate state, and queries, which retrieve data in some way.
Apologies if my understanding here is flawed. Can someone point me in the right direction?
EDIT:
Maybe let me put it this way. I was thinking of creating a CheckBalanceCommand that creates a transaction and inserts a BalanceCheckedEvent into the store. But then I would also need to create a CheckBalanceQuery to retrieve the actual balance from the read db.
I would need to invoke both in order to satisfy the balance request.

This is an interesting issue. Your business case is valid: some commands don't mutate aggregate/entity state, yet handling them and their resulting events is still important (e.g. for audit trails).
In order to support these cases, I'd introduce a base event type named IdentityEvent (the name is inspired by identity values in mathematics, such as 0 for addition: applying one to a value doesn't change it). On issuing the corresponding command, derivatives of this event (e.g. BalanceCheckedEvent in your case) will be appended to the aggregate's event stream, and view projections may construct views from them as usual; however, their mutate method will not perform any actual mutation while entities are reconstructed from the event stream.
The actual command processing takes place in the domain layer. One of your application services, in the application layer, receives the query request and processes it as usual. Additionally, before or after the query operation, the same application service may issue the command to the domain layer, on the aggregate root itself. That doesn't violate any principle: your command and query models are still separate, and the application service just coordinates between the two.
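A minimal sketch of that idea, assuming events carry their own mutate method. Apart from IdentityEvent and BalanceCheckedEvent, every name here (Deposited, Account, handle_balance_inquiry, the dict used as a read model) is hypothetical and only there to make the example run.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Deposited:
    amount: Decimal

    def mutate(self, account):
        account.balance += self.amount


class IdentityEvent:
    """Base type for events that are recorded but never change state."""

    def mutate(self, account):
        pass  # deliberately a no-op


@dataclass(frozen=True)
class BalanceCheckedEvent(IdentityEvent):
    device_id: str


class Account:
    def __init__(self):
        self.balance = Decimal("0")
        self.changes = []  # events raised since the aggregate was loaded

    @classmethod
    def from_history(cls, events):
        account = cls()
        for event in events:
            event.mutate(account)
        return account

    def check_balance(self, device_id):
        # Command handler: records the inquiry and leaves the balance alone.
        event = BalanceCheckedEvent(device_id)
        event.mutate(self)
        self.changes.append(event)


def handle_balance_inquiry(account, read_model, account_id, device_id):
    """Application service: coordinates the query side and the command side."""
    balance = read_model[account_id]   # query against the read model
    account.check_balance(device_id)   # command on the aggregate root
    return balance


account = Account.from_history(
    [Deposited(Decimal("100")), BalanceCheckedEvent("atm-7")]
)
read_model = {"acc-1": account.balance}
reported = handle_balance_inquiry(account, read_model, "acc-1", "atm-9")
```

Replaying the BalanceCheckedEvent leaves the balance at 100, while the new inquiry still ends up in the list of changes to be appended to the stream.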

This is not as rare as you would imagine. An additional valid business case is when a service provider runs a credit check on someone. Credit reporting companies actually store queries made against one's credit score, and use them to influence future credit scores. Of course, when I say that this isn't as rare as we imagine, I'm not attempting to normalize such practices (and we should push back to understand the real value something like this is offering to our product).
What I suggest, though, is to model this explicitly rather than trying to generalize it. This feature is probably driven by some business need, and you should model it as such. By this I mean that you should treat the service serving the reads as a separate service entirely, which can raise its own events for things that have happened, and design the rest of the system in a reactive way (i.e. responding to events generated by another BC/service).
As an example, you could have the service which serves the query fire a BalanceChecked event, which either the same service or another one could store in a stream for subsequent processing.
I would not suggest a command, because if you'll be replying with the data it's not as if someone can reject the command; it has already happened, someone already has the data.
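Roughly, the event-raising query service could look like the sketch below. It assumes a hypothetical BalanceQueryService that gets a publish callback injected; the plain list stands in for whatever stream or message bus actually stores the events.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BalanceChecked:
    """A fact: the balance has already been handed out."""
    account_id: str
    checked_at: datetime


class BalanceQueryService:
    def __init__(self, balances, publish):
        self._balances = balances  # read model, here just a dict
        self._publish = publish    # callback that stores or forwards events

    def get_balance(self, account_id):
        balance = self._balances[account_id]
        # Nothing can reject this any more, so it is published as an event.
        self._publish(BalanceChecked(account_id, datetime.now(timezone.utc)))
        return balance


audit_stream = []  # stand-in for a stream that another service consumes
service = BalanceQueryService({"acc-1": 250}, audit_stream.append)
balance = service.get_balance("acc-1")
```

Anything that cares about balance checks (fees, fraud detection, audit) can then subscribe to the stream without the query service knowing about it.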

CQRS/Event Sourcing - Does one expect to receive an Aggregate Id from the user/request?

I am currently just trying to learn some new programming patterns and I decided to give event sourcing a shot.
I have decided to model a warehouse as my aggregate root in the domain of shipping/inventory, where the number of warehouses is generally pretty constant (i.e. a company won't be adding warehouses too often).
I have run into the question of how to set my aggregateId, which should correspond to a warehouse, on my server. Most examples I have seen, including this one, show the aggregate ID being generated server side when a new aggregate is being created (in my case a warehouse), and then passed in the command request when referring to that aggregate for subsequent commands.
Would you say this is the correct approach? Can I expect the user to know and pass aggregate Ids when issuing commands? I realize this is probably domain dependent and could also be a UI/UX choice; I'm just wondering what others have done. It would make more sense to me if my event-sourced aggregates were created more frequently, as with meal tabs or shopping carts.
Thanks!

Heuristic: aggregate id, in many cases, is analogous to the primary key used to distinguish entities in a database table. Many of the lessons of natural vs surrogate keys apply.
"Can I expect the user to know and pass aggregate Ids when issuing commands?"
You probably can't depend on the human to know the aggregate ids. But the client that the human operator is using can very well know them.
For instance, if an operator is going to be working in a single warehouse during a session, then we might look up the appropriate identifier, cache it, and use it when constructing messages on behalf of the user.
Analogy: when you fill in a web form and submit it, the browser does the work of looking at the form action and using that information to construct the correct URI, and similarly the correct HTTP Request.
The client will normally know what the ID is, because it just got it during a previous query.
Creation patterns are weird. It can, in some circumstances, make sense for the client to choose the identifier to be used when creating a new aggregate. In others, it makes sense for the client to provide an identifier for the command message, and the server decides for itself what the aggregate identifier should be.
It's messaging, so you want to be careful about coupling the client directly to your internal implementation details -- especially if that client is under a different development schedule. If you get the message contract right, then the server and client can evolve in any way consistent with the contract at any time.
You may want to review Greg Young's 10 year retrospective, which includes a discussion of warehouse systems. TL;DR - in many cases the messages coming from the human operators are events, not commands.
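One of the creation patterns mentioned above (the client supplies an identifier for the command message, the server picks the aggregate identifier) could be sketched as follows. OpenWarehouse and WarehouseService are hypothetical names, and the in-memory dicts stand in for real persistence.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenWarehouse:
    command_id: uuid.UUID  # chosen by the client, identifies the message
    name: str


class WarehouseService:
    def __init__(self):
        self._ids_by_command = {}
        self.warehouses = {}

    def handle(self, command):
        # A retried message gets the id that was assigned the first time.
        if command.command_id in self._ids_by_command:
            return self._ids_by_command[command.command_id]
        warehouse_id = uuid.uuid4()  # the server decides the aggregate id
        self.warehouses[warehouse_id] = command.name
        self._ids_by_command[command.command_id] = warehouse_id
        return warehouse_id


service = WarehouseService()
command = OpenWarehouse(uuid.uuid4(), "Rotterdam")
first = service.handle(command)
retry = service.handle(command)  # e.g. the client resends after a timeout
```

The client only depends on the message contract; how the server derives or stores the aggregate id stays an internal detail.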

"Would you say this is the correct approach?"
You're asking if one of Greg Young's Event Sourcing samples represents the correct approach... Given that the combination of CQRS and Event Sourcing was essentially (re)invented by Greg, I'd say there's a pretty good chance of that.
In general, letting the code that implements the Command-side generate a GUID for every Command, Event, or other persistent object that it needs to write is by far the simplest implementation, since GUIDs are, for all practical purposes, guaranteed to be unique. In a distributed system, uniqueness without coordination is a big thing.
"Can I expect the user to know and pass aggregate Ids when issuing commands?"
No, and you particularly can't expect a user to know the GUID of their assets. What you may be able to do is to present the user with a list of his or her assets. Each item in the list will have the GUID associated, but it may not be necessary to surface that ID in the user interface. It's just data that the underlying UI object carries around internally.
In some cases, users do need to know the ID of some of their assets (e.g. if it involves phone support). In that case, you can add a lookup API to address that concern.
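A small sketch of both points, assuming a hypothetical in-memory registry: the UI list carries the GUID without showing it, and a lookup function maps a human-friendly code (something a user could read out over the phone) back to the id.

```python
import uuid

warehouses = {}   # aggregate id -> display name
ids_by_code = {}  # human-friendly code -> aggregate id


def create_warehouse(name, code):
    warehouse_id = uuid.uuid4()  # generated server side, no coordination needed
    warehouses[warehouse_id] = name
    ids_by_code[code] = warehouse_id
    return warehouse_id


def list_warehouses_for_ui():
    # The UI shows "label" and keeps "id" around for the next command.
    return [{"id": str(wid), "label": name} for wid, name in warehouses.items()]


def lookup_warehouse_id(code):
    """Lookup API for the cases where a person has to name the asset."""
    return ids_by_code.get(code)


created = create_warehouse("Rotterdam", "RTM-01")
options = list_warehouses_for_ui()
```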

DDD (Domain-Driven-Design) - large aggregates

I'm currently studying Eric Evans's Domain-Driven Design. The idea of aggregates is clear to me and I find it very interesting. Now I'm thinking of an example aggregate like this:
BankAccount (1) ----> (*) Transaction
BankAccount
    BigDecimal calculateTurnover();
BankAccount is an aggregate. To calculate turnover I should traverse all transactions and sum up all amounts. Evans assumes that I should use repositories only to load aggregates. In the above case there could be a few thousand transactions, which I don't want to load into memory all at once.
In the context of the repository pattern, aggregate roots are the only objects your client code loads from the repository.
The repository encapsulates access to child objects - from a caller's perspective it automatically loads them, either at the same time the root is loaded or when they're actually needed (as with lazy loading).
What would be your suggestion for implementing calculateTurnover in a DDD aggregate?

As you have pointed out, to load 1000s of entities in an aggregate is not a scalable solution. Not only will you run into performance problems but you will likely also experience concurrency issues, as emphasised by Vaughn Vernon in his Effective Aggregate Design series.
Do you want every transaction to be available in the BankAccount aggregate or are you only concerned with turnover?
If it is only the turnover that you need, then you should establish this value when instantiating your BankAccount aggregate. It could likely be calculated efficiently by your data store technology (indexed JOINs, for example, if you are using SQL). Perhaps you also need to consider keeping it as a precalculated value in your data store (what happens when you start dealing with millions of transactions per bank account?).
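For illustration, a minimal sketch that lets the data store do the summing, using SQLite from the Python standard library. The table layout, the amounts in cents and the load_account function are assumptions, and turnover is taken to be the plain sum of all amounts, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id TEXT PRIMARY KEY, owner TEXT);
    CREATE TABLE account_transaction (
        id INTEGER PRIMARY KEY,
        account_id TEXT REFERENCES account(id),
        amount_cents INTEGER NOT NULL
    );
    CREATE INDEX idx_tx_account ON account_transaction(account_id);
""")
conn.execute("INSERT INTO account VALUES ('acc-1', 'Ann')")
conn.executemany(
    "INSERT INTO account_transaction (account_id, amount_cents) VALUES (?, ?)",
    [("acc-1", 1000), ("acc-1", -250), ("acc-1", 4000)],
)


class BankAccount:
    def __init__(self, account_id, turnover_cents):
        self.account_id = account_id
        self.turnover_cents = turnover_cents  # supplied at construction


def load_account(conn, account_id):
    """Repository-style load: the database sums, no Transaction is loaded."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) "
        "FROM account_transaction WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return BankAccount(account_id, row[0])


account = load_account(conn, "acc-1")
```

The aggregate never sees the individual transactions; it is handed the one number it needs.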
But perhaps you still require the transactions available in your domain? Then you should consider having a separate Transaction repository.
I would highly recommend reading Vaughn Vernon's series on aggregate design, as linked above.

You have managed to pick a very interesting example :)
I actually use Account1->*Transaction when explaining event sourcing (ES) to anyone not familiar with it.
As a developer I was taught (way back) to use what we can now refer to as entity interaction. So we have a Customer record and it has a current state. We change the state of the record in some way (address, tax details, discount, etc.) and store the result. We never quite know what happened but we have the latest state and, since that is the current state of our business, it is just fine. Of course one of the first issues we needed to deal with was concurrency but we had ways of handling that and even though not fantastic it "worked".
For some reason the accounting discipline didn't quite buy into this. Why do we not simply keep the latest state of an Account? We would load the related record, change the balance, and save the state. Oddly enough most people would probably cringe at the thought, yet it seems to be OK for the rest of our data.
The accounting domain got around this by registering the change events as a series of Transaction entries. So should you lose your account record and the latest balance, you can always run through all the transactions to obtain the latest balance. That is event sourcing.
In ES one typically loads the entire list of events for an aggregate root (AR) to obtain its latest state. There is also, typically, a mechanism to deal with cases where loading a huge number of events would cause performance issues: snapshots. Usually only the latest snapshot is stored. The snapshot contains the full state of the aggregate as of the snapshot version, and only events after that version are applied.
One of the huge advantages of ES is that one could come up with new queries and then simply apply all the events to the query handler and determine the outcome. Perhaps something like: "How many customers do I have that have moved twice in the last year?" Quite arbitrary, but using the "traditional" approach the answer would quite likely be that we'll start gathering that information from today and have it available next year, as we have not been saving the CustomerMoved events. With ES we can search for the CustomerMoved events and get a result at any point.
So this brings me back to your example. You probably do not want to be loading all the transactions. Instead store the "Turnover" and calculate it on the go. Should the "Turnover" be a new requirement, then a one-off processing of all the ARs should get it up to speed. You can still have a calculateTurnover() method somewhere, but that would be something you wouldn't run all too often. And in those cases you would need to load all the transactions for an AR.
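The snapshot mechanism can be sketched like this, assuming the events are simple signed amounts and the snapshot records how many events it already covers. Snapshot and load_balance are illustrative names; a real store would read only the events after the snapshot version instead of slicing a full list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into the state
    balance: int  # full state at that version


def load_balance(snapshot, events):
    """Start from the snapshot (if any) and apply only the later events."""
    balance = snapshot.balance if snapshot else 0
    start = snapshot.version if snapshot else 0
    for amount in events[start:]:
        balance += amount
    return balance


events = [100, -30, 50, 20]
snapshot = Snapshot(version=3, balance=120)  # state after the first 3 events

from_snapshot = load_balance(snapshot, events)  # applies only the last event
from_scratch = load_balance(None, events)       # applies all four events
```

Both paths end at the same balance; the snapshot only shortens the replay.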
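A rough sketch of "store it and calculate it on the go", with a one-off backfill for accounts that predate the requirement. The Account class, the shared transaction_log list and backfill_turnover are hypothetical, and turnover is again assumed to be the plain sum of amounts.

```python
class Account:
    def __init__(self, account_id, turnover=None):
        self.account_id = account_id
        self.turnover = turnover  # None until it has been backfilled

    def record(self, amount, transaction_log):
        transaction_log.append((self.account_id, amount))
        self.turnover += amount  # kept current with every new transaction


def backfill_turnover(account, transaction_log):
    """One-off pass over all transactions of one account."""
    account.turnover = sum(
        amount for acc_id, amount in transaction_log
        if acc_id == account.account_id
    )


transaction_log = [("a", 100), ("a", 40), ("b", 5)]
account = Account("a")

backfill_turnover(account, transaction_log)  # run once, turnover is now 140
account.record(60, transaction_log)          # from here on it is incremental
```

After the backfill, reading the turnover never requires loading the transactions again.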

Complex Finds in Domain Driven Design

I'm looking into converting part of a large existing VB6 system into .NET. I'm trying to use domain-driven design, but I'm having a hard time getting my head around some things.
One thing that I'm completely stumped on is how I should handle complex find statements. For example, we currently have a screen that displays a list of saved documents, that the user can select and print off, email, edit or delete. I have a SavedDocument object that does the trick for all the actions, but it only has the properties relevant to it, and I need to display the client name that the document is for and their email address if they have one. I also need to show the policy reference that this document may have come from. The Client and Policy are linked to the SavedDocument but are their own aggregate roots, so are not loaded at the same time the SavedDocuments are.
The user is also allowed to specify several filters to narrow the list down. These too can come from properties stored on the SavedDocument or on the Client and Policy.
I'm not sure how to handle this from a Domain driven design point of view.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocuments, that I then have to turn into a different object or DTO, and fill with the additional client and policy information? That seem a little slow as I have to load all the details using multiple calls.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocumentsForList objects that contain just the information I want? This seems the quickest but doesn't feel like I'm using DDD.
Do I load everything from their objects and do all the filtering and column selection in a service? This seems the slowest, but also appears to be very domain orientated.
I'm just really confused about how to handle these situations, and I've not really seen other people asking questions about it, which makes me feel that I'm missing something.

Queries can be handled in a few ways in DDD. Sometimes you can use the domain entities themselves to serve queries. This approach can become cumbersome in scenarios such as yours when queries require projections of multiple aggregates. In this case, it is easier to use objects explicitly designed for the respective queries - effectively DTOs. These DTOs will be read-only and won't have any behavior. This can be referred to as the read-model pattern.
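As a sketch of such a read model for the saved-documents screen, here is one possible shape using SQLite from the Python standard library. The schema, SavedDocumentListItem and find_saved_documents are all assumptions made for the example; the point is only that one query fills a behavior-free DTO across three aggregates.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedDocumentListItem:
    """Read-only DTO for the list screen; it has no domain behavior."""
    document_id: int
    title: str
    client_name: str
    client_email: Optional[str]
    policy_reference: Optional[str]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE policy (id INTEGER PRIMARY KEY, reference TEXT);
    CREATE TABLE saved_document (
        id INTEGER PRIMARY KEY,
        title TEXT,
        client_id INTEGER REFERENCES client(id),
        policy_id INTEGER REFERENCES policy(id)
    );
    INSERT INTO client VALUES (1, 'Acme Ltd', 'info@acme.example');
    INSERT INTO client VALUES (2, 'Bob', NULL);
    INSERT INTO policy VALUES (10, 'POL-001');
    INSERT INTO saved_document VALUES (100, 'Quote', 1, 10);
    INSERT INTO saved_document VALUES (101, 'Letter', 2, NULL);
""")


def find_saved_documents(conn, client_name_contains=None):
    """One query, one projection, optional filter on a Client property."""
    sql = """
        SELECT d.id, d.title, c.name, c.email, p.reference
        FROM saved_document d
        JOIN client c ON c.id = d.client_id
        LEFT JOIN policy p ON p.id = d.policy_id
    """
    params = []
    if client_name_contains is not None:
        sql += " WHERE c.name LIKE ?"
        params.append(f"%{client_name_contains}%")
    sql += " ORDER BY d.id"
    return [SavedDocumentListItem(*row) for row in conn.execute(sql, params)]


all_items = find_saved_documents(conn)
acme_items = find_saved_documents(conn, "Acme")
```

The SavedDocument, Client and Policy aggregates stay untouched and are still used for the print, email, edit and delete actions.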

Alternative Data Access pattern to Repository

I have certain objects in my domain which are not aggregate roots/entities, yet I still need to retrieve them from a database. I don't want to confuse things by creating repositories for these things. So, what are alternative data access patterns? Would you simply create a DAO for them, while still of course separating the interface?
Edit:
Some more detail on what I'm doing. I need to create a code. This code has certain rules as to its format. One of the rules is that the final character must be a unique number incremented by one from the last code generated. For example:
ABCD1
ABCD2
ABCD3
So, I'm keeping a table with one row and one column to store the number in question. Now, I don't want to consider this number an entity and create a repository for it - that's overkill. I just need a way of retrieving the number, adding 1 to it, and saving it. I know there are myriad ways I could do it, but I'm wondering if there's a customary way.

There are several data access patterns that could apply, in theory. You'd need to provide more detail though if you want us to suggest a specific pattern.
Without more detail, all I can suggest is to consider looking into Martin Fowler's Patterns of Enterprise Application Architecture book.
Edit: Customary way? No, not that I can think of - it really depends on where and how you're using this unique code in your domain. If I were doing this, I'd probably create a small service that speaks directly to the database to perform this function - not as heavy-weight as a repository, and very focused on the problem at hand.

Based on the edit: I would look first at the context in which you need to create that code. Perhaps there are some related entities or something that you are missing.
btw, I find the question really interesting as it comes up from time to time while coding specific features. I usually end up finding I was missing something on the scenario and it ends up fitting well with the normal repository pattern.

After surveying the options I'm going with the Table Gateway pattern.
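For anyone curious, a minimal sketch of what such a gateway might look like for the one-row counter table, using SQLite from the Python standard library. Fowler's catalog calls the pattern Table Data Gateway; CodeSequenceGateway, the code_sequence table and next_code are hypothetical names, and a real multi-user database would need its own locking or an atomic update-and-return statement.

```python
import sqlite3


class CodeSequenceGateway:
    """Gateway object that holds all the SQL for the one-row counter table."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS code_sequence (last_number INTEGER NOT NULL)"
        )
        if conn.execute("SELECT COUNT(*) FROM code_sequence").fetchone()[0] == 0:
            conn.execute("INSERT INTO code_sequence VALUES (0)")
        conn.commit()

    def next_number(self):
        # Increment and read inside one transaction, committed on success.
        with self._conn:
            self._conn.execute(
                "UPDATE code_sequence SET last_number = last_number + 1"
            )
            row = self._conn.execute(
                "SELECT last_number FROM code_sequence"
            ).fetchone()
        return row[0]


def next_code(gateway, prefix="ABCD"):
    return f"{prefix}{gateway.next_number()}"


gateway = CodeSequenceGateway(sqlite3.connect(":memory:"))
codes = [next_code(gateway) for _ in range(3)]
print(codes)  # prints ['ABCD1', 'ABCD2', 'ABCD3']
```

The domain code only sees next_number(); no entity or repository is involved.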
