CQRS/DDD: Checking referential integrity - domain-driven-design

Should a command handler also check for referential integrity? This FAQ suggests that you shouldn't check it in the aggregates (http://cqrs.nu/Faq). Isn't checking that something exists part of the validation?
For example you can have an application where you can add comments to an article.
The command would have these fields:
Id
ArticleId
UserId
Text
In this example the comment and the article are different aggregate roots.
Should you check, for this example, whether the article and the user already exist? It feels a bit strange that you can add a comment to an article that doesn't exist.

I presume that you have a reason to divide Article and Comment into separate aggregate roots. When I am faced with a question like yours, it is usually an indication of an opportunity to rethink the domain model.
There is nothing wrong with a referential integrity check, but think about the eventual nature of changes to aggregates that don't belong to the same root. What does the result of the validation actually indicate?
If the article does not exist, is it because it was never added and you have an integrity issue in the command? Or maybe it was added but has not yet been propagated to the query side of the application? Or maybe it had already been removed before the user posted the comment?
What if the validation confirms that the article exists? Maybe a moderator has removed it, but the result has not yet been propagated?
Remember, you can only rely on the order of events when they happen under the same aggregate root.
To summarize: you can verify referential integrity in a command handler as long as you realize that there may be false positives and false negatives. If you expect incoming commands to carry unreliable data often, this verification may reduce the rate of errors. However, if consistency is critical, try to rethink the structure of your aggregates.
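For illustration, here is a minimal sketch of such a check in a command handler. The AddComment command mirrors the fields from the question; the ExistenceChecks lookups and the comment repository are hypothetical names, not part of any particular framework, and their answers may of course be stale for the reasons described above.

```typescript
// Hypothetical command shape, mirroring the fields from the question.
interface AddComment {
  id: string;
  articleId: string;
  userId: string;
  text: string;
}

// Illustrative read-side lookups; their answers may be stale because the
// Article and User aggregates live under different roots (eventual consistency).
interface ExistenceChecks {
  articleExists(articleId: string): Promise<boolean>;
  userExists(userId: string): Promise<boolean>;
}

interface CommentRepository {
  save(comment: { id: string; articleId: string; userId: string; text: string }): Promise<void>;
}

class AddCommentHandler {
  constructor(
    private readonly checks: ExistenceChecks,
    private readonly comments: CommentRepository,
  ) {}

  async handle(command: AddComment): Promise<void> {
    // Referential integrity check: best effort only, with possible
    // false positives and false negatives as described above.
    if (!(await this.checks.articleExists(command.articleId))) {
      throw new Error(`Article ${command.articleId} does not exist`);
    }
    if (!(await this.checks.userExists(command.userId))) {
      throw new Error(`User ${command.userId} does not exist`);
    }
    await this.comments.save({ ...command });
  }
}
```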

Should you check, for this example, whether the article and the user already exist?
Not usually, no.
It feels a bit strange that you can add a comment to an article that doesn't exist.
Separation of responsibilities: the domain model is responsible for understanding how the command applies to the current state of the model. The command handler just checks the integrity of the command.

Related

Should I use a NestJS pipe, a guard, or should I go for an interceptor?

Well I have a few pipes in the application I'm working on and I'm starting to think they actually should be guards or even interceptors.
One of them is called PincodeStatusValidationPipe and its job is as simple as can be: it checks the cache for a certain value, and if that value is the one expected it returns what it gets; otherwise it throws a FORBIDDEN exception.
Another pipe is called UserExistenceValidationPipe. It operates on the login method and checks whether a user exists in the DB, plus some other things related to that user (e.g. whether the password expected by the login method is present and, if so, whether it matches that of the retrieved user); otherwise it throws the appropriate exceptions.
I know it's more of a design question but I find it quite important and I would appreciate any hints. Thanks in advance.
EDIT:
Well, I think UserExistenceValidationPipe is definitely not the best name choice; something like UserValidationPipe fits much better.
If you are already throwing a FORBIDDEN, I would suggest migrating the PincodeStatusValidationPipe to a PincodeStatusValidationGuard, as returning false from a guard will throw a FORBIDDEN for you. You'll also have full access to the Request object, which is pretty nice to have.
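As a rough sketch of that migration (the injected cache abstraction, the `pincode:` key format, and the 'VERIFIED' status are made up for illustration; in a real app you'd wire the cache through a proper injection token or concrete class):

```typescript
import { CanActivate, ExecutionContext, Injectable } from '@nestjs/common';

// Hypothetical cache abstraction; could be Nest's CACHE_MANAGER
// or any injectable service of your own.
interface PincodeCache {
  get(key: string): Promise<string | undefined>;
}

@Injectable()
export class PincodeStatusValidationGuard implements CanActivate {
  constructor(private readonly cache: PincodeCache) {}

  async canActivate(context: ExecutionContext): Promise<boolean> {
    const request = context.switchToHttp().getRequest();
    const status = await this.cache.get(`pincode:${request.body?.pincodeId}`);
    // Returning false lets Nest throw the ForbiddenException for you.
    return status === 'VERIFIED';
  }
}
```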
For the UserExistenceValidationPipe, a pipe is not the worst thing to have. I consider existence validation to be a part of business logic, and as such should be handled in the service, but that's me. I use pipes for data validation and transformation, meaning I check the shape of the data there and pass it on to the service if the shape looks correct.
As for interceptors, I like to use those for logging, caching, and response mapping, though I've heard of others using interceptors for overall validators instead of using multiple pipes.
As the question is mostly opinion-based, I'll leave the final decision up to you. In short: guards are great for short-circuiting requests with a failure, interceptors are good for logging, caching, and response mapping, and pipes are for data validation and transformation.
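Following that split, a data-validation pipe might look something like this minimal sketch; the LoginDto shape and field names are made up for illustration, and the existence/password checks stay in the service:

```typescript
import { BadRequestException, Injectable, PipeTransform } from '@nestjs/common';

// Hypothetical login DTO, used only for illustration.
interface LoginDto {
  username: string;
  password: string;
}

@Injectable()
export class LoginShapeValidationPipe implements PipeTransform<unknown, LoginDto> {
  transform(value: unknown): LoginDto {
    const dto = value as Partial<LoginDto>;
    // Only shape checks here; whether the user exists and whether the
    // password matches belongs in the service (business logic).
    if (typeof dto?.username !== 'string' || typeof dto?.password !== 'string') {
      throw new BadRequestException('username and password are required');
    }
    return { username: dto.username, password: dto.password };
  }
}
```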

CQRS/Event Sourcing - Does one expect to receive an Aggregate Id from the user/request?

I am currently just trying to learn some new programming patterns and I decided to give event sourcing a shot.
I have decided to model a warehouse as my aggregate root in the domain of shipping/inventory, where the number of warehouses is generally pretty constant (i.e. a company won't be adding warehouses too often).
I have run into the question of how to set my aggregateId, which should correspond to a warehouse, on my server. Most examples I have seen, including this one, show the aggregate ID being generated server side when a new aggregate is being created (in my case a warehouse), and then passed in the command request when referring to that aggregate for subsequent commands.
Would you say this is the correct approach? Can I expect the user to know and pass aggregate IDs when issuing commands? I realize this is probably domain-dependent and could also be a UI/UX choice; I'm just wondering what others have done. It would make more sense to me if my event-sourced aggregates were created more frequently, as with meal tabs or shopping carts.
Thanks!
Heuristic: aggregate id, in many cases, is analogous to the primary key used to distinguish entities in a database table. Many of the lessons of natural vs surrogate keys apply.
Can I expect the user to know and pass aggregate Ids when issuing commands?
You probably can't depend on the human to know the aggregate ids. But the client that the human operator is using can very well know them.
For instance, if an operator is going to be working in a single warehouse during a session, then we might look up the appropriate identifier, cache it, and use it when constructing messages on behalf of the user.
Analogy: when you fill in a web form and submit it, the browser does the work of looking at the form action and using that information to construct the correct URI, and similarly the correct HTTP request.
The client will normally know what the ID is, because it just got it during a previous query.
Creation patterns are weird. It can, in some circumstances, make sense for the client to choose the identifier to be used when creating a new aggregate. In others, it makes sense for the client to provide an identifier for the command message, and the server decides for itself what the aggregate identifier should be.
It's messaging, so you want to be careful about coupling the client directly to your internal implementation details -- especially if that client is under a different development schedule. If you get the message contract right, then the server and client can evolve in any way consistent with the contract at any time.
You may want to review Greg Young's 10 year retrospective, which includes a discussion of warehouse systems. TL;DR - in many cases the messages coming from the human operators are events, not commands.
Would you say this is the correct approach?
You're asking if one of Greg Young's Event Sourcing samples represents the correct approach... Given that the combination of CQRS and Event Sourcing was essentially (re)invented by Greg, I'd say there's a pretty good chance of that.
In general, letting the code that implements the Command-side generate a GUID for every Command, Event, or other persistent object that it needs to write is by far the simplest implementation, since GUIDs are guaranteed to be unique. In a distributed system, uniqueness without coordination is a big thing.
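As a minimal sketch of that idea (assuming crypto.randomUUID() is available, e.g. Node 19+ or a modern browser; the command shapes are made up for illustration):

```typescript
// Client-side construction of a "create warehouse" command.
// The client chooses the new aggregate id and a command id up front,
// so a retry of the same command can be deduplicated on the server.
interface CreateWarehouse {
  commandId: string;   // identifies this message, for idempotent retries
  warehouseId: string; // the new aggregate's id, chosen by the client
  name: string;
}

function buildCreateWarehouse(name: string): CreateWarehouse {
  return {
    commandId: crypto.randomUUID(),
    warehouseId: crypto.randomUUID(),
    name,
  };
}

// Subsequent commands reuse the id the client got back from an earlier query;
// the human operator never has to type or even see the GUID.
interface ReceiveStock {
  commandId: string;
  warehouseId: string;
  sku: string;
  quantity: number;
}
```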
Can I expect the user to know and pass aggregate Ids when issuing commands?
No, and you particularly can't expect a user to know the GUIDs of their assets. What you may be able to do is present the user with a list of his or her assets. Each item in the list will have the GUID associated with it, but it may not be necessary to surface that ID in the user interface; it's just data that the underlying UI object carries around internally.
In some cases, users do need to know the ID of some of their assets (e.g. if it involves phone support). In that case, you can add a lookup API to address that concern.

Is it a good idea to rely on a given aggregate's history with Event Sourcing?

I'm currently dealing with a situation in which I need to make a decision based on whether it's the first time my aggregate got into a situation (an Order was bought).
I can solve this problem in two ways:
1. Introduce in my aggregate a field stating whether an order has ever been bought (or maybe the number of bought orders);
2. Look in the aggregate's history for any OrderWasBought event.
Is option 2 ever acceptable? For some reason I think option 1 is safer/cleaner for the general case, but I lack experience in these matters.
Thanks
IMHO both effectively do the same thing: the field stating that an order was bought needs to be hydrated somehow. Basically this is done as part of the replay, which means nothing more than that the field gets set when an OrderWasBought event is replayed.
So it does not make any difference whether you look at the field or look for the existence of the event; at least it makes no difference as far as the effective result is concerned.
Talking about efficiency, it may be better to use a field, since that way the field gets hydrated as part of the replay, which needs to run anyway. So you don't have to search the list of events again; you can simply look at the (cached) value in the field.
So, in the end, to cut a long story short: it doesn't matter. Use whatever feels better to you. If the history of an aggregate gets lengthy, the field approach may give you better performance.
PS: Of course, this depends on how aggregates are loaded: is the aggregate able to access its own event history at all? If not, setting a field while the aggregate is being replayed is your only option anyway. Please note that the aggregate does not (and should not!) have access to the underlying repository, so it cannot load its history on its own.
Option 2 is valid as long as the use case doesn't need the previous state of the aggregate. Replaying events only restores a read-only state; if the current command doesn't care about it, searching for a certain event may be a valid, simple solution.
If you fear "breaking encapsulation", that concern may not apply here. Event sourcing and aggregates are concepts; they don't impose a particular OO approach. The event store contains the business state expressed as a stream of events. You can read it and use it as an immutable collection at any time. I would replay events only if I needed a certain complex state restored. But in your case, the simpler "has event" check, encapsulated as a service, should work very well.
That being said, there's nothing wrong with always replaying events to restore state and having that field. It's mostly a matter of style: choose between a consistent way of doing things and adapting to the simplest solution for each given case.
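To make the two options concrete, here is a minimal sketch; the event and field names are illustrative rather than taken from any particular framework:

```typescript
interface DomainEvent {
  type: string;
}

// Option 1: hydrate a field as part of the replay.
class Order {
  private bought = false;

  static replay(history: DomainEvent[]): Order {
    const order = new Order();
    for (const event of history) {
      order.apply(event);
    }
    return order;
  }

  private apply(event: DomainEvent): void {
    if (event.type === 'OrderWasBought') {
      this.bought = true; // the field is set while events are replayed
    }
  }

  get wasBought(): boolean {
    return this.bought;
  }
}

// Option 2: look for the event directly in the stream,
// e.g. in a small service outside the aggregate.
function wasOrderBought(history: DomainEvent[]): boolean {
  return history.some((event) => event.type === 'OrderWasBought');
}
```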

Scratch couchdb document

Is it possible to "scratch" a couchdb document? By that I mean to delete a document, and make sure that the document and its history is completely removed from the database.
I do not want to perform a database compaction, I just want to fully wipe out a single document. And I am looking for a solution that guarantees that there is no trace of the document in the database, without needing to wait for internal database processes to eventually remove the document.
(a python solution is appreciated)
When you delete a document in CouchDB, generally only the _id, _rev, and a deleted flag are preserved. These are preserved to allow for eventual consistency through replication. Forcing an immediate delete across an entire group of nodes isn't really consistent with the architecture.
The closest thing would be to use purge; once you do that, all traces of the doc will be gone after the next compaction. I realize this isn't exactly what you're asking for, but it's the closest thing off the top of my head.
Here's a nice article explaining the basis behind the various delete methods available.
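For illustration, the purge itself is just two HTTP calls; here is a minimal TypeScript sketch against a placeholder database URL and credentials (the same requests can be issued from Python with the requests library). It assumes the document has a single leaf revision; a document with conflicts would need all leaf revisions listed.

```typescript
const base = 'http://admin:password@localhost:5984/mydb'; // placeholder URL/credentials

async function purgeDocument(docId: string): Promise<void> {
  // 1. Fetch the current revision of the document.
  const doc = (await (await fetch(`${base}/${encodeURIComponent(docId)}`)).json()) as {
    _rev: string;
  };

  // 2. Ask CouchDB to purge that revision via the /{db}/_purge endpoint.
  await fetch(`${base}/_purge`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ [docId]: [doc._rev] }),
  });
  // Traces may still remain on disk until the next compaction.
}
```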
Deleting anything from a file system "for sure" is a difficult and usually quite expensive problem, even more so with databases in general. Depending on what "for sure" means to you, you may end up needing a custom DB, a custom OS, and custom hardware. It is a bit like saying "I want a fault-tolerant system": everyone would like to have one, but only a few can afford it; the good news is that most can settle for less. The same goes for deleting "for sure". I assume you are trying to address some security or privacy issue, so try to see if there is some other way to get what you need, perhaps by encrypting the document or its sensitive parts.

Should we not rely on CouchDB to generate uuid?

I was reading 'CouchDB: The Definitive Guide' and I was confused by this paragraph:
For demoing purposes, having CouchDB assign a UUID is fine. When you write your first programs, we recommend assigning your own UUIDs. If you rely on the server to generate the UUID and you end up making two POST requests because the first POST request bombed out, you might generate two docs and never find out about the first one because only the second one will be reported back. Generating your own UUIDs makes sure that you’ll never end up with duplicate documents.
I thought that UUIDs (specifically the _id) were saved only when the document creation was successful. That is, when I POST an insert request for a new document, the _id is generated automatically. If the document is saved, the field is kept; otherwise it is discarded. Is that not the case?
Can you please explain the correct way to generate _id fields in CouchDB?
I think this quote is not really about UUIDs but about using PUT (which is idempotent) instead of POST.
Check this thread for more information : Consequences of POST not being idempotent (RESTful API)
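A minimal sketch of that approach (placeholder database URL, and assuming crypto.randomUUID() is available): generate the _id on the client and PUT to it, so a retry targets the same document instead of creating a new one.

```typescript
const db = 'http://localhost:5984/mydb'; // placeholder database URL

async function putDoc(id: string, body: object): Promise<Response> {
  return fetch(`${db}/${encodeURIComponent(id)}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
}

// Generate the _id once, up front. If the first attempt times out,
// retrying the same PUT cannot create a duplicate document:
// it either succeeds or returns 409 Conflict.
const id = crypto.randomUUID();
const response = await putDoc(id, { type: 'comment', text: 'hello' });
```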
I think that quote is wrong or out of date, and it's fine to rely on CouchDB for ID generation. I've used this at work a lot and have never really run into any issues.
