CQRS design: NoSQL data view (domain-driven design)

This is a "language agnostic" question.
I started to study the CQRS pattern.
I have a simple question. Am I supposed to have two different storage layers: a relational one (MySQL, etc.) for the commands and a NoSQL one (MongoDB, Cassandra, etc.) for the queries?
Let me give a little example:
1) As a user I want to insert a "Todo task"
Command: "Create Task" and will insert a new task into a database which have the User and the Todo tables.
2) As a user I'm able to see a list of created task
Query: "GetTasks" that will return a "view" with a collection of task taken from a non sql table named "UserTasks" which have a user and a list of created task.
Is the right approach? I'm sorry if the language is poor, it's just a little example.
If it seems a good approach (again, don't consider details) what is the best approach to keep updated the data stores?
I'm thinking to raise an event like "TaskCreated" and take the new task and insert those information in the nosql storage.
Thanks!

I can't really understand what you're looking for, but typically a command is something that results in side effects, while queries don't cause side effects. GetTasks wouldn't really be a command, but a query.
Your "CreateTask" would be a command, which would result in the task being added to the relevant data store(s). Your GetTasks query would retrieve that information from a data store. It doesn't really matter whether you use a SQL or NoSQL store for either.
The "command store" is typically the store that has just enough data to enforce invariants. In your case, what data is required for that? Is some information required to decide whether or not a task can be registered? For example, say you have a requirement that a user can have at most 3 "todo"s. In this case, a table in the command store holding (UserId, TodoCount) is enough. You could also use (UserId, [TodoId]), i.e. store a list of todo ids, so that you gain idempotence. All other information about the user and tasks would be query data, and would live in the query store.
Hope that makes sense.

While there are times when you may wish to store commands, you generally don't. Rather, a popular approach is to store the domain events that occur as a result of the commands. This is referred to as Event Sourcing. It would make 'STOREA' a store of events, or to put it another way, an event stream.
'STOREB' is typically referred to as the read model. It has a de-normalised structure optimised for read speed, and it is kept up to date via de-normalisers which respond to specific events. A key point to note here is that there is often a lag between the event being raised and the read model being updated. In my opinion this is a good thing, but it needs to be thought about when designing the UI.
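As an illustration of such a de-normaliser (the event shape and store interface below are assumptions, not any particular framework's API), here is how the asker's "UserTasks" read model might be kept up to date:

```typescript
// A de-normaliser: subscribes to TaskCreated events and updates the
// de-normalised "UserTasks" view. Types are invented for the sketch.

interface TaskCreated {
  type: "TaskCreated";
  userId: string;
  taskId: string;
  title: string;
}

interface UserTasksView {
  userId: string;
  tasks: { taskId: string; title: string }[];
}

interface ReadModelStore {
  get(userId: string): Promise<UserTasksView | null>; // e.g. a MongoDB collection
  put(view: UserTasksView): Promise<void>;
}

async function onTaskCreated(store: ReadModelStore, event: TaskCreated): Promise<void> {
  const view = (await store.get(event.userId)) ?? { userId: event.userId, tasks: [] };
  view.tasks.push({ taskId: event.taskId, title: event.title });
  await store.put(view);
  // This runs after the command completes, so the read model lags slightly
  // behind the write side (the eventual-consistency lag mentioned above).
}
```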
For more info, take a look at CQRS – A Step-by-Step Guide to the Flow of a typical Application.
I hope that helps.

Related

Command across multiple aggregates with CQRS and ES

I've run into an odd case while thinking about a solution to my problem.
A quick recap: I'm using an event store with CQRS, and I have two aggregates called 'Group' and 'User'.
Basically, a User defines some characteristics like their region, age, and a couple of interests.
They can then choose to 'match' with a Group that is in the same region, around the same age, and has the same interests.
Now here's the case: the matchmaking part should happen completely on the backend. It can be a long-running process, but for the client it's just one call to the endpoint, and the end result should be the user matching with a group.
So for this case, I have to query the groups which have the same region and the same age slice; the interests don't really matter in my query. I now have a list of groups, and the matchmaker is going to give each group a rating based on the common interests between the group and the user. The group with the best rating will be joined.
So again, using CQRS and ES, my problem is that this case seems to be a mix between queries and a command, and mixing queries into a match command seems to go against the purpose of CQRS.
Querying multiple groups and filtering them against my write side, the event store, also seems like a bad idea, as the aggregates have to be rebuilt and loaded into memory before they can be filtered.
So I'm kind of stuck here. Something is telling me that a long-running process / saga could be an answer to my problem, but I don't see how a saga would avoid mixing queries and commands, as a saga is basically a chain of commands/events.
How do I tackle this specific case? No real code is needed; a conceptual solution to get me going is perfect.
Hi, this is actually a case where CQRS can shine.
Creating a dedicated matching model seems ideal for this case, allowing you to answer what might be a rather non-trivial query in other forms.
So:
1. Create a dedicated (possibly ephemeral, possibly checkpointed/persisted) query model as a derived store.
2. Upon request, run a query against it to get the top matches.
3. Based on the results of the query, send a command to update the event store with the new links.
The query model does not need to handle commands and can be updated on a push basis from the event store. This keeps it rather simple to build and keep up to date, and it can further be optimized to hold only the data needed for this particular query.
An in-memory graph might do well.
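For example, a minimal in-memory matching model (all names here, such as GroupProfile and bestMatch, are invented for illustration) that is fed by events pushed from the store and ranks candidate groups by shared interests:

```typescript
// Ephemeral query model for matchmaking: filter by region and age,
// then score by overlapping interests. Rebuilt or checkpointed as needed.

interface GroupProfile {
  groupId: string;
  region: string;
  minAge: number;
  maxAge: number;
  interests: Set<string>;
}

class MatchingModel {
  private groups = new Map<string, GroupProfile>();

  // Called by the event subscription to keep the model current.
  applyGroupUpdated(profile: GroupProfile): void {
    this.groups.set(profile.groupId, profile);
  }

  bestMatch(region: string, age: number, interests: Set<string>): GroupProfile | undefined {
    let best: GroupProfile | undefined;
    let bestScore = -1;
    for (const g of this.groups.values()) {
      if (g.region !== region || age < g.minAge || age > g.maxAge) continue;
      let score = 0;
      for (const i of interests) if (g.interests.has(i)) score++;
      if (score > bestScore) {
        bestScore = score;
        best = g;
      }
    }
    return best;
  }
}
```

Upon a match request you query this model for the best group and then send an ordinary single-aggregate command (e.g. a hypothetical JoinGroup) to the write side.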
-Chris
p.s.
On the command side: the commands here would each update only a single aggregate instance.
Further, using the write-ahead pattern would avoid the need for any sort of process manager or "saga."
e.g.
For each new membership, issue one command to add the new membership to the user stream, then one command to the group to add the new member information. A simple audit process can then scan for incomplete membership assignments, both on start-up/recovery and as a periodic data-quality check.
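To make that sequence concrete, a hedged sketch (the command names and bus interface are assumptions, not real library calls):

```typescript
// Write-ahead sequence for a new membership: two single-aggregate commands
// issued in order, plus an audit scan that repairs incomplete assignments.

interface CommandBus {
  send(command: { type: string; [k: string]: unknown }): Promise<void>;
}

async function assignMembership(bus: CommandBus, userId: string, groupId: string): Promise<void> {
  // 1) Write ahead: record the membership on the user stream first.
  await bus.send({ type: "AddMembershipToUser", userId, groupId });
  // 2) Then complete it on the group stream.
  await bus.send({ type: "AddMemberToGroup", groupId, userId });
}

// Audit: find memberships recorded on users but missing on groups
// (how to find them depends on your model) and re-issue the second command.
async function auditIncompleteMemberships(
  bus: CommandBus,
  findDangling: () => Promise<{ userId: string; groupId: string }[]>
): Promise<void> {
  for (const m of await findDangling()) {
    await bus.send({ type: "AddMemberToGroup", groupId: m.groupId, userId: m.userId });
  }
}
```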
-Chris

MongoDB slow aggregation with many collections ($lookup)

I'm working on a MEAN stack project. I use too many collections in my aggregation, and therefore a lot of $lookup stages, which negatively impacts performance and makes the aggregation very slow. I was wondering if you have any suggestions. I found that we can reduce lookups by creating, for each collection I need, an array of objects in a global collection; however, I'm looking for an optimal and secure solution.
For information, I have defined indexes on all collections in Mongo.
Thanks for sharing your ideas!
This is a very involved question. Even if you gave all your schemas and queries, it would take too long to answer, and the answer would be very specific to your case (i.e. not useful to anyone else coming along later).
Instead, for a general answer, I'd advise you to read up on denormalization and consider some database redesign if this query is core to your project.
Here is a good article to get you started.
Denormalization allows you to avoid some application-level joins, at the expense of more complex and expensive updates. Denormalizing one or more fields makes sense if those fields are read much more often than they are updated.
A simple example to outline it:
Say you have a blog with a comments collection and a users collection.
You want to display each comment with the name of its author, so you have to load the user for every comment.
Instead, you could save the username on the comment documents as well as in the users collection.
Then you will have a fast query to show comments, as you don't need to load the users too. But if a user changes their name, you will have to update all of their comments with the new name. This is the main trade-off.
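To make the trade-off concrete, here is a sketch with the official MongoDB Node.js driver (collection and field names are assumptions for the example): the read path needs no $lookup, while a rename fans out to every comment.

```typescript
import { MongoClient } from "mongodb";

async function demo(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db("blog");

  // Fast read path: userName is duplicated onto each comment,
  // so showing comments is a single find with no joins.
  const comments = await db
    .collection("comments")
    .find({ postId: "post-1" })
    .project({ text: 1, userName: 1 })
    .toArray();
  console.log(comments);

  // Expensive write path: renaming a user must also touch all their comments.
  const users = db.collection<{ _id: string; name: string }>("users");
  await users.updateOne({ _id: "user-1" }, { $set: { name: "New Name" } });
  await db
    .collection("comments")
    .updateMany({ userId: "user-1" }, { $set: { userName: "New Name" } });

  await client.close();
}
```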
If a DB redesign is too difficult, I suggest splitting the work into multiple aggregations and combining the results in memory (i.e. in your Node server-side code).

How to structure relationships in Azure Cosmos DB?

I have two sets of data in the same collection in Cosmos: 'posts' and 'users', linked by the posts users create.
Currently my structure is as follows;
// user document
{
  id: 123,
  postIds: ['id1', 'id2']
}

// post documents
{
  id: 'id1',
  ownerId: 123
}
{
  id: 'id2',
  ownerId: 123
}
My main issue with this setup is how fragile it is: code has to enforce the link, and if there's a bug, data can very easily be lost with no clear way to recover it.
I'm also concerned about performance: if a user has 10,000 posts, that's 10,000 lookups I'll have to do to resolve all the posts.
Is this the correct method for modelling entity relationships?
As David said, it's a long discussion, but it is a very common one, so, since I have an hour or so of "free" time, I'm more than glad to try to answer it, once and for all, hopefully.
WHY NORMALIZE?
First thing I notice in your post: you are looking for some level of referential integrity (https://en.wikipedia.org/wiki/Referential_integrity), which is something that is needed when you decompose a bigger object into its constituent pieces. This is also called normalization.
While this is normally done in a relational database, it is now becoming popular in non-relational databases too, since it helps a lot to avoid the data duplication that usually creates more problems than it solves.
https://docs.mongodb.com/manual/core/data-model-design/#normalized-data-models
But do you really need it? Since you have chosen a JSON document database, you should leverage the fact that it can store an entire document: just store the post ALONG WITH all the owner data (name, surname, and whatever else you have about the user who created it). Yes, I'm saying that you may want to evaluate not having posts and users, but just posts, with the user info embedded. This may actually be very correct, as you will be sure to capture the EXACT user data that existed at the moment of post creation. Say, for example, I create a post while I have biography "X". I then update my biography to "Y" and create a new post. The two posts will have different author biographies, and this is just right, as they have exactly captured reality.
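For illustration, a de-normalised post in the style of the documents above might look like this (the owner fields are made up for the example):

```
// post document with the owner data embedded
{
  id: 'id1',
  owner: {
    id: 123,
    name: 'Alice',
    biography: 'X' // the biography as it was when the post was created
  }
}
```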
Of course, you may also want to display a biography on an author page. In that case you'll have a problem: which one will you use? Probably the latest one.
If all authors, in order to exist in your system, MUST have a blog post published, that may well be enough. But maybe you want an author to be able to write their biography and be listed in your system even before they write a blog post.
In that case you need to NORMALIZE the model and create a new document type, just for authors. If this is your case, you also need to figure out how to handle the situation described before: when an author updates their own biography, will you just update the author document, or create a new one? If you create a new one, so that you can keep track of all changes, will you also update all the previous posts so that they reference the new document, or not?
As you can see, the answer is complex and REALLY depends on what kind of information you want to capture from the real world.
So, first of all, figure out if you really need to keep posts and users separated.
CONSISTENCY
Let's assume that you really want to keep posts and users in separate documents, and thus you normalize your model. In this case, keep in mind that Cosmos DB (and NoSQL databases in general) DOES NOT OFFER any kind of native support for enforcing referential integrity, so you are pretty much on your own. Indexes can help, of course, so you may want to index the ownerId property so that, before deleting an author, for example, you can efficiently check whether there are any blog posts by them that would otherwise be left orphaned.
Another option is to manually create and keep updated ANOTHER document that, for each author, keeps track of the blog posts he or she has written. With this approach you can just look at this one document to see which blog posts belong to an author. You can try to keep it updated automatically using triggers, or do it in your application. Just keep in mind that when you normalize in a NoSQL database, keeping data consistent is YOUR responsibility. This is exactly the opposite of a relational database, where your responsibility is to keep data consistent when you de-normalize it.
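A sketch of that second option with the @azure/cosmos SDK (the database/container names and the /ownerId partition key are assumptions, and real code would need retry/repair logic around the two writes):

```typescript
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient({
  endpoint: process.env.COSMOS_ENDPOINT!,
  key: process.env.COSMOS_KEY!,
});
// Assumes a container partitioned on /ownerId, so a post and its
// author's index document live in the same partition.
const container = client.database("blog").container("items");

interface PostIndex {
  id: string;
  ownerId: string;
  type: "post-index";
  postIds: string[];
}

async function createPost(post: { id: string; ownerId: string; body: string }): Promise<void> {
  await container.items.create({ ...post, type: "post" });

  // Application-maintained consistency: update the author's index document.
  const indexId = `posts-of-${post.ownerId}`;
  const { resource } = await container
    .item(indexId, post.ownerId)
    .read<PostIndex>()
    .catch(() => ({ resource: undefined })); // tolerate a missing index document
  const index: PostIndex =
    resource ?? { id: indexId, ownerId: post.ownerId, type: "post-index", postIds: [] };
  if (!index.postIds.includes(post.id)) index.postIds.push(post.id);
  await container.items.upsert(index);
}
```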
PERFORMANCES
Performance COULD be an issue, but you don't usually model to optimize performance in the first place. You model to make sure your model can represent and store the information you need from the real world, and then you optimize it to get decent performance out of the database you have chosen to use. As different databases have different constraints, the model is then adapted to deal with those constraints. This is nothing more and nothing less than the good old "logical" vs "physical" modeling discussion.
In the Cosmos DB case, you should avoid queries that go cross-partition, as they are more expensive.
Unfortunately, partitioning is something you choose once and for all, so you really need to be clear about the most common use cases you want to support best. If the majority of your queries are done on a per-author basis, I would partition per author.
Now, while this may seem a clever choice, it will be one only if you have A LOT of authors. If you have only one, for example, all data and queries will go into just one partition, limiting your performance A LOT. Remember, in fact, that Cosmos DB RUs are split among all the available partitions: with 10,000 RU, for example, you usually get 5 partitions, which means that all your values will be spread across those 5 partitions, and each partition will have a top limit of 2,000 RU. If all your queries hit just one partition, your real maximum throughput is that 2,000 RU, not 10,000.
I really hope this helps you start to figure out the answer. And I really hope it helps to foster and grow a discussion (how to model for a document database) that I think is really due and mature now.

CQRS/Event Sourcing - Does one expect to receive an Aggregate Id from the user/request?

I am currently just trying to learn some new programming patterns and I decided to give event sourcing a shot.
I have decided to model a warehouse as my aggregate root in the domain of shipping/inventory, where the number of warehouses is generally pretty constant (i.e. a company won't be adding warehouses too often).
I have run into the question of how to set my aggregate id, which should correspond to a warehouse, on my server. Most examples I have seen, including this one, show the aggregate id being generated server-side when a new aggregate is created (in my case a warehouse) and then passed in the command request when referring to that aggregate in subsequent commands.
Would you say this is the correct approach? Can I expect the user to know and pass aggregate ids when issuing commands? I realize this is probably domain-dependent and could also be a UI/UX choice; I'm just wondering what others have done. It would make more sense to me if my event-sourced aggregates were created more frequently, as with meal tabs or shopping carts.
Thanks!
Heuristic: aggregate id, in many cases, is analogous to the primary key used to distinguish entities in a database table. Many of the lessons of natural vs surrogate keys apply.
Can I expect the user to know and pass aggregate Ids when issuing commands?
You probably can't depend on the human to know the aggregate ids. But the client that the human operator is using can very well know them.
For instance, if an operator is going to be working in a single warehouse during a session, then we might look up the appropriate identifier, cache it, and use it when constructing messages on behalf of the user.
Analog: when you fill in a web form and submit it, the browser does the work of looking at the form action and using that information to construct the correct URI, and similarly the correct HTTP Request.
The client will normally know what the ID is, because it just got it during a previous query.
Creation patterns are weird. It can, in some circumstances, make sense for the client to choose the identifier to be used when creating a new aggregate. In others, it makes sense for the client to provide an identifier for the command message, and the server decides for itself what the aggregate identifier should be.
It's messaging, so you want to be careful about coupling the client directly to your internal implementation details -- especially if that client is under a different development schedule. If you get the message contract right, then the server and client can evolve in any way consistent with the contract at any time.
You may want to review Greg Young's 10 year retrospective, which includes a discussion of warehouse systems. TL;DR - in many cases the messages coming from the human operators are events, not commands.
Would you say this is the correct approach?
You're asking if one of Greg Young's Event Sourcing samples represents the correct approach... Given that the combination of CQRS and Event Sourcing was essentially (re)invented by Greg, I'd say there's a pretty good chance of that.
In general, letting the code that implements the Command-side generate a GUID for every Command, Event, or other persistent object that it needs to write is by far the simplest implementation, since GUIDs are guaranteed to be unique. In a distributed system, uniqueness without coordination is a big thing.
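A minimal sketch of that in TypeScript (the command shapes are invented for illustration; randomUUID comes from Node's standard crypto module):

```typescript
import { randomUUID } from "crypto";

// Creating a new aggregate: the client (or command side) mints the ids.
interface OpenWarehouse {
  commandId: string;   // identifies the message itself (useful for dedupe)
  warehouseId: string; // becomes the aggregate id
  name: string;
}

function newOpenWarehouseCommand(name: string): OpenWarehouse {
  return { commandId: randomUUID(), warehouseId: randomUUID(), name };
}

// Subsequent commands carry the aggregate id the client already knows,
// e.g. because a previous query returned it.
interface ReceiveStock {
  commandId: string;
  warehouseId: string; // existing aggregate id
  sku: string;
  quantity: number;
}
```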
Can I expect the user to know and pass aggregate Ids when issuing commands?
No, and you particularly can't expect a user to know the GUIDs of their assets. What you may be able to do is present the user with a list of his or her assets. Each item in the list will have the GUID associated with it, but it may not be necessary to surface that id in the user interface; it's just data that the underlying UI object carries around internally.
In some cases, users do need to know the ID of some of their assets (e.g. if it involves phone support). In that case, you can add a lookup API to address that concern.

DDD/CQRS Querying Events

I have been looking at posts about querying in applications designed with the Event Sourcing/DDD/CQRS approach.
As I understand it, events are changes to the state of a domain object. The changes to state are maintained as a history of events in the DB (whether SQL or NoSQL).
If a user wants to query the current state of a particular aggregate root, that will involve fetching its history of events.
But when a user queries, especially with business-specific queries, he/she will be interested in the current state, not the history of events.
How does querying, the 'Q' part of CQRS, work with event sourcing?
Consider a domain object "Account" as an aggregate root. The Account AR will go through lots of changes, i.e. credits and debits, so the event store will hold credit and debit events.
Suppose the user needs the current balance of an account: how does a stream of historical events fit in here? How will the user fetch the current balance for a given account?
I am unable to understand how a history of events is useful for business-specific querying.
-Prakhyat M M
I would recommend you read more articles from Greg Young (he is like the father of CQRS and Event Sourcing), such as this one: CQRS, Task Based UIs, Event Sourcing... agh.
Sorry for my bad English; I am from Paraguay. But I really like DDD - CQRS - ES and I would like to try to make a point.
The use of "projections" (also known as materialized views) and the concept of "eventual consistency" are fundamentals that every practitioner of CQRS should understand very well. The event store is not for querying: it sits on the command side of CQRS, not the query side. You may use a bus to send the events stored in the event store over to the query side, where they are processed to generate a read model, or view models, from which you can query. In no case is an event store per se a query model.
Looks like you are a Java guy, but, still, you may want to check the CQRS Journey from Microsoft.
Hope this helps a little bit and motivates you to do more research on DDD / CQRS / ES, the New Trio of Line of Business Applications.
You'll use a projection of the event stream into the read model that contains exactly the information the query side (Q) needs. For example, you could have an "account balance" projection that follows all events that change the account balance but ignores other events in the account's stream (such as owner changes). The projection then saves that info in a way that can be queried very quickly, e.g., in memory or in a small read-model database table (accountId, balance) with accountId as the key (the database can be a key-value store, for example).
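A hedged sketch of such a projection (the event names and the in-memory map are assumptions for illustration):

```typescript
// "Account balance" projection: folds only balance-affecting events
// and ignores everything else (e.g. owner changes).

type AccountEvent =
  | { type: "AccountCredited"; accountId: string; amount: number }
  | { type: "AccountDebited"; accountId: string; amount: number }
  | { type: "OwnerChanged"; accountId: string; newOwner: string };

const balances = new Map<string, number>(); // accountId -> current balance

function project(event: AccountEvent): void {
  const current = balances.get(event.accountId) ?? 0;
  switch (event.type) {
    case "AccountCredited":
      balances.set(event.accountId, current + event.amount);
      break;
    case "AccountDebited":
      balances.set(event.accountId, current - event.amount);
      break;
    default:
      break; // this projection doesn't care about other events
  }
}

// Query side: a constant-time lookup instead of replaying history.
function getBalance(accountId: string): number {
  return balances.get(accountId) ?? 0;
}
```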
I suggest further reading on the CQRS concept such as this one or this one.
Interestingly enough, recently more people have discovered using the event store itself as the read model, leaving projections and "proper" read models until they are absolutely necessary.
We all know that dealing with projections increases complexity. At a minimum you have to create new models, establish the DAL for the read model, create projections that translate events into read-model changes, and bind those projections to the stream of events from your store. It requires more code and more moving parts, some of which are not easy to test. Schema changes on the read side also require migrations.
It appears that for many scenarios, reading all events (properly partitioned) might be enough to serve as your "read model". The system has to grow quite large before you need to read tens of thousands of events to build one UI screen, and until you reach that point, you can just read the events. Maybe use the file system to store them, although tools like EventStore are free and quite easy to use. Maybe add some indexing.
This approach lets you stabilise the domain significantly: you gain more knowledge about how the system works, you can tune the events, and you are really prepared to bring the "proper" read model into the system later. But you might never have to.
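For contrast with a maintained projection, a sketch of this "just read the events" approach, folding a stream on demand (the EventStore interface and event names are assumptions):

```typescript
type AccountEvent =
  | { type: "AccountCredited"; amount: number }
  | { type: "AccountDebited"; amount: number };

interface EventStore {
  readStream(streamId: string): Promise<AccountEvent[]>; // e.g. "account-42"
}

// Compute the balance by folding the stream at query time.
// Cheap to build and easy to test; switch to a maintained projection
// only when streams grow too long to fold per request.
async function currentBalance(store: EventStore, accountId: string): Promise<number> {
  const events = await store.readStream(`account-${accountId}`);
  return events.reduce(
    (balance, e) => (e.type === "AccountCredited" ? balance + e.amount : balance - e.amount),
    0
  );
}
```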
Adam Dymitruk has written a blog post about it, which you might find worth reading even if you don't want to take this approach. Greg Young also gave a talk, "EventStore as a read model", back in 2012.
