Where to put this kind of query in DDD (domain-driven design)

I have an entity Institute and a repository InstituteRepository, which fetches Institute objects based on the criteria passed. Now, somewhere in my application, I need the ViewCount for an institute (the number of times the institute page has been viewed, which is stored and updated in a database table).
I cache my Institute objects, but since ViewCount is very dynamic, I would like to fetch it afresh every time. The question is, where should I put my getViewCount() function?
Can I have a function like getViewCount() in InstituteRepository? If not, what's the best place for it?
I appreciate any help, and sorry for the vague title.

This would definitely fit in a separate bounded context that tracks "viewing-related behavior". No need for a repository. Assuming you're using a relational datastore, just do an "insert into InstituteViewRecord (instituteid, user-who-viewed-id, date-and-time-of-viewing) values (...)" to track this information and a "select count(*) from InstituteViewRecord where instituteid = " to read it. KISS. Any remoting needs can be satisfied using RPC or other messaging mechanisms. I doubt this functionality is core domain.
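The two statements in the answer can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory SQLite database; the column names are adapted from the answer (hyphens replaced by underscores so they are valid identifiers), and record_view/get_view_count are hypothetical helper names, not part of any existing API. Because the count comes from a fresh query every time, it never passes through the Institute cache or InstituteRepository.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema for the view-tracking context (names adapted from the answer).
SCHEMA = """
CREATE TABLE IF NOT EXISTS InstituteViewRecord (
    institute_id INTEGER NOT NULL,
    viewer_id    INTEGER,
    viewed_at    TEXT NOT NULL
)
"""


def record_view(conn, institute_id, viewer_id):
    """Append one row per page view; no repository or aggregate is involved."""
    conn.execute(
        "INSERT INTO InstituteViewRecord (institute_id, viewer_id, viewed_at) "
        "VALUES (?, ?, ?)",
        (institute_id, viewer_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def get_view_count(conn, institute_id):
    """Read the count fresh from the table on every call."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM InstituteViewRecord WHERE institute_id = ?",
        (institute_id,),
    ).fetchone()
    return count


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_view(conn, 1, 42)
record_view(conn, 1, 43)
record_view(conn, 2, 42)
print(get_view_count(conn, 1))  # prints 2
```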

Related

MongoDB, how to manage user related records

I'm currently trying to learn Node.js and MongoDB by building the server side of a web application that manages insurance documents for insurance agents.
So let's say I'm the user: I sign in, then I start adding my customers and their insurances.
So I have two related collections, Customers and Insurances.
I have one more collection to store the users login data, let's call it Users.
I don't want new users to see or modify the customers and insurances of other users.
How can I "divide" every user related record, so that each user can work only with his data?
I figured I could add to every record the _id of the user who created it.
For example, I log in as myself and get my id "001"; I could then add a field with this value to every customer and insurance.
That way I could filter every query by this id.
Would that be a good idea? In my opinion, this filtering is a waste of processing power for MongoDB.
If someone has any idea of a solution, or even a link to an article about it, it would be helpful.
Thank you.
This is more a general permissions problem than just a MongoDB question. Also, without knowing more about your schemas it's hard to give specific advice.
However, here are some approaches:
1) Embed sub-documents
Since MongoDB is a document store allowing you to store arbitrary JSON-like objects, you could simply store the customers and insurances wholly inside each user object. That way, querying for a user would return their customers and insurances as well.
2) Denormalise
Common practice for NoSQL databases is to denormalise related data (i.e. duplicate the data). This might include embedding a sub-document that is a partial representation of your customers/insurances/whatever inside your user document. This has a similar benefit to the solution above, in that it eliminates additional queries for sub-documents. It also has the same drawback: more care must be taken to preserve data integrity.
3) Reference with foreign key
This is a more traditionally relational approach, and is basically what you're suggesting in your question. Depending on whether you want the reference to be bi-directional (both documents reference each other) or uni-directional (one document references the other), you can either store the user's ID as a foreign user_id field, or store an array of customer_ids and insurance_ids in the user document. In relational parlance this is sometimes described as "has many" or "belongs to" (the user has many customers, the customer belongs to a user).
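The three shapes can be compared side by side. This is a minimal sketch in plain Python, assuming each collection is just a list of dicts and that the hypothetical find() helper mimics a MongoDB equality filter; field names such as user_id and customer_summaries are illustrative. With the foreign-key approach every read is scoped by the logged-in user's id. In MongoDB itself the equivalent is a find() with a {"user_id": ...} filter, and an index on user_id keeps that filter from scanning the whole collection, which speaks to the question's worry about wasted processing.

```python
# Hypothetical in-memory stand-in for MongoDB collections: each collection is a
# list of dicts, and find() keeps the documents whose fields equal the filters.
def find(collection, **filters):
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in filters.items())]


# 1) Embedded: the user document holds its customers and insurances outright.
user_embedded = {
    "_id": "001",
    "customers": [{"name": "Alice", "insurances": [{"policy": "P-1"}]}],
}

# 2) Denormalised: the user keeps a partial copy of each customer, while the
#    full customer document lives in its own collection.
user_denormalised = {
    "_id": "001",
    "customer_summaries": [{"_id": "c1", "name": "Alice"}],
}

# 3) Referenced: every customer and insurance carries the owning user's id.
customers = [
    {"_id": "c1", "name": "Alice", "user_id": "001"},
    {"_id": "c2", "name": "Bob", "user_id": "002"},
]
insurances = [
    {"_id": "i1", "customer_id": "c1", "user_id": "001"},
    {"_id": "i2", "customer_id": "c2", "user_id": "002"},
]


def customers_for(user_id):
    # Every read is filtered by the logged-in user's id, so user "001"
    # never sees records created by user "002".
    return find(customers, user_id=user_id)
```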

How to structure relationships in Azure Cosmos DB?

I have two sets of data in the same collection in Cosmos: one is 'posts' and the other is 'users'. They are linked by the posts that users create.
Currently my structure is as follows:
// user document
{
  id: 123,
  postIds: ['id1', 'id2']
}
// post documents
{
  id: 'id1',
  ownerId: 123
}
{
  id: 'id2',
  ownerId: 123
}
My main issue with this setup is its fragility: code has to enforce the link, and if there's a bug, data can very easily be lost with no clear way to recover it.
I'm also concerned about performance: if a user has 10,000 posts, that's 10,000 lookups I'll have to do to resolve all the posts.
Is this the correct method for modelling entity relationships?
As David said, it's a long discussion, but it's a very common one, so since I have an hour or so of "free" time, I'm more than glad to try to answer it, once and for all, hopefully.
WHY NORMALIZE?
First thing I notice in your post: you are looking for some level of referential integrity (https://en.wikipedia.org/wiki/Referential_integrity), which is something that is needed when you decompose a bigger object into its constituent pieces, also called normalization.
While this is normally done in a relational database, it is now also becoming popular in non-relational databases, since it helps a lot to avoid data duplication, which usually creates more problems than it solves.
https://docs.mongodb.com/manual/core/data-model-design/#normalized-data-models
But do you really need it? Since you have chosen to use a JSON document database, you should leverage the fact that it's able to store the entire document, and just store the document ALONG WITH all the owner data: name, surname, or any other data you have about the user who created the document. Yes, I'm saying that you may want to consider not having posts and users, but just posts, with the user info inside them. This may actually be exactly right, as you will be sure to get the EXACT data for the user as it existed at the moment the post was created. Say, for example, I create a post and I have biography "X". I then update my biography to "Y" and create a new post. The two posts will have different author biographies, and this is just right, as they have captured reality exactly.
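The biography example can be sketched like this. It is a minimal illustration in plain Python, assuming a "posts only" model in which each post embeds a copy of its author; the create_post helper and the field names are hypothetical. Copying the author data at creation time is what lets each post keep the biography that was current when it was written.

```python
from copy import deepcopy

# The author's current profile (hypothetical fields).
author = {"id": "123", "name": "Ada", "biography": "X"}


def create_post(post_id, title, author):
    # deepcopy freezes the author data as it is right now, so later profile
    # edits do not rewrite posts that were already created.
    return {"id": post_id, "title": title, "author": deepcopy(author)}


post1 = create_post("id1", "First post", author)
author["biography"] = "Y"
post2 = create_post("id2", "Second post", author)

print(post1["author"]["biography"], post2["author"]["biography"])  # prints X Y
```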
Of course, you may also want to display a biography on an author page. In this case you'll have a problem: which one will you use? Probably the latest one.
If all authors, in order to exist in your system, MUST have published a blog post, that may well be enough. But maybe you want an author to be able to write a biography and be listed in your system even before writing a blog post.
In that case you need to NORMALIZE the model and create a new document type just for authors. If this is your case, you also need to figure out how to handle the situation described before. When an author updates their biography, will you just update the author document, or create a new one? If you create a new one, so that you can keep track of all changes, will you also update all the previous posts so that they reference the new document, or not?
As you can see the answer is complex, and REALLY depends on what kind of information you want to capture from the real world.
So, first of all, figure out if you really need to keep posts and users separated.
CONSISTENCY
Let’s assume that you really want to keep posts and users in separate documents, and thus you normalize your model. In this case, keep in mind that Cosmos DB (and NoSQL databases in general) DO NOT OFFER any kind of native support for enforcing referential integrity, so you are pretty much on your own. Indexes can help, of course, so you may want to index the ownerId property so that, before deleting an author, for example, you can efficiently check whether there are any blog posts written by him/her that would otherwise be left orphaned.
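The check-before-delete idea might look like this. The sketch assumes an in-memory dict standing in for the Cosmos DB container, a hypothetical type field that distinguishes users from posts, and hypothetical posts_by_owner/delete_author helpers; the comment shows the kind of Cosmos DB SQL filter the lookup stands for.

```python
# Hypothetical in-memory container holding both document types, as in the question.
container = {
    "123": {"id": "123", "type": "user"},
    "id1": {"id": "id1", "type": "post", "ownerId": "123"},
    "id2": {"id": "id2", "type": "post", "ownerId": "123"},
    "456": {"id": "456", "type": "user"},
}


def posts_by_owner(owner_id):
    # Stands in for a query such as
    #   SELECT c.id FROM c WHERE c.type = 'post' AND c.ownerId = @ownerId
    # which an index on ownerId keeps cheap.
    return [d["id"] for d in container.values()
            if d["type"] == "post" and d.get("ownerId") == owner_id]


def delete_author(author_id):
    # The database will not stop us from orphaning posts, so check first.
    orphans = posts_by_owner(author_id)
    if orphans:
        raise ValueError(f"author {author_id} still owns posts: {orphans}")
    del container[author_id]
```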
Another option is to manually create, and keep updated, ANOTHER document that, for each author, keeps track of the blog posts he/she has written. With this approach you can just look at this document to find out which blog posts belong to an author. You can try to keep this document updated automatically using triggers, or do it in your application. Just keep in mind that when you normalize in a NoSQL database, keeping data consistent is YOUR responsibility. This is exactly the opposite of a relational database, where your responsibility is to keep data consistent when you de-normalize it.
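A sketch of the application-maintained tracking document, under the same in-memory assumptions (the index-&lt;ownerId&gt; document id and the helper names are hypothetical). It makes the point above visible: there are two separate writes, and nothing in the database keeps them in sync.

```python
# Hypothetical stores: posts, plus one "author index" document per author.
posts = {}
author_index = {}


def add_post(post_id, owner_id):
    posts[post_id] = {"id": post_id, "ownerId": owner_id}
    # Second, independent write. If the process fails between these two
    # writes, the index and the posts disagree; that gap is the consistency
    # cost of normalizing in a NoSQL store.
    index = author_index.setdefault(
        owner_id, {"id": f"index-{owner_id}", "postIds": []})
    index["postIds"].append(post_id)


def remove_post(post_id):
    post = posts.pop(post_id)
    author_index[post["ownerId"]]["postIds"].remove(post_id)


def post_ids_for(owner_id):
    # One document read instead of a query over every post.
    return author_index.get(owner_id, {"postIds": []})["postIds"]
```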
PERFORMANCE
Performance COULD be an issue, but you don't usually model to support performance in the first place. You model to make sure your model can represent and store the information you need from the real world, and then you optimize it to get decent performance with the database you have chosen. As different databases have different constraints, the model will then be adapted to deal with those constraints. This is nothing more and nothing less than the good old “logical” vs “physical” modeling discussion.
In the Cosmos DB case, you should avoid queries that go cross-partition, as they are more expensive.
Unfortunately, partitioning is something you choose once and for all, so you really need to be clear about which use cases are the most common ones you want to support best. If the majority of your queries are done on a per-author basis, I would partition per author.
Now, while this may seem a clever choice, it is one only if you have A LOT of authors. If you have only one, for example, all data and queries will go into just one partition, limiting your performance A LOT. Remember, in fact, that Cosmos DB RUs are split among all the available partitions: with 10,000 RU, for example, you usually get 5 partitions, which means that all your values will be spread across 5 partitions. Each partition will have a top limit of 2,000 RU. If all your queries use just one partition, your real maximum performance is 2,000 and not 10,000 RUs.
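The arithmetic in that paragraph can be checked with a rough model. This sketch takes the answer's figures (10,000 RU over 5 partitions) as example inputs, not as guaranteed service behavior, and uses a SHA-256-based partition_for() function purely for illustration; Cosmos DB's actual hashing differs. It also assumes load spreads evenly over whichever partitions are touched, so usable_ru() is an upper bound.

```python
import hashlib

# Example inputs taken from the answer, not fixed service limits.
TOTAL_RU = 10_000
PARTITIONS = 5
RU_PER_PARTITION = TOTAL_RU // PARTITIONS  # 2,000


def partition_for(owner_id, partitions=PARTITIONS):
    # Illustrative hash routing of a partition key (the author id).
    digest = hashlib.sha256(owner_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def usable_ru(owner_ids):
    # Reachable throughput is bounded by the partitions your traffic touches,
    # each capped at RU_PER_PARTITION.
    touched = {partition_for(o) for o in owner_ids}
    return len(touched) * RU_PER_PARTITION
```

With a single author every request lands on one partition, so usable_ru(["only-author"]) is 2,000; with many authors the keys spread over all five partitions and the full 10,000 becomes reachable.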
I really hope this helps you start figuring out the answer. And I really hope this helps foster and grow a discussion (how to model for a document database) that I think is now long overdue.

How to denormalize deep hierarchies?

I’ve read quite a lot about Cassandra and the art of denormalization and materialization while writing the data. I think I understand the concept, and it seems to make sense. However, I am having some trouble implementing it in scenarios where there is a deep hierarchical data structure.
Consider this contrived domain:
Owner 1:* Company
Company 1:* Team
Team 1:* Player
Player 1:* Equipment
We have tables for each of these entities, but we would also like to query quickly for equipment attributes by owner so it seems the thing to do is create a table (OwnerEquipment) that has the owner id and the equipment id as the primary key with the owner id as the partition key. This makes sense, but what if the UX scenarios that add and edit equipment do not include the owner’s id as part of the working set?
Most of the denormalization examples I’ve encountered in my research are usually a single level parent-child or master-detail type use case. It seems pretty reasonable that an updating client would have enough information about the immediate parent when updating the child to write the denormalized reverse index, but what if the data you would really like to denormalize by is several “joins” away?
This problem is compounded further in our example when we consider a Company is sold to a different Owner. Assume that the desired behavior is for OwnerEquipment to reflect this change. How should the code that writes this updated Company to the database handle the OwnerEquipment table updates? Should it, knowing the ID of the old owner, try to update all the OwnerEquipment records for that owner? This seems like a very un-Cassandra-y thing to do and also fraught with concurrency issues. The problem gets worse as you move down the chain (Team to new Company, Player to new Team). In these cases the “old owner” is not necessarily in the working set and would need to be read in order to be updated.
Are there some better ways to think about this problem?
This makes sense, but what if the UX scenarios that add and edit equipment do not include the owner’s id as part of the working set?
Easy: pass the owner id along with the equipment id to the UX. The owner id can be a hidden value that isn't shown in the interface.
but what if the data you would really like to denormalize by is several “joins” away?
Create as many tables as you have query use cases.
For multiple updates and denormalizations, you can look at Cassandra's new materialized views feature. Read my blog: www.doanduyhai.com/blog/?p=1930
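The advice above (carry the owner id in the working set, write one table per query) can be sketched with in-memory dictionaries standing in for the Equipment and OwnerEquipment tables. The helper names are hypothetical. move_company() shows why an ownership change is expensive: every affected row must be removed from the old owner's partition and written to the new one, and the caller must already know which equipment ids are affected. In real Cassandra, a logged BATCH is a common way to write a base row and its denormalized copies together so they do not diverge.

```python
from collections import defaultdict

# Hypothetical stand-ins for two Cassandra tables.
equipment = {}                        # equipment_id -> row
owner_equipment = defaultdict(dict)   # owner_id (partition key) -> {equipment_id: row}


def save_equipment(equipment_id, player_id, owner_id, attrs):
    # The client writes every query table it knows about. owner_id arrives
    # with the request (e.g. a hidden field in the UI), so no read is needed.
    row = {"equipment_id": equipment_id, "player_id": player_id,
           "owner_id": owner_id, **attrs}
    equipment[equipment_id] = row
    owner_equipment[owner_id][equipment_id] = row


def move_company(equipment_ids, old_owner_id, new_owner_id):
    # Selling a company: rewrite every affected OwnerEquipment row by
    # deleting it from the old owner's partition and inserting it under the new one.
    for eq_id in equipment_ids:
        row = owner_equipment[old_owner_id].pop(eq_id)
        row = {**row, "owner_id": new_owner_id}
        equipment[eq_id] = row
        owner_equipment[new_owner_id][eq_id] = row
```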

CQRS design: nosql data view

This is a "language agnostic" question.
I started to study the CQRS pattern.
I have a simple question. Am I supposed to have two different storage layers: one relational for the commands (MySQL, etc.) and one NoSQL (Mongo, Cassandra, etc.) for the queries?
Let me explain with a small example:
1) As a user I want to insert a "Todo task"
Command: "Create Task", which inserts a new task into a database that has the User and Todo tables.
2) As a user I'm able to see a list of created task
Query: "GetTasks", which returns a "view" with a collection of tasks taken from a NoSQL table named "UserTasks" that holds a user and a list of created tasks.
Is this the right approach? I'm sorry if the language is poor; it's just a small example.
If it seems a good approach (again, don't consider details), what is the best way to keep the data stores updated?
I'm thinking of raising an event like "TaskCreated", taking the new task, and inserting that information into the NoSQL storage.
Thanks!
I can't quite tell what you're looking for, but... typically, a command is something that results in side effects. Queries don't cause side effects. GetTasks wouldn't really be a command, but a query.
Your "CreateTask" would be a command, which would result in the task added to the relevant data store(s). Your GetTasks query would retrieve that information from a datastore. It doesn't really matter if you're using a SQL or NoSQL store for this.
The "CommandStore" is typically the store that has just enough data to enforce invariants. In your case, what data is required for that? Is some information required to decide whether or not a task can be registered? For example, say you have a requirement that a user can have at most 3 "todo"s. In this case, a table in the "Command Store" storing (UserId, Todo Count) is enough. You could also use (UserId, [TodoId]), i.e. store a list of todo ids, so that you gain idempotence. All other information about the user and tasks would be query data, and would be in the query store.
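The (UserId, [TodoId]) command store described above can be sketched as follows. This assumes an in-memory dict in place of a real store and a hypothetical create_task handler; the limit of 3 todos is the answer's example invariant. Storing the ids rather than a bare count is what lets a duplicated command be recognised and ignored.

```python
class TodoLimitExceeded(Exception):
    pass


MAX_TODOS = 3  # the example invariant from the answer

# Hypothetical command store: just (UserId, [TodoId]), enough to enforce the
# invariant and to make the command idempotent.
command_store = {}  # user_id -> list of todo ids


def create_task(user_id, todo_id):
    todo_ids = command_store.setdefault(user_id, [])
    if todo_id in todo_ids:
        # Same command delivered twice: it was already applied, so do nothing.
        return False
    if len(todo_ids) >= MAX_TODOS:
        raise TodoLimitExceeded(f"user {user_id} already has {MAX_TODOS} todos")
    todo_ids.append(todo_id)
    return True
```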
Hope that makes sense.
While there are times when you may wish to store commands, you generally don't. Instead, a popular approach is to store the domain events that occur as a result of the commands. This is referred to as Event Sourcing. This would make 'STOREA' a store of events or, to put it another way, an event stream. 'STOREB' is typically referred to as the Read Model. It has a de-normalised structure optimised for read speed. It is kept up to date via de-normalisers, which respond to specific events. A key point to note here is that there is often a lag between the event being raised and the read model being updated. In my opinion this is a good thing, but it needs to be thought about when designing the UI.
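A minimal event-sourcing sketch, assuming an in-memory list as the event stream and a dict as the read model; TaskCreated is the event named in the question, while the handler and denormaliser names are hypothetical. The denormaliser runs separately from the command handler, which is where the lag between raising an event and seeing it in the read model comes from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCreated:
    user_id: str
    task_id: str
    title: str


event_store = []   # append-only event stream (write side)
read_model = {}    # user_id -> list of task titles (de-normalised read side)
_projected = 0     # how far the denormaliser has read into the stream


def handle_create_task(user_id, task_id, title):
    # The command handler records what happened; it never touches the read model.
    event_store.append(TaskCreated(user_id, task_id, title))


def run_denormaliser():
    # Applies events not yet projected. Until this runs, queries see stale data.
    global _projected
    for event in event_store[_projected:]:
        if isinstance(event, TaskCreated):
            read_model.setdefault(event.user_id, []).append(event.title)
    _projected = len(event_store)


def get_tasks(user_id):
    return read_model.get(user_id, [])
```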
For more info, take a look at CQRS – A Step-by-Step Guide to the Flow of a typical Application
I hope that helps

How are Value Objects stored in the database?

I haven't really seen any examples, but I assume that they are saved inside the containing entity table within the database.
I.e., if I have a Person entity/aggregate root and a corresponding Person table, and it had a Value Object called Address, the Address values would be saved inside this Person table!
Does that make sense for a domain where I have other entities such as Companies etc. that have an Address?
(I'm currently writing a project management application and trying to get into DDD)
It's ok to store Value Objects in a separate table, for the very reasons you've described. However, I think you're misunderstanding Entities vs VOs - it's not a persistence related concern.
Here's an example:
Assume that a Company and a Person both have the same mail Address. Which of these statements do you consider valid?
1. "If I modify Company.Address, I want Person.Address to automatically get those changes"
2. "If I modify Company.Address, it must not affect Person.Address"
If 1 is true, Address should be an Entity, and therefore has its own table.
If 2 is true, Address should be a Value Object. It could be stored as a component within the parent Entity's table, or it could have its own table (better database normalisation).
As you can see, how Address is persisted has nothing to do with Entity/VO semantics.
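The two statements can be expressed directly in code, without any tables. This is a minimal Python sketch with hypothetical classes: a frozen dataclass gives Address value semantics (immutable, compared by attributes), so "modifying" it means replacing it, while an AddressEntity with an id is shared by reference, so a change made through one owner is visible through the other.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Address:
    # Value Object: immutable and compared by its attributes, not by identity.
    street: str
    city: str


@dataclass(eq=False)
class AddressEntity:
    # Entity: has an identity; everyone holding this object sees its changes.
    id: int
    street: str
    city: str


@dataclass
class Person:
    name: str
    address: object  # an Address or an AddressEntity, depending on the statement


@dataclass
class Company:
    name: str
    address: object


# Statement 2: Address is a Value Object.
home = Address("1 Main St", "Springfield")
person = Person("Ann", home)
company = Company("Acme", home)
company.address = replace(company.address, street="2 Main St")
print(person.address.street)  # prints 1 Main St

# Statement 1: Address is an Entity shared by both.
shared = AddressEntity(7, "1 Main St", "Springfield")
person_e = Person("Ann", shared)
company_e = Company("Acme", shared)
company_e.address.street = "2 Main St"
print(person_e.address.street)  # prints 2 Main St
```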
Most developers tend to think about the database before anything else. DDD does not know how persistence is handled; that's up to the repository. You can persist it as XML, SQL, a text file, etc. Entities, aggregates, and value objects are concepts related to the domain.
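To show that persistence belongs to the repository, here is a sketch of a hypothetical SqlPersonRepository that stores the Address value object as component columns of the person table, using Python's built-in sqlite3 module. The table and column names are assumptions. Person and Address contain no persistence code, so a repository backed by XML or a text file could replace this one without touching them.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str


@dataclass
class Person:
    id: int
    name: str
    address: Address


class SqlPersonRepository:
    """Only this class knows that Address is flattened into the person table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person ("
            "id INTEGER PRIMARY KEY, name TEXT, "
            "address_street TEXT, address_city TEXT)"
        )

    def save(self, person):
        self.conn.execute(
            "INSERT OR REPLACE INTO person VALUES (?, ?, ?, ?)",
            (person.id, person.name, person.address.street, person.address.city),
        )

    def get(self, person_id):
        row = self.conn.execute(
            "SELECT id, name, address_street, address_city FROM person WHERE id = ?",
            (person_id,),
        ).fetchone()
        if row is None:
            return None
        # Rebuild the value object from its component columns.
        return Person(row[0], row[1], Address(row[2], row[3]))
```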
Explanation by Vijay Patel is perfect.
