In CouchDB, should design documents be made for each "data entity", or model, that is being stored in the database? Or, should there be one design document that encompasses views for all of your needed views?
For example, if I have some Users and Comments in the database, would I have a single design document possibly called _design/blog and in it I'd have views named like usersByName, or commentsByDate, etc.
Or, should I have seperate design documents, like _design/users, _design/comments, in which I have views named like byName, or byDate?
That is a good question.A design document by itself is just a collection of views ,list , show,update functions.All of the work of your "application layer" will be performed by these functions.So I feel that rather than having a different design document for each of of your entities the separation should be made on these functions.The blogging scenario that you have outlined in the question is a great example of how design documents should be used.
Think of design documents as the file structure of your application.If you were building applications using some other framework you would probably have only one file structure for your application.By the way there is nothing stopping you from creating more than one design document in the application.But I feel that it gets a little unmanageable this way.
Just to recap
One design document per application with separate view,show functions if needed.
Related
I have recently started using Cosmos DB for a project and I am running into a few design issues. Coming from a SQL background, I understand that related data should be nested within documents on a NoSQL DB. This does mean that documents can become quite large though.
Since partial updates are not supported, what is the best design pattern to implement when you want to update a single property on a document?
Should I be reading the entire document server side, updating the value and writing the document back immeadiately in order to perform an update? This seems problematic if the documents are large which they inevitably would be if all your data is nested.
If I take the approach of making many smaller documents and infer relationships based on IDs I think this would solve the read/write immeadiately for updates concern but it feels like I am going against the concept of a NoSQL and in essence I am building a relational DB.
Thanks
Locking and latching. That's what needs to happen if partial updates become possible. It's a difficult engineering problem to keep a <15ms write latency SLA with locking.
This seems problematic if the documents are large which they inevitably would be if all your data is nested.
Define your fear — burnt Request Units, app host memory, ingress/egress network traffic? You believe this is a problem but you're not stating concrete results. I'm not saying you're wrong or doubting the efficiency of the partial update approach, i'm just saying the argument is thin.
Usually you want to JOIN nothing in NoSQL, so i'm totally with you on the last paragraph.
Whenever you are trying to create a document try to consider this:
Does the part of document need separate access . If yes then create a referenced document and if no then create a embedded document.
And if you want to know what to choose, i think you should need to take a look at this question its for MongoDb but will help you Embedded vs Referenced Document
Embed or Reference is the most common problem I face while designing document structure in NoSQL world.
In embedded relationship, child entities has been embedded into the parent document. In Reference relationship, child entities in separate documents and their parent in another document, basically having two (or more) types of documents.
There is no one relationship pattern fits all. The approach you should take depends on the Retrieve and Update to be done on the data is being designed.
1.Do you need to retrieve all the child entities along with the parent entities? If Yes, use embedded relationships.
2.Do your use case allow entities being retrieved individually? This case use relationship pattern.
Majority of the use cases I have worked, I used relationship pattern. For example: Social Graph (Profiles with Relationship Tree), Proximity Points (GeoJSON based proximity search), Classified Listing etc.
Relationship Pattern is also easier to update and maintain, as the entities are stored in individual documents.
Partial Updates are now supported by Cosmos DB:
Azure Cosmos DB Partial Document Update feature (also known as Patch
API) provides a convenient way to modify a document in a container.
Currently, to update a document the client needs to read it, execute
Optimistic Concurrency Control checks (if necessary), update the
document locally and then send it over the wire as a whole document
Replace API call.
Partial document update feature improves this experience
significantly. The client can only send the modified properties/fields
in a document without doing a full document replace operation
Read more here: https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update
I have a very large couchDB database that I host on Cloudant. One of the early noob mistakes I made was keep all my views under one design document. When I made a change to the design document by adding a new view, it would compile the design document again and make the database unavailable for a while.
After I talked to Cloudant, they told me it's good practice to have multiple design documents, and after doing some reading, it looks like CouchDB runs one view server per design document.
Now as in true startup fashion, we are constantly adding new features and hence new updates to the database (which is in production). Whenever I want to add a new view, I make a new design document and add the view to it.
With that background two questions.
Is this the right approach?
What naming scheme should my design documents follow?
You can have a master design document that provides a rewrite to another design document that contains the actual view you want to execute. The master design document shouldn't have any views so you can feel free to update that as often as you need. With this approach, the naming convention is up to you as long as you reference it correctly in the main design document's rewrite rules.
It's certainly not a bad approach. Given that views within a design doc are processed together, more design documents gives you greater parallelism when building views (assuming the cluster can handle it). You could also look at using Cloudant Query which provides an abstraction layer over map/reduce so you don't need to care about your design doc names.
In general, I would advise giving your design documents meaningful names - if you do need to add new views to an existing design doc, you can use this trick.
What is the difference between views as a stand-alone design documents and views grouped in one design document? When do you put two views in one design document? Is there any guide for this?
There's no real guide for this, as it's entirely up to you. Here are the implications as far as I can tell:
Each design document can have as many (or as few) views as you wish. Keep in mind that a view is not created or updated until it is first queried. Also, when a single view is queried, all the other views in that same design document will also be created/updated. This won't be a problem unless you have millions of documents, but it is something to bear in mind.
Also, I believe the full string value of the view is compared between revisions, so it won't rebuild a view if the name and function text are identical. (NOTE this is speculation based on what I've read about views, it's just never explicitly stated)
Generally, I've migrated towards having a "common" design document that contains a lot of the core CommonJS modules (like form validation functions) and other general settings. In addition, each "entity" in my project will have a separate design document with their own views, update handlers, validation functions, show/list functions, etc. This pattern keeps each entity and it's functions grouped together, almost like a class of sorts. I've found it is much easier to maintain and naming is a little easier when each entity is self-contained.
When creating views in CouchDB, how do you guys determine which design document to use for newly created views? That is, by what principles to determine if 2 or more views are put into the same design document?
Internally, the following things happen.
When CouchDB needs to update a view with new data, it will update all views in a design document at the same time, as an optimization.
If you change anything inside the design document views space (even changing whitespace or comments in your Javascript), CouchDB will discard the old index and rebuild the view from scratch.
Every update in a database must pass all validate_doc_update() functions from all design documents in the database.
For these reasons, it's best to consider one design document as one application.
One exception I personally use is a _design/couchdb document which has common views such as showing me all document conflicts.
I don't have much experience with couch but in general, it's a good idea to map an application to a design document. So, if you have a database foo accessed by an application bar, you'd have a bar design document inside foo which will contain all the views with that bar will need each named according to what they serve.
The guide contains some information how to put design documents in the right places.
In Domain Driven Design are collection properties of entities allowed to have partial values?
For example, should properties such as Customer.Orders, Post.Comments, Graph.Vertices always contain all orders, comments, vertices or it is allowed to have today's orders, recent comments, orphaned vertices?
Correspondingly, should Repositories provide methods like
GetCustomerWithOrdersBySpecification
GetPostWithCommentsBefore
etc.?
I don't think that DDD tells you to do or not to do this. It strongly depends on the system you are building and the specific problems you need to solve.
I not even heard about patterns about this.
From a subjective point of view I would say that entities should be complete by definitions (considering lazy loading), and could completely or partially be loaded to DTO's, to optimized the amount of data sent to clients. But I wouldn't mind to load partial entities from the database if it would solve some problem.
Remember that Domain-Driven Design also has a concept of services. For performing certain database queries, it's better to model the problem as a service than as a collection of child objects attached to a parent object.
A good example of this might be creating a report by accepting several user-entered parameters. It be easier to model this as:
CustomerReportService.GetOrdersByOrderDate(Customer theCustomer, Date cutoff);
Than like this:
myCustomer.OrdersCollection.SelectMatching(Date cutoff);
Or to put it another way, the DDD model you use for data entry does not have to be the same as the DDD model you use for reporting.
In highly scalable systems, it's common to separate these two concerns.