Principles of putting views into the same design document in CouchDB?

When creating views in CouchDB, how do you decide which design document a new view belongs in? That is, what principles determine whether two or more views should go into the same design document?

Internally, the following things happen.
When CouchDB needs to update a view with new data, it will update all views in a design document at the same time, as an optimization.
If you change anything inside the design document views space (even changing whitespace or comments in your JavaScript), CouchDB will discard the old index and rebuild the view from scratch.
Every update in a database must pass all validate_doc_update() functions from all design documents in the database.
For these reasons, it's best to consider one design document as one application.
One exception I personally use is a _design/couchdb document which has common views such as showing me all document conflicts.
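As a rough illustration of that split, here is a minimal sketch that builds the two design documents as plain JSON. The view names (conflicts, by_type) and the _design/app id are assumptions made up for this example; only _design/couchdb comes from the answer above. The map functions are JavaScript source stored as strings, which is how CouchDB keeps them.

```python
import json

# Hypothetical "_design/couchdb" document holding a shared conflicts view.
# The map function is JavaScript source stored as a string.
conflicts_design_doc = {
    "_id": "_design/couchdb",
    "language": "javascript",
    "views": {
        "conflicts": {
            "map": "function (doc) { if (doc._conflicts) { emit(doc._id, doc._conflicts); } }"
        }
    },
}

# Hypothetical application design document. Its views sit apart from the
# conflicts view, so editing them does not touch the other document's index.
app_design_doc = {
    "_id": "_design/app",
    "language": "javascript",
    "views": {
        "by_type": {
            "map": "function (doc) { if (doc.type) { emit(doc.type, null); } }"
        }
    },
}


def to_json(design_doc):
    """Serialize a design document the way it would be sent to the database."""
    return json.dumps(design_doc, indent=2, sort_keys=True)


print(to_json(conflicts_design_doc))
```

Because each design document has its own index, changing a view in _design/app leaves the conflicts index alone.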

I don't have much experience with Couch, but in general it's a good idea to map an application to a design document. So, if you have a database foo accessed by an application bar, you'd have a bar design document inside foo containing all the views that bar will need, each named according to what it serves.
The guide contains some information how to put design documents in the right places.

Related

Azure Cosmos DB Update Pattern

I have recently started using Cosmos DB for a project and I am running into a few design issues. Coming from a SQL background, I understand that related data should be nested within documents on a NoSQL DB. This does mean that documents can become quite large though.
Since partial updates are not supported, what is the best design pattern to implement when you want to update a single property on a document?
Should I be reading the entire document server side, updating the value and writing the document back immediately in order to perform an update? This seems problematic if the documents are large, which they inevitably would be if all your data is nested.
If I take the approach of making many smaller documents and inferring relationships based on IDs, I think this would solve the immediate read/write concern for updates, but it feels like I am going against the concept of NoSQL and am in essence building a relational DB.
Thanks
Locking and latching. That's what needs to happen if partial updates become possible. It's a difficult engineering problem to keep a <15ms write latency SLA with locking.
"This seems problematic if the documents are large which they inevitably would be if all your data is nested."
Define your fear: burnt Request Units, app host memory, ingress/egress network traffic? You believe this is a problem but you're not stating concrete results. I'm not saying you're wrong or doubting the efficiency of the partial update approach, I'm just saying the argument is thin.
Usually you want to JOIN nothing in NoSQL, so I'm totally with you on the last paragraph.
Whenever you are trying to create a document try to consider this:
Does that part of the document need separate access? If yes, create a referenced document; if no, create an embedded document.
And if you want to know which to choose, I think you should take a look at this question. It's for MongoDB but will help you: Embedded vs Referenced Document
Embed or Reference is the most common problem I face while designing document structure in the NoSQL world.
In an embedded relationship, child entities are embedded into the parent document. In a reference relationship, child entities live in separate documents and their parent in another document, so you basically have two (or more) types of documents.
No one relationship pattern fits all. The approach you should take depends on the retrieve and update operations to be done on the data being designed.
1. Do you need to retrieve all the child entities along with the parent entities? If yes, use embedded relationships.
2. Does your use case allow entities to be retrieved individually? In this case, use the reference relationship pattern.
In the majority of the use cases I have worked on, I used the relationship pattern. For example: Social Graph (Profiles with Relationship Tree), Proximity Points (GeoJSON based proximity search), Classified Listing etc.
Relationship Pattern is also easier to update and maintain, as the entities are stored in individual documents.
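To make the two shapes concrete, here is a minimal sketch using plain dictionaries in place of stored documents. All field names (profileId, addresses, kind) and the helpers children_of and update_child are assumptions invented for this example, not part of any database API.

```python
# Embedded: children live inside the parent, so one read returns everything.
embedded_profile = {
    "id": "profile-1",
    "type": "profile",
    "name": "Ada",
    "addresses": [
        {"city": "London", "kind": "home"},
        {"city": "Paris", "kind": "work"},
    ],
}

# Referenced: each child is its own document that points back to its parent.
referenced_docs = [
    {"id": "profile-1", "type": "profile", "name": "Ada"},
    {"id": "address-1", "type": "address", "profileId": "profile-1",
     "city": "London", "kind": "home"},
    {"id": "address-2", "type": "address", "profileId": "profile-1",
     "city": "Paris", "kind": "work"},
]


def children_of(docs, parent_id, child_type):
    """Collect the children of one parent (a second query in a real store)."""
    return [d for d in docs
            if d.get("type") == child_type and d.get("profileId") == parent_id]


def update_child(docs, child_id, **changes):
    """Update one small child document without rewriting the parent."""
    for d in docs:
        if d["id"] == child_id:
            d.update(changes)
            return d
    raise KeyError(child_id)


update_child(referenced_docs, "address-2", city="Berlin")
print(children_of(referenced_docs, "profile-1", "address"))
```

The trade-off shows up directly: the embedded form needs one read but rewrites the whole parent on any change, while the referenced form needs an extra lookup but touches only the small child document on update.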
Partial Updates are now supported by Cosmos DB:
Azure Cosmos DB Partial Document Update feature (also known as Patch
API) provides a convenient way to modify a document in a container.
Currently, to update a document the client needs to read it, execute
Optimistic Concurrency Control checks (if necessary), update the
document locally and then send it over the wire as a whole document
Replace API call.
Partial document update feature improves this experience
significantly. The client can only send the modified properties/fields
in a document without doing a full document replace operation
Read more here: https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update
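The difference in what goes over the wire can be sketched locally. This is not the Cosmos DB SDK: apply_patch is a hypothetical stand-in that only handles "set" and "remove", and the order document is invented for the example. The operation list follows the general op/path/value shape of patch operations.

```python
import copy
import json


def apply_patch(document, operations):
    """Apply 'set' and 'remove' operations to a copy of the document."""
    result = copy.deepcopy(document)
    for op in operations:
        keys = op["path"].strip("/").split("/")
        target = result
        for key in keys[:-1]:
            target = target[key]  # walk down to the parent of the field
        if op["op"] == "set":
            target[keys[-1]] = op["value"]
        elif op["op"] == "remove":
            del target[keys[-1]]
        else:
            raise ValueError("unsupported op: " + op["op"])
    return result


# A deliberately large document with nested data (made up for illustration).
order = {
    "id": "order-1",
    "status": "open",
    "customer": {"name": "Ada", "tier": "gold"},
    "lines": [{"sku": "A-%d" % i, "qty": 1} for i in range(200)],
}

patch = [
    {"op": "set", "path": "/status", "value": "shipped"},
    {"op": "set", "path": "/customer/tier", "value": "platinum"},
]

updated = apply_patch(order, patch)

# Compare the payload a client would send in each style of update.
full_replace_bytes = len(json.dumps(updated))
patch_bytes = len(json.dumps(patch))
print(patch_bytes, "bytes as a patch versus", full_replace_bytes, "for a full replace")
```

With a patch the client sends only the two changed properties, whereas a replace sends all 200 order lines again.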

How to implement a proper layer separation in XPages (i.e. talk to Java objects, not DominoViews and DominoDocuments)

I'm trying to implement a proper layer separation in my XPage project. Ideally I'm trying to get to a point where the XML in the XPage contains no SSJS and uses only EL to access Java objects.
So far I've worked out how to load all my data from the domino database into Java Beans (where 1 document = 1 Object, more or less), I'm reading view contents into Java Maps or Lists, and I've managed to display the content of these collections in repeat controls.
What I'm unsure of is how to display the content of a 'form', of a single document, without referencing the domino document. In particular, I'm unsure of how to deal with the 'new document' case. I suppose I create an empty object, then set that object as a data source for the XPage.
I'm aware that I have to use an ObjectDataSource for this, but I'm unsure where to actually store it. I read an article from Stephan Wissel stating that one shouldn't put them in a managed bean, so where can I put it? In one of the scoped variables like viewScope?
Right now I've written an 'ApplicationBean' which is a session-scope Managed Bean where I'm storing all my objects.
What is the best practice? It seems that there are many different ways to meet that goal. Currently I'm exploring Christian Güdemann's XPages Toolkit, which sounds very promising. I know that Samir Pipalia, John Dalsgaard and Frank van der Linden have worked up their own frameworks.
So how should I go about it? Any pitfalls?
This is a large topic indeed. As Paul mentioned, Tim's document model classes are a great example of how to do that clearly, and Tim goes into more detail in later episodes in that NotesIn9 series. My own Framework's model objects are fairly similar, though I also added collection managers to handle the dirty business of accessing views. For better or for worse, almost every XPage developer solves this problem in a unique way.
There are a number of ways you can go about implementing this sort of thing, and some of the differences aren't terribly important in normal cases (for example, whether you preload all data from the document when constructing the model object or do lazy fetching to the back-end only as needed), but there are definitely a couple overarching questions to tackle.
Model Access
As you mentioned in the question, one of the big problems is how you actually access model objects from the XPage - how the objects are fetched from the DB or created anew. My Framework's model objects use a conceit of "Manager" objects, which are application-scoped beans that allow getting either named collections (which map to views), model objects by UNID, or a new model object via the keyword "new". Generally, these models (which are Serializable) are then stored in the view scope of the page using them either via <xp:dataContext/>, <xe:objectData/>, or the Framework's own <ff:modelObjectData/>.
I've found it very wise to avoid using managed beans to represent individual objects (like "CurrentWhatever" that you then fill with data on each page), since that muddies up your faces-config in the best case or runs into session problems in the worst (if you put it in session scope, which I rarely use).
How you implement "new" vs. "fetched" model objects depends largely on the tack you take to write your models in the first place, but most boil down to having two constructors: one that takes a UNID (at least) to point to an existing document and one that creates a new one. If you go the "write every property explicitly in the object with getters and setters" route, the latter would also initialize all of the fields with default values instead of reading them from a document. Internally, you should have a field to store the UNID of the document, which can indicate whether it's new or not. Your save method can then check if this field is empty and create a new document if needed (and then store the new doc's UNID in the field).
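The new-versus-fetched logic can be sketched in a language-neutral way. The version below is Python rather than the Java an XPages project would use, and every name in it (InMemoryStore, Person, new, fetch) is hypothetical. The two "constructors" appear as class methods, and an empty unid field marks an unsaved object.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for the document database, keyed by UNID."""

    def __init__(self):
        self.docs = {}

    def create(self, fields):
        unid = uuid.uuid4().hex.upper()  # 32 hex characters, UNID-like
        self.docs[unid] = dict(fields)
        return unid

    def read(self, unid):
        return dict(self.docs[unid])

    def update(self, unid, fields):
        self.docs[unid] = dict(fields)


class Person:
    """Model object: an empty unid means the document is not saved yet."""

    DEFAULTS = {"name": "", "email": ""}

    def __init__(self, store, unid="", fields=None):
        self.store = store
        self.unid = unid
        self.fields = dict(self.DEFAULTS if fields is None else fields)

    @classmethod
    def new(cls, store):
        """'New' constructor: default values, no backing document."""
        return cls(store)

    @classmethod
    def fetch(cls, store, unid):
        """'Fetched' constructor: load field values from an existing document."""
        return cls(store, unid, store.read(unid))

    @property
    def is_new(self):
        return self.unid == ""

    def save(self):
        """Create the document on first save, update it afterwards."""
        if self.is_new:
            self.unid = self.store.create(self.fields)
        else:
            self.store.update(self.unid, self.fields)
        return self.unid


store = InMemoryStore()
person = Person.new(store)
person.fields["name"] = "Ada"
first_unid = person.save()   # creates the document and records its UNID
person.fields["email"] = "ada@example.com"
second_unid = person.save()  # updates the same document
```

After the first save the object remembers its UNID, so later saves update the same document instead of creating a duplicate.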
Views
It sounds like you're already reading your model collections into Lists, which is good. One down side there, though, is scalability: with small (less than 100) collections, you're likely to not run into any load-speed problems, but afterwards things are going to slow down on initial page load as your code reads in the entire view ahead of time. You can mitigate this somewhat with efficient view reading, but there's a limit. The built-in views are generally speedy because they only load data as needed (they also cheat like hell to do so, but that's another issue).
This is a noble goal to aim for yourself, but doing so to cover all cases is no small feat: you end up running into questions of FT searching, column resorting, efficient data preloading (you don't want to re-open the View object only to read in one entry at a time, but you also don't want to read the whole thing), use in viewPanel and maybe others (which require specialized interfaces), expanding/collapsing categories, and so forth. It's a large sub-topic on its own.
Esoterica
You're also liable to run across other areas that are more difficult than you'd think at first, such as "proper" rich text handling and file attachments. Attachments, in particular, require direct conflict with the XSP framework to get to function properly with custom model objects and the standard upload/download controls.
Case-sensitivity in field names is another potential area of trouble. If you're writing getters and setters for all of your fields, it's a moot point, but if you're going the "thin wrapper" route (which I prefer), it's important to code any intermediary caches/lookups in a way that deals with the fact that "FOO" and "foo" are (basically) the same as item names to Notes, but are distinct in Java. The tack I take is to make extensive use of TreeMaps: if you pass String.CASE_INSENSITIVE_ORDER as the parameter to the constructor, it handles treating Strings as generally case-insensitive when used as keys.
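The lookup behaviour described here can be imitated in a few lines. The sketch below is a Python analogue, not the Java TreeMap itself, and CaseInsensitiveItems is a made-up name. It folds the case of each key for lookups and keeps the spelling that was written first, which is also what a TreeMap built with String.CASE_INSENSITIVE_ORDER does when an equal key is put again.

```python
class CaseInsensitiveItems:
    """Item cache where "FOO" and "foo" address the same entry."""

    def __init__(self):
        # folded name -> (name as first written, value)
        self._items = {}

    def put(self, name, value):
        folded = name.casefold()
        # Keep the first spelling seen, replace only the value.
        original = self._items.get(folded, (name, None))[0]
        self._items[folded] = (original, value)

    def get(self, name, default=None):
        entry = self._items.get(name.casefold())
        return default if entry is None else entry[1]

    def __contains__(self, name):
        return name.casefold() in self._items

    def names(self):
        return [original for original, _ in self._items.values()]


items = CaseInsensitiveItems()
items.put("FOO", 1)
items.put("foo", 2)  # same item as far as Notes is concerned
```

Both writes land on a single entry, so a thin wrapper backed by such a cache cannot end up with two diverging copies of the same item.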
Having your model objects work with all the standard controls like that may or may not be a priority - I find it very valuable, so I did a lot of legwork to make it happen with my framework, but if you're just going to do some basic Strings-and-numbers models, you don't necessarily need to worry.
So... yeah, it's a big topic! Depending on how confident you are with Java and the XPages undergirdings, I would suggest either going the route of fairly-simple "beans with getters and setters" for your objects or by looking into the implementation details of one of the existing frameworks (my own or the ones you mentioned). Sadly, there are a lot of little things that will crop up as your code gets more complicated, many of which are non-obvious to deal with.
Jesse Gallagher's Scaffolding framework also accesses Java objects rather than dominoDocument datasources. I've used (and modified) Tim Tripcony's Java objects from his "Using Java in XPages" series on NotesIn9 (the first is episode 132). The source code for what he does is on BitBucket. This basically converts the dominoDocument to a Map. It doesn't cover MIME (Rich Text) or attachments. If it's useful, I can share the amendments I made (e.g. making field names all the same case, to ensure it still works for existing documents, where fields may have been created with LotusScript).
Andrew - Jesse's one of the experts here so I'd read his response carefully.
What I do is take one of the key pieces of Jesse's bigger framework, the "pageControllers", and use that HEAVILY. So I end up with a Java class for each XPage to act as the controller. "All" Jesse's page controller framework does is make it a little easier to consume. So you can reference it on each page as "controller" and don't need to make individual managed beans for them.
I still will use some SSJS on the XPage if I really need to, for things like button events and for methods that don't have proper getters and setters, HashMap.size() for instance. But the vast bulk of the code goes into the Java class. No real need for viewScope variables any more either.
In the case of a "New Document", in the controller I'll create a new Java object that represents the "current document". I'll bind all the fields to that. If it's new, I create a new object and assign it to the private variable. If I'm loading from somewhere, then I take that variable and load the document I want.
I've started to really try and detail this in more recent NotesIn9 episodes, especially the little series on Java for XPages developers. I think I got far enough there to show you what you need to know. I do plan on doing a lot more on this topic as soon as I can.

Naming convention for design documents in large CouchDB database

I have a very large CouchDB database that I host on Cloudant. One of the early noob mistakes I made was keeping all my views under one design document. When I made a change to the design document by adding a new view, it would compile the design document again and make the database unavailable for a while.
After I talked to Cloudant, they told me it's good practice to have multiple design documents, and after doing some reading, it looks like CouchDB runs one view server per design document.
Now as in true startup fashion, we are constantly adding new features and hence new updates to the database (which is in production). Whenever I want to add a new view, I make a new design document and add the view to it.
With that background two questions.
Is this the right approach?
What naming scheme should my design documents follow?
You can have a master design document that provides a rewrite to another design document that contains the actual view you want to execute. The master design document shouldn't have any views so you can feel free to update that as often as you need. With this approach, the naming convention is up to you as long as you reference it correctly in the main design document's rewrite rules.
It's certainly not a bad approach. Given that views within a design doc are processed together, more design documents gives you greater parallelism when building views (assuming the cluster can handle it). You could also look at using Cloudant Query which provides an abstraction layer over map/reduce so you don't need to care about your design doc names.
In general, I would advise giving your design documents meaningful names - if you do need to add new views to an existing design doc, you can use this trick.

Should design documents be organized based on data model?

In CouchDB, should design documents be made for each "data entity", or model, that is being stored in the database? Or, should there be one design document that encompasses views for all of your needed views?
For example, if I have some Users and Comments in the database, would I have a single design document possibly called _design/blog and in it I'd have views named like usersByName, or commentsByDate, etc.
Or, should I have separate design documents, like _design/users, _design/comments, in which I have views named like byName, or byDate?
That is a good question. A design document by itself is just a collection of view, list, show and update functions. All of the work of your "application layer" will be performed by these functions. So I feel that rather than having a different design document for each of your entities, the separation should be made on these functions. The blogging scenario that you have outlined in the question is a great example of how design documents should be used.
Think of design documents as the file structure of your application. If you were building applications using some other framework, you would probably have only one file structure for your application. By the way, there is nothing stopping you from creating more than one design document in the application. But I feel that it gets a little unmanageable this way.
Just to recap
One design document per application with separate view,show functions if needed.

Grouping views in design documents in CouchDB

What is the difference between views as a stand-alone design documents and views grouped in one design document? When do you put two views in one design document? Is there any guide for this?
There's no real guide for this, as it's entirely up to you. Here are the implications as far as I can tell:
Each design document can have as many (or as few) views as you wish. Keep in mind that a view is not created or updated until it is first queried. Also, when a single view is queried, all the other views in that same design document will also be created/updated. This won't be a problem unless you have millions of documents, but it is something to bear in mind.
Also, I believe the full string value of the view is compared between revisions, so it won't rebuild a view if the name and function text are identical. (NOTE this is speculation based on what I've read about views, it's just never explicitly stated)
Generally, I've migrated towards having a "common" design document that contains a lot of the core CommonJS modules (like form validation functions) and other general settings. In addition, each "entity" in my project will have a separate design document with its own views, update handlers, validation functions, show/list functions, etc. This pattern keeps each entity and its functions grouped together, almost like a class of sorts. I've found it is much easier to maintain, and naming is a little easier when each entity is self-contained.
