CouchDB partial/differential writes - node.js

Basic problem
I have some large, but logically organised documents - and would like to perform updates on just a sub-section of an individual document.
Example
Given this simple document:
{
_id: 123456,
_rev: 3242342,
name: 'Stephen',
type: 'Person',
hobbies: [ 'sky-diving' ]
}
In my application I might have an addHobbies method that would use a view that just retrieves:
{
_id: 123456,
_rev: 3242342,
hobbies: [ 'sky-diving' ]
}
So that it can then add an additional hobby to the hobbies array, and then PUT just this sub-set of data back to the document.
Question
As I understand it, CouchDB [1.2] does not allow partial updates like this, and so I believe it would be necessary to grab the whole document during the save operation, merge my changes, then PUT the whole document back on every single save.
Is there another way of doing this (am I wrong about CouchDB's capabilities)?
Are there any libraries (I'm using express on node.js) to handle this kind of operation?

You are correct. That is, in fact, what document database means: check-outs and check-ins.
You can create (or use) shim code to simulate what you want, letting you focus on the important parts. On the server side, you can use update functions.
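For the server-side route, here is a minimal sketch of an update handler for the addHobbies case from the question. The design document name `app` and the handler name `addHobby` are hypothetical, and the function is shown as plain JavaScript (inside a design document it is stored as a string under `updates`) so it can be exercised locally with a fake `req`. CouchDB still rewrites the whole document; the handler only moves the read-merge-write onto the server, so the client sends just the new hobby.

```javascript
// Hypothetical update handler, stored in a design document such as _design/app
// under "updates": { "addHobby": "<this function's source>" } and called with
//   PUT /db/_design/app/_update/addHobby/123456   body: {"hobby": "base-jumping"}
// CouchDB passes the stored document (or null if it doesn't exist) and the request.
function addHobby(doc, req) {
  if (!doc) {
    // No document with that ID: save nothing and answer 404.
    return [null, { code: 404, body: "document not found" }];
  }
  var input = JSON.parse(req.body);
  if (!Array.isArray(doc.hobbies)) {
    doc.hobbies = [];
  }
  if (doc.hobbies.indexOf(input.hobby) === -1) {
    doc.hobbies.push(input.hobby);
  }
  // Returning the modified doc tells CouchDB to save it as a new revision.
  return [doc, "hobby added"];
}
```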
There are many solutions on the client side.
cradle.js will give you fake partial updates with the merge method:
If you only want to update one or more attributes, and leave the others untouched, you can use the merge() method:
db.merge('luke', {jedi: true}, function (err, res) {
// Luke is now a jedi,
// but remains on the dark side of the force.
});
https://github.com/cloudhead/cradle/
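Conceptually, such a "fake partial update" is the check-out/check-in cycle described above: fetch the latest revision, overwrite some top-level keys, PUT the whole document back, and try again if someone else saved in between. The sketch below is not cradle's code; `ToyDb` is a hypothetical in-memory stand-in that rejects stale `_rev` values with a 409, the way CouchDB does, and the retry loop is an assumption of mine.

```javascript
// Toy in-memory stand-in for a CouchDB database: put() rejects stale
// revisions with a 409-style error, the way CouchDB does.
class ToyDb {
  constructor() { this.docs = new Map(); }

  async get(id) {
    const doc = this.docs.get(id);
    if (!doc) throw Object.assign(new Error("not_found"), { status: 404 });
    return JSON.parse(JSON.stringify(doc)); // hand out a copy, like a GET would
  }

  async put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) {
      throw Object.assign(new Error("conflict"), { status: 409 });
    }
    const n = current ? parseInt(current._rev, 10) + 1 : 1;
    const saved = { ...doc, _rev: `${n}-x` };
    this.docs.set(doc._id, saved);
    return { ok: true, id: doc._id, rev: saved._rev };
  }
}

// Shallow "partial update": fetch, overwrite the given top-level keys,
// PUT the whole document back, and retry on conflict.
async function mergeDoc(db, id, changes, maxRetries = 3) {
  const { _id, _rev, ...rest } = changes; // callers may not override these
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const doc = await db.get(id);
    Object.assign(doc, rest); // nested objects are replaced, not merged
    try {
      return await db.put(doc);
    } catch (err) {
      if (err.status !== 409) throw err;
      // Someone else saved in between: loop and re-read the fresh revision.
    }
  }
  throw new Error(`gave up after ${maxRetries + 1} conflicting attempts`);
}
```

Because `Object.assign` is shallow, a nested object in `changes` replaces the stored one wholesale, which is the limitation of `db.merge` discussed in the next answer.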
Also related, and also for Node.js, is Transaction, a library for performing arbitrary atomic transactions on CouchDB documents.

I would say that cradle is currently missing a real partial update feature, which would also support updating a path to a key inside the field value's JSON data, like Apache demonstrates here, rather than being limited to updating only a single key in a document, like the db.merge method.
In fact, looking at the cradle source, I see that there is a Database.prototype.update method (in lib/cradle/document.js), but this method doesn't seem to be documented.
It would be elegant if this could be made an integral part of cradle, eliminating the need to make separate requests to CouchDB update handlers just for partial updates.
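Until then, a path-based partial update can live in an update handler. This is a sketch under my own assumptions: the handler name `setPath`, the `path` query parameter with dot-separated keys, and a JSON-encoded request body holding the new value are all hypothetical, and it is not cradle's undocumented `Database.prototype.update`.

```javascript
// Hypothetical update handler that sets one nested field, e.g.
//   PUT /db/_design/app/_update/setPath/123456?path=address.city
//   body: "Oslo"   (the new value, JSON-encoded)
// Intermediate objects are created as needed; everything else is untouched.
function setPath(doc, req) {
  if (!doc) {
    return [null, { code: 404, body: "document not found" }];
  }
  var path = String(req.query.path || "");
  var keys = path.split(".");
  // Refuse empty paths and CouchDB's reserved top-level fields (_id, _rev, ...).
  if (!path || keys[0].charAt(0) === "_") {
    return [null, { code: 400, body: "invalid path" }];
  }
  var target = doc;
  for (var i = 0; i < keys.length - 1; i++) {
    var k = keys[i];
    if (typeof target[k] !== "object" || target[k] === null) {
      target[k] = {};
    }
    target = target[k];
  }
  target[keys[keys.length - 1]] = JSON.parse(req.body);
  return [doc, "updated " + path];
}
```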

Related

How to do in MongoDB what would have been perfect for a stored procedure in SQL - recursive query

I have the following scenario in a CMS I am building in NodeJs with MongoDb. I have three collections: customData, queries, and templates. Queries can depend on customData, and templates can depend on customData, queries, and other templates. What I need to do is to be able to very quickly figure out all of the documents that depend on a particular item when that item changes. For example, if a particular customData item changes, I need a list of all the queries and templates that depended on that customData, as well as, recursively, all the templates that depend on those queries and templates. I then need to take that list and flag all of those documents for processing/regeneration. This is accomplished by setting a regenerate property to true on each of the documents in the list. This is the sort of thing that would be perfect for a stored procedure in any database that has them, but I'm struggling to figure out the best solution using MongoDb. Every other need of my project is perfectly suited to Mongo; this is the only scenario I'm having trouble with.
One solution I've come up with is to store the dependencies on each document as an array of named items as follows (e.g. a doc in the templates collection):
{
name: "SomeTemplate",
...
dependencies: [{type: "query", name: "top5Products"}, {type: "template", name: "header"}]
}
The object above denotes a template that depends on the query named "top5Products" as well as the template named "header". If either of those documents change, this template needs to be flagged for regeneration. I can accomplish the above with a getAllDependentsOfItem function that calls the following on both the queries and templates collections, unioning the results, then recursively calling getAllDependentsOfItem on each result.
this.collection.find({dependencies: item })
For instance, if the query changes, I can call it as follows, then call something else to flag all of those results...
let dependents = await this.dependencyService.getAllDependentsOfItem({type: "query", name: "top5Products"});
This just feels very messy to me, especially wrestling with Promises and the recursion above. I haven't even finished the above, but the whole idea of Promises and recursion just seems like a whole lot of cruft for something that would have been so simple in a stored procedure. I just need the dependent documents flagged, and having to wade through all my layers of NodeJs code (CustomDataService, QueryService, TemplateService, DependencyService) to accomplish the above just feels wrong. Not to mention the fact that I keep coming up with a circular dependency between DependencyService and the others. DependencyService needs to call the QueryService and TemplateService to actually talk to those collections, and they need to notify the DependencyService when something changes. I know there are ways around that like using events or not having a DependencyService at all, or just talking directly to the Mongo driver from the DependencyService, but I haven't found a solution that feels right yet.
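One way to drop the recursion and most of the Promise juggling is an iterative breadth-first walk that queries one whole "level" of dependents at a time, with a visited set so cycles cannot loop forever. This is a sketch, not the asker's code: `findDirectDependents` is a hypothetical async function you would supply, taking an array of `{type, name}` items and returning the documents (from queries and templates) whose `dependencies` contain any of them.

```javascript
// Collects every transitive dependent of `changed`, issuing one lookup per
// level of the dependency graph instead of one recursive call per document.
async function getAllDependents(changed, findDirectDependents) {
  const keyOf = (item) => `${item.type}:${item.name}`;
  const seen = new Set([keyOf(changed)]); // guards against dependency cycles
  const result = [];
  let frontier = [changed];

  while (frontier.length > 0) {
    const found = await findDirectDependents(frontier);
    frontier = [];
    for (const dep of found) {
      const key = keyOf(dep);
      if (seen.has(key)) continue; // already collected (or it's the start item)
      seen.add(key);
      result.push(dep);
      frontier.push(dep);
    }
  }
  return result;
}
```

With the official MongoDB Node.js driver, `findDirectDependents` could run `collection.find({ dependencies: { $in: items } }).toArray()` against both collections and tag each result with its type. Matching embedded documents with `$in` compares whole subdocuments, so the items must contain exactly `type` and `name`, in the stored field order. The flags can then be set with one `updateMany(..., { $set: { regenerate: true } })` per collection, so the number of round trips grows with the depth of the graph rather than with the number of documents.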
Another idea I had was to record the dependencies in a completely new collection called "dependencies". Perhaps using a document schema like the following...
{
name: "SomeTemplate",
type: "template",
dependencies: [{type: "query", name: "top5Products"}, {type: "template", name: "header"}]
}
This way the dependencies can be tracked completely separately from the documents themselves. I haven't gotten very far on that solution though.
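If the separate dependencies collection stays small enough to load in a single query (an assumption; for example with `find({}).toArray()`), the whole traversal can happen in memory in Node after one round trip. The sketch below builds a reverse index ("who depends on X") from documents shaped like the schema above; the function names are hypothetical.

```javascript
// Reverse index from "dependencies" documents:
// "type:name" of a dependency -> list of {type, name} that depend on it.
function buildReverseIndex(depDocs) {
  const index = new Map();
  for (const doc of depDocs) {
    for (const dep of doc.dependencies) {
      const key = `${dep.type}:${dep.name}`;
      if (!index.has(key)) index.set(key, []);
      index.get(key).push({ type: doc.type, name: doc.name });
    }
  }
  return index;
}

// Breadth-first walk over the in-memory index; no further database round trips.
function dependentsOf(index, changed) {
  const seen = new Set([`${changed.type}:${changed.name}`]);
  const out = [];
  const queue = [changed];
  while (queue.length > 0) {
    const item = queue.shift();
    for (const dep of index.get(`${item.type}:${item.name}`) || []) {
      const key = `${dep.type}:${dep.name}`;
      if (!seen.has(key)) {
        seen.add(key);
        out.push(dep);
        queue.push(dep);
      }
    }
  }
  return out;
}
```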
Any ideas will be greatly appreciated.
Update:
Anyone?
I've since written all the JavaScript in the mongo shell that, given the type and name of a changed item, will recursively find all the dependents of that item and update those dependents, setting the regenerate flags on those documents to "1".
My problem now is: how do I run this code on the MongoDb server by calling it from NodeJs? I need NodeJs to control when this happens and pass the changed item into it. I've been looking at the eval command, and that just looks like a bad idea; I think it has been deprecated since MongoDb 3.0.
I can't imagine how this recursive code I wrote using cursors in mongo shell could be anything but MUCH slower when run from inside NodeJs on a different server than the database. All the queries recursively getting each document, incurring the latency back and forth across servers, then looping through the results to update the regenerate flag on all the dependent documents... I just can't wrap my brain around why this can't and shouldn't be done on the server somehow. It seems like the perfect scenario for some sort of batch, server-side mechanism, like, I dunno, a stored procedure!
Please help me figure out either how to do this, or how to do it the "Mongo way". I can post the mongo shell code that is working if it would help.

Is a type property the correct way to store different data entities in CouchDB?

I'm trying to wrap my head around CouchDB. I'm trying to switch from MongoDB to CouchDB because I find the concept of views more appealing. In CouchDB it looks like all records are stored in a single database; there is no concept of collections like in MongoDB. So, when storing different data entities such as users, blog posts, comments, etc., how do you differentiate between them from within your map/reduce functions? I was thinking about just using some sort of type property and making sure every item always specifies its type. This line of thought was reinforced when I read the CouchDB cookbook website, in which an example does the same thing.
Is this the most reliable way of doing this, or is there a better method? I was thinking of alternatives, and I think the only other one is to embed as much as I can into logical documents. For example, the top-level records in the database would all be User records, and each User would have an array of Posts, to which you just add all of the Posts. The downside here would be that embedded documents wouldn't get their own id properties, correct?
Using type is convenient and fast when creating views. Alternatively, you can use a top-level property of the JSON document as the marker. That is, instead of defining:
{
type: "user",
firstname: "John",
lastname: "Smith"
}
You would have:
{
user: {
firstname: "John",
lastname: "Smith"
}
}
And then in the view for emitting documents containing user information, instead of using:
function (doc) {
if (doc.type === "user") emit(null, doc);
}
You would write:
function (doc) {
if (doc.user) emit(null, doc);
}
As you can see, there is not much difference. As you have already realized, the first approach is the most widely used, but the second is (as far as I know) also well accepted.
As for storing all of one User's Posts in a single document: it depends on how you plan to update it. Remember that you need to write the whole document each time you update it (unless you use attachments). That means that each time a user writes a new Post, you need to retrieve the document containing the array of Posts, add or modify one element, and save the whole document again. That is probably too heavy.
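The usual alternative is one document per Post, carrying `type: "post"` and a reference to its author, plus a view keyed by author. The field names `author` and `created_at` are assumptions for illustration. CouchDB supplies `emit` to map functions; the small `runMap` harness below fakes it so the function can be tried locally, and its key sorting is a simplification of CouchDB's real collation.

```javascript
// Hypothetical map function for a "posts by author" view.
function postsByAuthor(doc) {
  if (doc.type === "post" && doc.author) {
    emit([doc.author, doc.created_at], null);
  }
}

// Simplified comparison of array keys (CouchDB's collation is richer).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Local harness: provides a global emit() and collects the rows it receives.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows.sort((x, y) => compareKeys(x.key, y.key));
}
```

To fetch one user's posts in date order, query the view with `startkey=["jsmith"]&endkey=["jsmith",{}]&include_docs=true`; in CouchDB's collation objects sort after strings, so `{}` works as a "high" sentinel.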

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep these activities in an array. Now I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(if you are wondering about the syntax, it's coffeescript)
where "stats" is the array of embedded documents/activities.
Now, this actually works, but currently I'm only testing with accounts that only have one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to only return the embedded document where "activity" == "soccer" alongside the top-level document?
By the way, I realize I could do this another way, by putting stats in its own collection with a DBRef to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just the matching element of an array, you'll get back the whole array, and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{ _id: userId,
email: email,
stats: [
{soccer : points},
{rugby: points},
{dance: points}
]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently only available as unstable development version 2.1) you would be able to use aggregation framework to get the exact results you want (only a particular subset of an array or subdocument that matches your query) without changing your schema.
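For reference, the aggregation version could look roughly like this. It is a sketch assuming the original schema from the question (`stats` entries shaped `{ activity, points }`); the function names are hypothetical. The first `$match` lets MongoDB use an index on `stats.activity` before `$unwind` turns each stats entry into its own document, and the second `$match` keeps only the wanted activity. `hallOfFameLocal` mirrors the same steps on plain arrays, which can be handy for unit tests.

```javascript
// Aggregation pipeline (MongoDB 2.2+): one output document per matching
// stats entry, carrying only that entry, sorted by points.
function hallOfFamePipeline(activity, limit = 10) {
  return [
    { $match: { "stats.activity": activity } }, // skip users without the activity
    { $unwind: "$stats" },                       // one document per stats entry
    { $match: { "stats.activity": activity } }, // keep only the wanted entry
    { $sort: { "stats.points": -1 } },
    { $limit: limit },
    { $project: { email: 1, stats: 1 } },
  ];
}

// Plain-JavaScript mirror of the pipeline above, for in-memory data.
function hallOfFameLocal(users, activity, limit = 10) {
  return users
    .flatMap((u) => u.stats
      .filter((s) => s.activity === activity)
      .map((s) => ({ _id: u._id, email: u.email, stats: s })))
    .sort((a, b) => b.stats.points - a.stats.points)
    .slice(0, limit);
}
```

In current Mongoose versions the pipeline can be passed to `Model.aggregate()`, e.g. `await userModel.aggregate(hallOfFamePipeline("soccer"))`.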

CouchDB - Parameter and views - What goes on behind the scenes, and is it fast/faster than temporary views?

Considering these three documents...
[
{
_id: "...",
_rev: "...",
title: "Foo",
body: "..."
},
{
_id: "...",
_rev: "...",
title: "Bar",
body: "..."
},
{
_id: "...",
_rev: "...",
title: "Hello World!",
body: "..."
}
]
And this view...
byTitle: {
map: function (document)
{
emit(document.title, document);
}
}
What goes on behind the scenes, when I query the view?...
GET /database/_design/posts/_view/byTitle?key="Foo"
I've asked a few questions on views lately... questions about what I phrased as "dynamic parameters"... Essentially I wanted to know how to do the equivalent of SELECT ... WHERE field = parameter
All answers steered me towards using temporary views, which are really slow, and should not be used in production. So my second question is... is the above method for querying by title, fit for use in production? Or am I forcing CouchDB to do unspeakable horrors, performance-wise?... am I essentially doing the same as using a temporary view?
I think you have misinterpreted some answers. You can use a temporary view to test various map/reduce functions. When you are satisfied with the code, you should put it into a design document and use it for querying.
Temporary views are slow because the index is built and deleted for every query. Putting the view into a design document tells CouchDB not to delete the index and to keep it updated (this is done at query time).
So
GET /database/_design/posts/_view/byTitle?key="Foo"
is the fastest way to query by title because it is indexed.
As a side note: you can use
byTitle: {
map: function (document)
{
emit(document.title, null);
}
}
and query with include_docs=true to save some disk space.
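One detail that trips people up when querying from Node: `key`, `startkey` and `endkey` must be JSON-encoded (hence `key="Foo"` with the quotes) before being URL-encoded. Here is a small sketch of a URL builder; the `viewUrl` helper is hypothetical, and the list of JSON-valued parameters is deliberately partial.

```javascript
// Builds a CouchDB view URL, JSON-encoding the parameters that CouchDB
// expects as JSON and passing the rest (include_docs, limit, ...) as text.
const JSON_PARAMS = ["key", "keys", "startkey", "endkey", "start_key", "end_key"];

function viewUrl(baseUrl, db, ddoc, view, params = {}) {
  const search = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    search.set(name, JSON_PARAMS.includes(name) ? JSON.stringify(value) : String(value));
  }
  const path = [db, "_design", ddoc, "_view", view].map(encodeURIComponent).join("/");
  const qs = search.toString();
  return qs ? `${baseUrl}/${path}?${qs}` : `${baseUrl}/${path}`;
}
```

With Node 18+ you could then run `const res = await fetch(url); const { rows } = await res.json();`, and with `include_docs=true` each row carries the full document in `row.doc`.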
For answering your question, a few things have to be cleared out (and I hope I get it all right):
Permanent vs. temporary views:
The difference between permanent and temporary views is that a permanent view's index is stored on disk and reused across queries, while a temporary view's index is thrown away after each query.
In order to understand the storage part, you need to know that CouchDB's storage engine relies on a B+ tree, offering very powerful indexing capabilities that enable us to find data in that storage by key in "logarithmic amortized time" (CouchDB book).
CouchDB handles documents in an "append only" manner. Unlike most relational DBMSs, where single values within a table row are updated in place and locking occurs, an updated document simply gets a new revision (_rev) and is appended to the storage.
When you create a permanent view and query it for the first time, its map function is executed for each document in your database, and the results are stored in a new B+ tree file for that view, providing a new index that organizes the data by the key you emit.
When documents handled by that view are updated, the whole permanent view does not need to be recomputed; only the updated documents are run through the map function again.
Now you should be able to understand why temporary views are nice for developing or testing in Futon, but, since they have to be recomputed over all your documents on every query, they are not recommended for anything other than development.
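To make the incremental part concrete, here is a toy model in plain JavaScript. It is not how CouchDB is implemented (no B+ tree, no append-only file, a linear scan instead of an index lookup, and the map function receives `emit` as a parameter rather than as a global); it only shows why an update re-runs the map function for changed documents and nothing else.

```javascript
// Toy incrementally maintained view: remembers which rows each document
// produced, so an update only re-maps the documents that changed.
class ToyView {
  constructor(mapFn) {
    this.mapFn = mapFn;         // (doc, emit) => void
    this.rowsByDoc = new Map(); // doc _id -> [{ id, key, value }]
    this.mapCalls = 0;          // instrumentation for the example
  }

  // `changes` plays the role of the documents changed since the last update.
  update(changes) {
    for (const doc of changes) {
      this.rowsByDoc.delete(doc._id); // drop this document's old rows
      if (doc._deleted) continue;
      const rows = [];
      this.mapCalls++;
      this.mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.rowsByDoc.set(doc._id, rows);
    }
  }

  // A real view would find the key in its B+ tree instead of scanning.
  query(key) {
    const hits = [];
    for (const rows of this.rowsByDoc.values()) {
      for (const row of rows) if (row.key === key) hits.push(row);
    }
    return hits;
  }
}
```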
Anyway, Marcello is right. If you just intend to pass back complete documents, you are encouraged to emit null as the value and query with "include_docs=true". Why? Because otherwise the B-tree for your permanent view has to store a copy of every document next to its key; with include_docs, the view only stores the key and the document ID, and the documents are read from the database at query time.
@Marcello-Nuccio I am not sure, though, whether it is correct to say that temporary (dynamic) views have no index. As I understood it, they do have an index, but it makes no sense to keep it because it is rebuilt on every query? OK, now my brain is hurting!

couchdb design views, updating fields on doc creation

Is it possible to have CouchDB update or change fields on the fly when you create/update a doc? For example, in the design document's validate_doc_update function:
function(newDoc, oldDoc, userCtx) {
}
Within that function I can throw errors like:
if(!newDoc.user_email || !newDoc.user_name || !newDoc.user_password){
throw({forbidden : 'all fields required'});
}
My Question is how would I reassign a field? I tried this:
newDoc.user_password ="changed";
with changed being some new value or hashed value. My overall goal is to build a user registration/login system with node and couchdb and have not found very good examples.
The validate_doc_update function cannot have any side effects and cannot change the document before storage. It only has the power to block an update or to let it through. This is important, because the function is not only called when a user requests an update, but also when changes are replicated from one CouchDB instance to another. So the function can be called multiple times for one document.
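In other words, a validate_doc_update function should only inspect and reject. Here is a sketch using the asker's field names; the assumption that user documents carry `type: "user"` is mine, not from the question, and deletions are let through so that they are not blocked by the field check.

```javascript
// Inspect-only validation: throwing { forbidden: ... } rejects the write;
// returning normally lets it through. newDoc is never modified.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) return;        // allow deletions
  if (newDoc.type !== "user") return; // assumption: user docs carry type "user"
  var required = ["user_email", "user_name", "user_password"];
  for (var i = 0; i < required.length; i++) {
    if (!newDoc[required[i]]) {
      throw { forbidden: required[i] + " is required" };
    }
  }
}
```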
However, CouchDB now supports Document Update Handlers that can modify a document or even build it from scratch. These can be used to convert non-JSON input data into usable documents. You can find some documentation in the CouchDB Wiki.
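A sketch of such an update handler for the registration case: it builds the user document from the posted JSON and fills in derived fields, which validate_doc_update cannot do. The handler name `register` and the field normalisation are hypothetical; it also assumes the Node app has already hashed the password before calling, since this sketch does no hashing. The saved document still goes through validate_doc_update.

```javascript
// Hypothetical update handler ("updates": { "register": ... }) called as
//   POST /db/_design/app/_update/register   body: {"user_name": ..., ...}
// Without a document ID in the URL, CouchDB passes doc = null.
function register(doc, req) {
  if (doc) {
    return [null, { code: 409, body: "user already exists" }];
  }
  var input = JSON.parse(req.body);
  var newDoc = {
    _id: req.uuid, // CouchDB supplies a fresh UUID on the request object
    type: "user",
    user_name: input.user_name,
    user_email: String(input.user_email || "").trim().toLowerCase(),
    // Assumption: already hashed by the Node app before this request.
    user_password: input.user_password,
    created_at: new Date().toISOString()
  };
  return [newDoc, { code: 201, body: "created " + newDoc._id }];
}
```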
Before you build your own user registration/login system, I'd suggest you look into the built-in CouchDB security features (if you haven't - some information here). They might not be enough for you (e.g. if you need email validation or something similar), but maybe you can build on them.
