Can you use CouchDB 'document update handlers' with replication? - couchdb

I am replicating docs from DB A to DB B, every time a Doc from DB A arrives in DB B I want to run a 'stored procedure' to remove most of the fields from DB A (DB A is private, but has attachments that I want to be publicly available)
So far I've seen that this might be achieved using the _changes feed (continuous)and then running an 'update' handler on each document.
The document update handlers doc: https://wiki.apache.org/couchdb/Document_Update_Handlers
This seems like something that CouchDB would implement for me... (and I'm not really sure yet how to do the above).
Is there something like a 'hook' that can be run on every document that enters the database?
== EDIT ==
It seems that I would want to somehow include the update handler command in the replication trigger?

It sounds like with some changes to how your storing documents you may be able to benefit from CouchDB's filtered replication. You'd need to store the attachments in documents that could be equivalently copied (without modification) between the two databases.
If that's not an option, then you could potentially use transform-pouchdb plus PouchDB's .replicate.from() method to manage the replication.
Some quick pseudo-code for this idea looks a bit like this:
var PouchDB = require('pouchdb');
PouchDB.plugin(require('transform-pouch'));
var dbA = new PouchDB('a'); // "a" could be a URL to CouchDB or Cloudant
var dbB = new PouchDB('b');
dbB.transform({
incoming: function (doc) {
// do something to the document before storage
return doc;
}
});
dbB.replicate.from(dbA);
In theory, that (or something like it) should do what you're wanting...or at least giving you the framework in which to do what you're wanting. ^_^
Hope that helps!

Related

In mongodb is it possible to create a transaction of 2 queries? Like find and update separately?

I use mongodb (with mongoose) and I have a case where I find a document, run a bunch of complicated conditional checks and then update the document.
This is fine but now I want to ensure that the document I'm updating in the end hasn't been updated by a different update query while my conditions have been running.
Can I create a lock or somehow contain all these actions within a transaction?
You add a simple field editor and make sure each process has ownership of the document when it's time to update.
Here is a simple example:
let processId = uniqueID;
//if doc is none then a different process 'owns' it. need to decide on behaviour.
let doc = await collection.findOneAndUpdate({_id: docId, editor: {$exists: false}}, {$set: {editor: processId}})
### do calculations. ###
let newValue = calculatedValue;
let newDoc = await collection.findOneAndUpdate({_id: docId, editor: processId}, {$unset: {editor: ""}})
I feel that using transactions as suggjested in the comments is an overkill, from the docs:
In MongoDB, an operation on a single document is atomic
And
For situations that require atomicity of reads and writes to multiple documents , MongoDB supports multi-document transactions.
Transactions are meant for more complex situations than this, and I feel this overhead is not needed as a simple solution suffices.

Insert or update multiple documents in MongoDB

I want to implement hashtags functionality with NodeJS and MongoDB support, so that I can also count the uses. Whenever a user adds hashtags to a page, I want to push or update them in the database. Each hastag looks like this:
{_id:<auto>, name:'hashtag_name', uses: 0}
The problem I'm facing is that the user can add new tags as well, so when he clicks 'done', I have to increment the 'uses' field for the existing tags, and add the new ones. The trick is how to do this with only one Mongo instruction? So far I thought of 2 possible ways of achieving this, but I'm not particularly happy with either:
Option 1
I have a service which fetches the existing tags from the db before the user starts to write a new article. Based on this, I can detect which tags are new, and run 2 queries: one which will add the new tags, and another which will update the existing one
Option 2
I will send the list of tags to the server, and there I will run a find() for every tag; if I found one, I'll update, if not, I'll create it.
Option 3 (without solution for now)
Best option would be to run a query which takes an array of tag names, do a $inc operation for the existing ones, and add the missing ones.
The question
Is there a better solution? Can I achieve the end result from option #3?
You should do something like this, all of them will be executed in one batch, this is only an snippet idea how to do it:
var db = new Db('DBName', new Server('localhost', 27017));
// Establish connection to db
db.open(function(err, db) {
// Get the collection
var col = db.collection('myCollection');
var batch = col.initializeUnorderedBulkOp();
for (var tag in hashTagList){
// Add all tags to be executed (inserted or updated)
batch.find({_id:tag.id}).upsert().updateOne({$inc: {uses:1}});
}
batch.execute(function(err, result) {
db.close();
});
});
I would use the Bulk method offered by Mongodb since version 2.6. In the same you could perform insertion operations when the tag is new and the counter update when it already exists.

Preventing concurrent access to documents in Mongoose

My server application (using node.js, mongodb, mongoose) has a collection of documents for which it is important that two client applications cannot modify them at the same time without seeing each other's modification.
To prevent this I added a simple document versioning system: a pre-hook on the schema which checks if the version of the document is valid (i.e., not higher than the one the client last read). At first sight it works fine:
// Validate version number
UserSchema.pre("save", function(next) {
var user = this
user.constructor.findById(user._id, function(err, userCurrent) { // userCurrent is the user that is currently in the db
if (err) return next(err)
if (userCurrent == null) return next()
if(userCurrent.docVersion > user.docVersion) {
return next(new Error("document was modified by someone else"))
} else {
user.docVersion = user.docVersion + 1
return next()
}
})
})
The problem is the following:
When one User document is saved at the same time by two client applications, is it possible that these interleave between the pre-hook and the actual save operations? What I mean is the following, imagine time going from left to right and v being the version number (which is persisted by save):
App1: findById(pre)[v:1] save[v->2]
App2: findById(pre)[v:1] save[v->2]
Resulting in App1 saving something that has been modified meanwhile (by App2), and it has no way to notice that it was modified. App2's update is completely lost.
My question might boil down to: Do the Mongoose pre-hook and the save method happen in one atomic step?
If not, could you give me a suggestion on how to fix this problem so that no update ever gets lost?
Thank you!
MongoDB has findAndModify which, for a single matching document, is an atomic operation.
Mongoose has various methods that use this method, and I think that they will suit your use case:
Model.findOneAndUpdate()
Model.findByIdAndUpdate()
Model.findOneAndRemove()
Model.findByIdAndRemove()
Another solution (one that Mongoose itself uses as well for its own document versioning) is to use the Update Document if Current pattern.

Delete multiple couchbase entities having common key pattern

I have a use case where I have to remove a subset of entities stored in couchbase, e.g. removing all entities with keys starting with "pii_".
I am using NodeJS SDK but there is only one remove method which takes one key at a time: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/Bucket.html#remove
In some cases thousands of entities need to be deleted and it takes very long time if I delete them one by one especially because I don't keep list of keys in my application.
I agree with the #ThinkFloyd when he saying: Delete on server should be delete on server, rather than requiring three steps like get data from server, iterate over it on client side and finally for each record fire delete on the server again.
In this regards, I think old fashioned RDBMS were better all you need to do is 'DELETE * from database where something=something'.
Fortunately, there is something similar to SQL is available in CouchBase called N1QL (pronounced nickle). I am not aware about JavaScript (and other language syntax) but this is how I did it in python.
Query to be used: DELETE from <bucketname> b where META(b).id LIKE "%"
layer_name_prefix = cb_layer_key + "|" + "%"
query = ""
try:
query = N1QLQuery('DELETE from `test-feature` b where META(b).id LIKE $1', layer_name_prefix)
cb.n1ql_query(query).execute()
except CouchbaseError, e:
logger.exception(e)
To achieve the same thing: alternate query could be as below if you are storing 'type' and/or other meta data like 'parent_id'.
DELETE from <bucket_name> where type='Feature' and parent_id=8;
But I prefer to use first version of the query as it operates on key, and I believe Couchbase must have some internal indexes to operate/query faster on key (and other metadata).
The best way to accomplish this is to create a Couchbase view by key and then range query over that view via your NodeJS code, making deletes on the results.
http://docs.couchbase.com/admin/admin/Views/views-querySample.html
http://docs.couchbase.com/couchbase-manual-2.0/#couchbase-views-writing-querying-selection-partial
http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.8/ViewQuery.html
For example, your Couchbase view could look like the following:
function(doc, meta) {
emit(meta.id, null);
}
Then in your NodeJS code, you could have something that looks like this:
var couchbase = require('couchbase');
var ViewQuery = couchbase.ViewQuery;
var query = ViewQuery.from('designdoc', 'by_id');
query.range("pii_", "pii_" + "\u0000", false);
var myBucket = myCluster.openBucket();
myBucket.query(query, function(err, results) {
for(i in results) {
// Delete code in here
}
});
Of course your Couchbase design document and view will be named differently than the example that I gave, but the important part is the ViewQuery.range function that was used.
All document ids prefixed with pii_ would be returned, in which case you can loop over them and start deleting.
Best,

How do I design a couchdb view for following case ?

I am migrating an application from mySQL to couchDB. (Okay, Please dont pass judgements on this).
There is a function with signature
getUserBy($column, $value)
Now you can see that in case of SQL it is a trivial job to construct a query and fire it.
However as far as couchDB is concerned I am supposed to write views with map functions
Currently I have many views such as
get_user_by_name
get_user_by_email
and so on. Can anyone suggest a better and yet scalable way of doing this ?
Sure! One of my favorite views, for its power, is by_field. It's a pretty simple map function.
function(doc) {
// by_field: map function
// A single view for every field in every document!
var field, key;
for (field in doc) {
key = [field, doc[field]];
emit(key, 1);
}
}
Suppose your documents have a .name field for their name, and .email for their email address.
To get users by name (ex. "Alice" and "Bob"):
GET /db/_design/example/_view/by_field?include_docs=true&key=["name","Alice"]
GET /db/_design/example/_view/by_field?include_docs=true&key=["name","Bob"]
To get users by email, from the same view:
GET /db/_design/example/_view/by_field?include_docs=true&key=["email","alice#gmail.com"]
GET /db/_design/example/_view/by_field?include_docs=true&key=["name","bob#gmail.com"]
The reason I like to emit 1 is so you can write reduce functions later to use sum() to easily add up the documents that match your query.

Resources