CouchDB - Views not being updated after delete

I have a view similar to this (a contrived example):
function(doc) {
  if (doc.attrib) {
    emit([doc.attrib], doc._id);
  }
}
Everything works as expected until the data is deleted. I get this crazy scenario where there is no data in the actual database (confirmed via _all_docs and _changes run on curl as well as all_documents on Futon). However the view still yields data (again on both curl and Futon).
The delete consists of bulk delete and purge operations via Ektorp. Running _changes after each confirms that they work as expected. Re-creating the view makes it reflect the true state of the documents in the DB.
Have I missed something obvious here or are views in CouchDB only incremental?

Did you really _purge the data? That should invalidate the view and cause a full rebuild. I'll note that _purge is not recommended for normal use. It exists only for accidents like putting your plaintext password in a document.
You may have exposed a bug in _purge, though, so if you can reliably induce this with _purge but not if you just delete, I encourage you to file a ticket on our JIRA (https://issues.apache.org/jira/browse/COUCHDB).
I'll note also that the fix will be to blow away the index if you purge. There is no incremental approach possible, because you are literally removing the information that an incremental approach requires.
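For routine cleanup, the alternative to _purge is an ordinary delete, which writes a tombstone (`_deleted: true`) and so leaves the view indexer something to process incrementally. A minimal sketch of building such a request body for `POST /db/_bulk_docs` from `_all_docs` rows follows. The helper name is hypothetical, and skipping design documents is an assumption about what the cleanup should preserve.

```javascript
// Hypothetical helper: turn rows from GET /db/_all_docs into a body for
// POST /db/_bulk_docs that deletes each document by writing a tombstone.
function buildBulkDeleteBody(allDocsRows) {
  return {
    docs: allDocsRows
      // Assumption: design documents are kept so the views themselves survive.
      .filter(function (row) { return row.id.indexOf('_design/') !== 0; })
      .map(function (row) {
        return { _id: row.id, _rev: row.value.rev, _deleted: true };
      })
  };
}

var body = buildBulkDeleteBody([
  { id: 'a', key: 'a', value: { rev: '1-abc' } },
  { id: '_design/app', key: '_design/app', value: { rev: '3-def' } }
]);
console.log(JSON.stringify(body));
```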

Related

CouchDB replication strategy with dynamic groups of users

This is the situation:
We have a series of users who share some documents. The documents they can share might change throughout the day, so can the documents themselves (changes and deletions). The users can change some information on the documents.
E.g.
Users | Documents
A | X
A | Y
A | Z
B | X
B | Z
C | Y
Possible groups: A+C, A+B
The server on CouchDB is a replica of a SQL Server DB with this data, an ETL takes care of managing changes on CouchDB. However, the CouchDB database is replicated on each user phone via PouchDB.
The goal:
To replicate changes and deletions accordingly.
What we've tried:
1) we figured we'd structure our documents with a list of users that can access them. Each document would have a "Users" array and then a filter in the design document would take care of the replication to the clients. Unfortunately, document deletions and document changes that don't pass the filter (e.g. a user is removed from the array) are not present in the filtered _changes feed, so they cannot be replicated accordingly on the clients
2) database per user. This is not possible, because users need to see each others work on the documents (they share them)
3) database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change and no longer be present: how do we reflect that client-side?
- a document can shift to a new group: it will have to be redownloaded from scratch. This greatly increases the download size
- the same document can be in more than one group! (see example above)
- each client would have to know which groups she is in every time she logs in and replicate multiple databases. Then on the return trip you'd have to know in which databases the document was present
Is there a recipe for this situation? Am I missing an obvious solution?
EDIT
Partial solution for case 1:
localDB.sync(remoteDB, {
  live: true,
  retry: true,
  filter: 'app/by_user',
  query_params: { "agente": agent }
})
.on('paused', function(info){
  console.log("paused");
  localDB.allDocs().then(function(docs){
    console.log("allDocs");
    docs.rows.forEach(function(row){
      console.log(row);
      remoteDB.get(row.id)
      .then(function(doc){
        if(doc.Agents.indexOf(agent) < 0){
          localDB.remove(doc);
        }
      });
    });
  });
})
.on('change', function(result){
  console.log("change!");
  result.change.docs.forEach(function(change) {
    if(!change.deleted){
      $rootScope.$apply(function(){
        $rootScope.$broadcast('upsert', change);
      });
    }
  });
});
Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"
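One plausible cause of the 409 is that the document handed to localDB.remove() comes from remoteDB.get(), so it carries the remote _rev, which can be newer than the revision stored locally (the filter is exactly what stopped that newer revision from replicating). A minimal sketch, with hypothetical names, that pairs each stale id with the local revision reported by allDocs() instead:

```javascript
// Hypothetical helper. localRows: the rows array from localDB.allDocs();
// remoteDocs: documents fetched from the server for the same ids.
// Returns {id, rev} pairs to remove locally, using the LOCAL revision.
function staleLocalRevs(localRows, remoteDocs, agent) {
  var remoteById = {};
  remoteDocs.forEach(function (doc) { remoteById[doc._id] = doc; });
  return localRows
    .filter(function (row) {
      var remote = remoteById[row.id];
      // Rows with no known remote copy are left alone; only documents that
      // exist remotely and no longer list this agent are flagged.
      return Boolean(remote) && (remote.Agents || []).indexOf(agent) < 0;
    })
    .map(function (row) { return { id: row.id, rev: row.value.rev }; });
}
```

Each pair could then be passed to localDB.remove(id, rev). This only addresses the conflict error: a local delete is itself a change, so with a two-way sync it would still be pushed back to the server unless something prevents that.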
(3) Seems like the simplest solution to me, i.e. the "database per role" solution.
I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtering replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.
Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.
Then on the client side, you can either replicate from multiple CouchDB databases into multiple PouchDB databases (and then collate the results together yourself), or into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.
Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
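As a rough sketch of the single-PouchDB variant described above (pull only, one live replication per role database): the function name and the list of database URLs are hypothetical, and the PouchDB constructor is passed in as a parameter so the snippet makes no assumption about how PouchDB is loaded.

```javascript
// Sketch only. PouchDBCtor is the PouchDB constructor; dbUrls is the
// (hypothetical) list of role databases this user is allowed to read.
function replicateRoleDatabases(PouchDBCtor, localName, dbUrls, onDenied) {
  var local = new PouchDBCtor(localName);
  return dbUrls.map(function (url) {
    // One live pull replication per role database, all into the same local DB.
    return local.replicate
      .from(new PouchDBCtor(url), { live: true, retry: true })
      .on('denied', function (err) { onDenied(url, err); });
  });
}
```

The returned replication objects can be kept so that each one can be cancelled when the user's roles change.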
We arrived at the conclusion that:
1) our use-case might not be what CouchDB is good for
2) we value our mental health. After almost a month struggling with this problem we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much they can simply clear the data and start fresh
Solution:
1) Keep the architecture as to point 1
2) After each 'paused' event fires, compare the local docs with the remote docs; if a remote doc no longer passes the filter, remove it from the UI. Should there be a way to remove only the local document, we'll be very interested in upgrading to that logic.
1) still sounds like the simplest approach to me.
I don't know PouchDB very well, but in plain CouchDB, the problem with changes on deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom delete function.
I mean, a delete is like an update which sets the _deleted attribute to true.
So, instead of deleting documents directly with the normal CouchDB CRUD DELETE, you can create an update function like this:
function(doc, req) {
  // optional acls for deleting doc.. doc is owned by req.userCtx.name
  // doc.users are users already granted to work with this doc
  return [{
    "_id": doc._id,
    "_rev": doc._rev,
    "_deleted": true,
    "users": doc.users
  }, "Ok doc deleted"];
}
Furthermore, using document rewriting rules, this update function can eventually be called even when submitting an HTTP DELETE request(not only on PUT or POST).. In this way your delete behaviour becomes totally transparent to the client... and you delete in a way which can be more useful for your use case.
The Smileupps Chatty couchapp tutorial app uses this approach: extended deletes for different document types are performed within user/drop.js, profile/drop.js, chat/drop.js files
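To illustrate why keeping "users" on the tombstone helps, here is a minimal sketch of a replication filter that such a tombstone would still pass. The filter name and the `user` query parameter are hypothetical; in a design document the function would be stored anonymously under "filters".

```javascript
// Hypothetical filter function. Because the custom delete keeps "users" on
// the tombstone, deletions still match and reach the clients that had the doc.
function byUser(doc, req) {
  var users = doc.users || [];
  return users.indexOf(req.query.user) !== -1;
}

var tombstone = { _id: 'x', _rev: '3-abc', _deleted: true, users: ['A', 'B'] };
console.log(byUser(tombstone, { query: { user: 'A' } }));                   // true
console.log(byUser({ _id: 'y', _deleted: true }, { query: { user: 'A' } })); // false
```

The second call shows what happens with a plain DELETE: the tombstone has no "users" field, so the filter rejects it and the client never learns about the deletion.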

CouchDB view results contain "missing" docs after purging

After purging a set of documents in a Couch database, some view results contain documents which are not actually in the database. When accessing such documents, the following error message is returned:
{"error":"not_found","reason":"missing"}
Also the view results contain duplicate entries for some of such "missing" documents.
Some of these docs contain conflicted revisions as well.
Following is a simple view which lists such documents. According to the view, there should not be duplicate results.
function(doc) {
  if (doc.documentType == 'theDocType') {
    // emit the document's own type as the key (theDocType is not a defined variable)
    emit(doc.documentType, doc);
  }
}
I created a new document with an id of a "missing" document, and tried purging it again (giving the new rev and all the conflicting revs). But after purging, the view results remained same as earlier.
Any idea what has caused this and how to resolve this problem ?
I just recently had this issue too and found your question.
I fixed it by deleting the view index files, which are stored here on Windows:
"...\couchdb.2.1.1\data\.dbname_design\mrview\*.view"
and here on Linux:
<couch data directory>/.dbname_design/mrview/*.view (the data directory is usually /var/lib/couchdb or /usr/local/var/lib/couchdb)
Each .view file is named with an MD5 hash. Delete them all, then restart the service. Then request the view again and it will rebuild the index; it might take 2 or 3 attempts before it builds properly, depending on the size of the database.

PouchDB - Manually managing conflicts

Is it possible to manage the sync conflicts from the client?
What I mean is, when pouchDB does a sync and detects a conflict, is it possible to get the local doc PouchDB is trying to sync and the last revision of CouchDB doc? If I can get both docs, I can display them to the user and he can choose which version to keep...
You're in luck, because this is exactly the problem CouchDB and PouchDB were designed to solve.
Basically you can read up on the CouchDB docs on conflict resolution. Everything in there should also apply to PouchDB. (If it doesn't, it's a bug. ;)). The CouchDB wiki also has a nice writeup.
Edit: so to provide more details, you'll want to fetch the document with ?conflicts=true ({conflicts:true} in PouchDB). E.g. you'll fetch a doc like this:
http://localhost:5984/db1/foo?conflicts=true
And get a doc back like this:
{
  "_id": "foo",
  "_rev": "2-f3d4c66dcd7596419c76b2498b3ba21f",
  "notgonnawork": "this is from the second db",
  "_conflicts": ["2-c1592ce7b31cc26e91d2f2029c57e621"]
}
Here I have a conflict introduced from another database, and that database's revision has won. The winner is arbitrary from the user's point of view, but it is chosen deterministically, so every replica picks the same one. This document's current revision starts with 2-, and the conflicting version also starts with 2-, indicating that they're both at the same level of the revision tree.
To get the conflicting version, you just grab the conflicting rev and call:
http://localhost:5984/db1/foo?rev=2-c1592ce7b31cc26e91d2f2029c57e621
And you get:
{
  "_id": "foo",
  "_rev": "2-c1592ce7b31cc26e91d2f2029c57e621",
  "notgonnawork": "this is from the first database"
}
So after presenting the two conflicting versions to the user, you can then add a 3rd revision on top of both of these, which either combines the results, or chooses the losing version, or whatever you want. This next revision will be prefixed with 3-. Make sense?
Edit: Apparently you also need to delete the conflicting version, or else it will still show up in _conflicts. See this answer.
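Putting the two steps together (write the chosen content on top of the winning revision, then delete the losing revisions), a minimal sketch of the documents to write might look like this. The helper name is hypothetical; the resulting array is meant for PouchDB's bulkDocs() or CouchDB's _bulk_docs endpoint.

```javascript
// Hypothetical helper: winner is the doc fetched with conflicts=true,
// chosenBody holds the fields the user decided to keep.
function buildResolution(winner, chosenBody) {
  // The new revision is written on top of the current winner.
  var resolved = Object.assign({}, chosenBody, { _id: winner._id, _rev: winner._rev });
  delete resolved._conflicts;
  // Every losing revision gets a tombstone so it leaves _conflicts.
  var tombstones = (winner._conflicts || []).map(function (rev) {
    return { _id: winner._id, _rev: rev, _deleted: true };
  });
  return [resolved].concat(tombstones);
}

var winner = {
  _id: 'foo',
  _rev: '2-f3d4c66dcd7596419c76b2498b3ba21f',
  notgonnawork: 'this is from the second db',
  _conflicts: ['2-c1592ce7b31cc26e91d2f2029c57e621']
};
var docs = buildResolution(winner, { notgonnawork: 'this is from the first database' });
console.log(JSON.stringify(docs, null, 2));
```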

How can I "undelete" a set of documents in CouchDB?

I have a large set of documents in a CouchDB database that were just accidentally bulk deleted using _deleted:true. I also have a backup for this set of data that includes their last known good revision and metadata. I need to maintain the same _id, so simple restore with a new _id is not an option.
Compaction has not been run and I can access any of these documents via the &rev= url parameter as well as their attachments (which are needed).
What I need to do is "restore" these documents to the revision I have on file. Surprisingly, I have come up empty with any queries on how to achieve this. Tips or hacks appreciated.
If you just PUT the whole document, including the attachment stub, back into the DB, with the deleted rev, but less the _deleted:true parameter, then all will be well.
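A minimal sketch of preparing such a document from the backup copy, with a hypothetical helper name. It only builds the body to PUT; whether the attachment stubs are accepted unchanged is taken from the answer above and not verified here.

```javascript
// Hypothetical helper: backupDoc is the last known good copy,
// tombstoneRev is the current (deleted) revision of the same _id.
function buildRestoreDoc(backupDoc, tombstoneRev) {
  // Keep the backed-up content but build on the deleted revision.
  var restored = Object.assign({}, backupDoc, { _rev: tombstoneRev });
  delete restored._deleted;
  return restored;
}

var restored = buildRestoreDoc(
  { _id: 'doc1', _rev: '3-good', title: 'kept' },
  '4-deleted'
);
console.log(JSON.stringify(restored));
```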

couchdb design views, updating fields on doc creation

Is it possible to have Couch update or change fields on the fly when you create/update a doc? For example, in the design document's validate_doc_update:
function(newDoc, oldDoc, userCtx) {
}
Within that function I can throw errors like:
// reject the update if any of the required fields is missing
if (!newDoc.user_email || !newDoc.user_name || !newDoc.user_password) {
  throw({forbidden : 'all fields required'});
}
My Question is how would I reassign a field? I tried this:
newDoc.user_password ="changed";
with changed being some new value or hashed value. My overall goal is to build a user registration/login system with node and couchdb and have not found very good examples.
The validate_doc_update function cannot have any side effects and cannot change the document before storage. It only has the power to block an update or to let it through. This is important, because the function is not only called when a user requests an update, but also when changes are replicated from one CouchDB instance to another. So the function can be called multiple times for one document.
However, CouchDB now supports Document Update Handlers that can modify a document or even build it from scratch. These can be used to convert non-JSON input data into usable documents. You can find some documentation in the CouchDB Wiki.
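As a rough illustration, an update handler that rewrites a field before the document is stored could look like the sketch below. The handler name and the lower-casing of the email are hypothetical examples; in a design document the function would be stored anonymously under "updates".

```javascript
// Hypothetical update handler. doc is the existing document (or null),
// req is the request object supplied by CouchDB.
function register(doc, req) {
  var body = JSON.parse(req.body);
  if (!doc) {
    // No existing document: build one from the request.
    doc = { _id: req.uuid, type: 'user' };
  }
  doc.user_name = body.user_name;
  // Example of changing a field on the way in.
  doc.user_email = String(body.user_email).toLowerCase();
  // First element is the document to save, second is the HTTP response body.
  return [doc, 'saved ' + doc._id];
}

var result = register(null, {
  uuid: 'abc123',
  body: JSON.stringify({ user_name: 'bob', user_email: 'Bob@Example.COM' })
});
console.log(JSON.stringify(result));
```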
Before you build your own user registration/login system, I'd suggest you look into the built-in CouchDB security features (if you haven't - some information here). They might not be enough for you (e.g. if you need email validation or something similar), but maybe you can build on them.
