CouchDB replication ignoring sporadic documents

I've got a CouchDB setup (CouchDB 2.1.1) for my app, which relies heavily on replication integrity. We are using the "one db per user" approach, with an additional layer of "role" DBs that group users (user DBs replicate up to a Role DB, which in turn replicates up to the Master DB).
Recently, while increasing the number of beta testers, we discovered that some documents had not been replicated as they should have been. We are unable to see any pattern in document size, creation/update time, user, or anything else. The errors seem to happen sporadically, with 2-3 successfully replicated docs followed by 4-6 non-replicated docs.
The server responds with {"error":"not_found","reason":"missing"} on those docs.
Most (but not all) of the user documents have been replicated to the corresponding Role DB, but very few made it all the way to the Master DB. This never happened when testing with < 100 documents (now we're at 1000-1200 docs in the db).
I discovered a problem with the "max open files" setting mentioned in the Performance chapter of the docs and fixed it, but the non-replicated documents are still not replicating. If I open a document and save it, it will replicate.
This is my current theory:
1. The replication process tried to copy new documents when the user went online.
2. The write failed because Linux's "max open files" limit had been hit.
3. The master DB still thinks the replication was successful.
4. At a later replication, the master DB ignores those old documents and only tries to replicate new ones.
Could this be correct? And can I somehow make the CouchDB server "double check" all documents and the integrity of previous replications?
Thank you for your time and any helpful comments!

I have experienced something similar in the past: when attempting to replicate documents without sufficient permissions, the replication fails, as it should. But once the permissions issue is fixed, the documents you attempted to replicate still cannot be replicated, although an edit/save on the documents fixes the issue. I wonder if this is due to checkpoints? The CouchDB manual says about the "use_checkpoints" flag:
Disabling checkpoints is not recommended as CouchDB will scan the source database's changes feed from the beginning.
Scanning from the beginning sounds like it might actually fix the problem, though, so perhaps disabling checkpoints could help. I never got back to that issue at the time, so I am afraid this is not a proper answer, just a suggestion.
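If you want to experiment, use_checkpoints can be passed per replication rather than changed globally. Something like this (a rough, untested sketch; the host, credentials and database names are placeholders for your setup):
// Rough sketch (untested): a one-off replication that ignores previous
// checkpoints, so the source's changes feed is scanned from the beginning.
// Host, credentials and database names below are placeholders.
async function replicateWithoutCheckpoints(): Promise<void> {
  const res = await fetch("http://localhost:5984/_replicate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Basic " + btoa("admin:password"),
    },
    body: JSON.stringify({
      source: "http://localhost:5984/userdb-0001",
      target: "http://localhost:5984/roledb-sales",
      use_checkpoints: false, // full re-scan instead of resuming from the last checkpoint
    }),
  });
  // The response's history entries include doc_write_failures, which should
  // surface any documents that still refuse to copy.
  console.log(await res.json());
}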

Related

Why does CouchDB generate a conflict when syncing a PouchDB document with a _rev difference higher than the revisions limit?

This is a strange behavior. First, we sync the CouchDB and PouchDB databases. Second, the PouchDB database goes offline. After many modifications to a document, it goes online again and syncs with CouchDB. If the PouchDB document's _rev number is higher than the CouchDB _rev number plus the revs limit, CouchDB generates a 409 "Document update conflict". Why? And what can we do to avoid it?
Unfortunately, this is the expected behaviour for the revs_limit option in PouchDB. The documentation says
revs_limit: Specify how many old revisions we keep track (not a copy) of. Specifying a low value means Pouch may not be able to figure out whether a new revision received via replication is related to any it currently has which could result in a conflict. Defaults to 1000.
If your PouchDB revs_limit is too low, PouchDB cannot determine whether your local revision actually has the server revision in its history, and therefore it throws a conflict.
The straightforward way would be to increase the local revision limit. But if you're generating over 1000 revisions between syncs, you should consider changing your data model: split up large JSONs, store each modification as a new document, and merge the data in the app instead of modifying one full document over and over.
If that's not an option, simply check for conflicts and resolve them by deleting the server version whenever they occur.
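For illustration, a rough sketch of both options with PouchDB (the database name, revs_limit value and conflict-handling policy are placeholders to adapt):
import PouchDB from "pouchdb";

// Rough sketch; "mydb" and the revs_limit value are placeholders.
// A higher revs_limit keeps a longer local revision history, so the server
// revision is more likely to be found in the document's ancestry.
const local = new PouchDB("mydb", { revs_limit: 5000 });

// Fallback: fetch a document together with its conflicts and delete the
// conflicting revisions so only one branch survives. Pick which revision
// to keep according to your own policy before doing this blindly.
async function resolveByDeletingConflicts(id: string): Promise<void> {
  const doc: any = await local.get(id, { conflicts: true });
  for (const rev of doc._conflicts ?? []) {
    await local.remove(id, rev); // removes only that conflicting revision
  }
}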

Is it possible to get the latest seq number of PouchDB?

I am trying to cover for an issue where CouchDB is rolled back, causing PouchDB to be in the future. I want to find a way to detect this situation and force PouchDB to destroy and reload when this happens.
Is there a way to ask PouchDB for its current pull seq number? I am not able to find any documentation on this at all. My Google-fu is not strong enough.
So far my only thought is to watch the sync.on('change') feed and record the seq number on every pull. Then on app reload, request https://server/db/_changes?descending=true&limit=1 and verify that the seq number this returns is higher than the seq number I stored. If the stored seq is higher, then pouchdb.destroy(), purge _pouch_ from IndexedDB, and probably figure out how to delete the WebSQL versions for this release https://github.com/pouchdb/pouchdb/releases/tag/6.4.2.
Or is there a better way to solve situations where PouchDB ends up in the future ahead of CouchDB?
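In case it helps frame the question, this is roughly what I have in mind (untested sketch; the remote URL and storage key are placeholders, and I'm assuming the replication change event exposes last_seq):
import PouchDB from "pouchdb";

// Rough sketch; the remote URL and storage key are placeholders.
const REMOTE = "https://server/db";
const local = new PouchDB("db");

// Remember how far we have pulled so far.
local.replicate
  .from(REMOTE, { live: true, retry: true })
  .on("change", (info: any) => {
    localStorage.setItem("lastPullSeq", String(info.last_seq));
  });

// On app start, compare the stored seq with the server's current top seq.
// CouchDB 2.x seqs are opaque strings like "1234-g1AAAA...", so comparing
// the numeric prefix is only a heuristic.
async function serverWasRolledBack(): Promise<boolean> {
  const stored = parseInt(localStorage.getItem("lastPullSeq") ?? "0", 10);
  const res = await fetch(`${REMOTE}/_changes?descending=true&limit=1`);
  const { last_seq } = await res.json();
  return stored > parseInt(String(last_seq), 10);
}

async function resetIfNeeded(): Promise<void> {
  if (await serverWasRolledBack()) {
    await local.destroy(); // throw the "future" database away and re-sync from scratch
  }
}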
The problem seems to be in the replication checkpoint documents. When you restore a database from a backup, you are probably also restoring the local checkpoint documents.
You should remove all local docs by finding them with the _local_docs endpoint and then deleting them from the restored database.
After doing this, PouchDB should try to send its docs to CouchDB again, bringing PouchDB and CouchDB back in sync.
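A rough sketch of that cleanup over HTTP (the host, credentials and database name are placeholders; _local_docs is available on recent CouchDB releases):
// Rough sketch; host, credentials and database name are placeholders.
// _local_docs lists the _local/... documents, which include the replication
// checkpoints that came along with the backup.
const DB = "http://localhost:5984/restored_db";
const auth = { Authorization: "Basic " + btoa("admin:password") };

async function deleteReplicationCheckpoints(): Promise<void> {
  const res = await fetch(`${DB}/_local_docs`, { headers: auth });
  const { rows } = await res.json();
  for (const row of rows) {
    // row.id looks like "_local/<checkpoint id>"; delete it like a normal doc.
    await fetch(`${DB}/${row.id}?rev=${row.value.rev}`, {
      method: "DELETE",
      headers: auth,
    });
  }
}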

Does CouchDB have a "bulk get all revisions" feature?

I'm using CouchDB with PouchDB and have noticed that remote-remote replication (or replication to PouchDB) does a lot of
/db/doc?revs=true&open_revs=all&attachments=true&_nonce=...
Do any of CouchDB's bulk APIs fetch the revs and open_revs (revs=true&open_revs=all) of more than one document at a time?
I saw your issue on GitHub as well. This is really something that would be better asked on the CouchDB mailing list or in #couchdb on IRC.
If you do _all_docs with keys, you can actually get the most recent revision information even for deleted documents, but for more than one revision per document, I don't think so.
If what you're really asking is whether we've gotten replication in PouchDB to go about as fast as it can go given the current CouchDB replication protocol, I think the answer is yes. :)
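For completeness, a rough sketch of the _all_docs-with-keys approach mentioned above (host and database name are placeholders):
// Rough sketch; host and database name are placeholders. One POST returns the
// winning rev (and the deleted flag) for many documents at once, but not
// their full revision trees.
async function latestRevs(ids: string[]): Promise<void> {
  const res = await fetch("http://localhost:5984/db/_all_docs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ keys: ids }),
  });
  const { rows } = await res.json();
  for (const row of rows) {
    // Deleted docs still show up here, with value.deleted === true.
    console.log(row.id, row.value?.rev, row.value?.deleted ?? false);
  }
}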

Design pattern to expire documents on Cloudant

So when a document is deleted, the metadata is actually preserved forever. For a hosted service like Cloudant, where storage costs money every month, I would instead like to completely purge the deleted documents.
I read somewhere about a design pattern where you use dbcopy in a view to put the docs into a "current" db and then periodically delete the expired dbs. But I can't find the article, and I don't quite understand how the database naming would work. How would the Cloudant clients always know the "current" database name?
Cloudant does not expose the _purge endpoint (the loose consistency guarantees between the clustered nodes make purging tricky).
The most common solution to this problem is to create a second database and use replication with a validate_doc_update function so that deleted documents with no existing entry in the target database are rejected. When replication is complete (or acceptably up to date, if using continuous replication), switch your application to use the new database and delete the old one. There is currently no way to rename databases, but you could use a virtual host which points to the "current" database.
I'd caution that a workload which generates a high ratio of deleted:active documents is generally an anti-pattern in Cloudant. I would first consider whether you can change your document model to avoid it.
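For illustration, a rough sketch of that second-database pattern (the host, database names and the exact rejection rule are placeholders to adapt):
// Rough sketch; host and database names are placeholders. The
// validate_doc_update function on the target rejects tombstones for documents
// the target has never seen, so deletions are left behind during replication.
const TARGET = "http://localhost:5984/mydb_v2";
const auth = { Authorization: "Basic " + btoa("admin:password") };

const designDoc = {
  _id: "_design/no_tombstones",
  validate_doc_update: `function (newDoc, oldDoc, userCtx) {
    if (newDoc._deleted && !oldDoc) {
      // A deletion for a doc that never existed here is a replicated tombstone.
      throw({ forbidden: "deleted doc has no existing entry in the target" });
    }
  }`,
};

async function setUpTarget(): Promise<void> {
  await fetch(TARGET, { method: "PUT", headers: auth }); // create the new database
  await fetch(`${TARGET}/_design/no_tombstones`, {       // install the validator
    method: "PUT",
    headers: { ...auth, "Content-Type": "application/json" },
    body: JSON.stringify(designDoc),
  });
  // Then replicate old -> TARGET (optionally continuous) and switch the
  // application over once the copy is acceptably up to date.
}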
Deleted documents are kept forever in CouchDB, even after compaction, though the tombstone that remains is pretty small as it contains only three fields:
{"_id": "234wer", "_rev": "123", "_deleted": true}
The reason for this is to make sure that all the replicated databases stay consistent. If a document that is replicated across several databases were simply removed in one location, there would be no way to tell the other replicas about the deletion.
There is _purge, but as explained in the wiki, it is only to be used in special cases.
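For reference, this is roughly what a purge request looks like on a self-hosted CouchDB (not available on Cloudant; host, database, doc id and rev are placeholders):
// Rough sketch of the _purge endpoint; host, database, doc id and rev are
// placeholders. The body is a map of doc id -> list of leaf revisions to
// remove completely.
async function purgeDoc(): Promise<void> {
  const res = await fetch("http://localhost:5984/db/_purge", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ "234wer": ["3-abc123"] }),
  });
  console.log(await res.json());
}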

CouchDB & Futon - Is there a way to cancel a continuous replication using Futon?

Is there a way to cancel a continuous replication using Futon?
One of my developers started getting this funny error when trying to replicate a template DB to his work environment.
Replicator failed:
{error,{'EXIT',{badarg,[{erlang,apply,[gen_server,start_link,undefined]},
{supervisor,do_start_child,2},
{supervisor,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}}
After fiddling with it for a while and testing, I managed to reproduce the problem: he had probably checked the "continuous" checkbox in Futon by mistake.
Now, we're working on Windows, so there's no magic curl one-liner handy. Obviously I could solve the problem for him from a Linux box, but I'm curious.
Is there a way to cancel a continuous replication using futon?
Delete the appropriate replication document.
In Futon you'll see a _replicator database. Click on it and you'll see a list of docs. Each doc is a one-way replication from a source to a target. Find the offending one and just delete that document. CouchDB will immediately stop the replication task. It will not, however, remove any documents that were already replicated.
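If poking around Futon gets tedious, the same thing can be done over plain HTTP from any machine, Windows included (a rough sketch; the host, credentials and doc id are placeholders):
// Rough sketch: cancel a replication by deleting its document from the
// _replicator database. Host, credentials and the doc id are placeholders.
const BASE = "http://localhost:5984/_replicator";
const auth = { Authorization: "Basic " + btoa("admin:password") };

async function cancelReplication(docId: string): Promise<void> {
  // Look up the replication doc to get its current _rev ...
  const doc = await (await fetch(`${BASE}/${docId}`, { headers: auth })).json();
  // ... then delete it; CouchDB stops the corresponding replication task.
  await fetch(`${BASE}/${docId}?rev=${doc._rev}`, {
    method: "DELETE",
    headers: auth,
  });
}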
