Why does CouchDB generate a conflict when syncing a PouchDB document with a _rev difference higher than the limit of revisions? - couchdb

This is a strange behavior. First, we sync the CouchDB and PouchDB databases. Second, the PouchDB database goes offline. After many modifications to a document, it goes online and syncs with CouchDB. If the PouchDB document's _rev number is higher than the CouchDB _rev number plus the revs limit, CouchDB generates a 409 "Document update conflict". Why? And what can we do to avoid it?

Unfortunately, this is the expected behaviour for the revs_limit option in PouchDB. The documentation says:
revs_limit: Specify how many old revisions we keep track (not a copy) of. Specifying a low value means Pouch may not be able to figure out whether a new revision received via replication is related to any it currently has which could result in a conflict. Defaults to 1000.
If your PouchDB revs_limit is too low, it cannot be determined whether your local revision actually has the server revision in its history, so the update is rejected with a conflict.
The straightforward fix is to increase the local revision limit. But if you're generating over 1000 revisions between syncs, you should consider changing your data model: split up large JSON documents, store each modification as a new document, and merge the data in the app instead of modifying one full document.
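A minimal sketch of the first option, raising the limit when opening the local database (the database name 'myapp' and the value 5000 are placeholders, not recommendations):

```javascript
const PouchDB = require('pouchdb');

// Raise the local revision history limit so PouchDB can still relate a
// long offline edit chain to the revision the server knows about.
const db = new PouchDB('myapp', { revs_limit: 5000 });
```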
If that's not an option, simply check for conflicts and resolve them by deleting the server version whenever they occur.
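A sketch of that resolution with PouchDB's API (the database name is a placeholder, and deleting every conflicting revision is an assumption; which side should survive is an app-level decision):

```javascript
const PouchDB = require('pouchdb');
const db = new PouchDB('myapp'); // placeholder name

// Fetch a document along with any conflicting revisions, then delete
// the losing revisions so only the winner survives.
async function resolveConflicts(docId) {
  const doc = await db.get(docId, { conflicts: true });
  for (const rev of doc._conflicts || []) {
    await db.remove(docId, rev); // removes that revision only
  }
}
```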

Related

How to solve a CouchDB conflict while replicating a database from server to local

We have CouchDB server-to-local and local-to-server replication. After a record is created or updated, it stays in the updated state for a while, but after a day or two some documents revert to a previous state, causing conflicts in the records. This happens only for some documents; the remaining documents are unchanged.
What is causing the records to be overwritten?
Is there any way, or any API, to automatically select the conflicted record as the winner?
CouchDB picks a winner for you - you can't choose which revision is the winner, but you can delete any other revisions that you no longer need, leaving your chosen winner as the only surviving revision.
For more information on conflicts and how to deal with them, see this blog post.
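As a sketch of that manual cleanup over CouchDB's HTTP API (the base URL is a placeholder, and chosenRev is whichever leaf revision you have decided should survive):

```javascript
const base = 'http://localhost:5984/mydb'; // placeholder URL

// Delete every leaf revision of a document except the one you want to
// keep; CouchDB then reports the survivor as the winner.
async function keepOnlyRev(docId, chosenRev) {
  const res = await fetch(`${base}/${docId}?conflicts=true`);
  const doc = await res.json();
  const leaves = [doc._rev, ...(doc._conflicts || [])];
  for (const rev of leaves) {
    if (rev !== chosenRev) {
      await fetch(`${base}/${docId}?rev=${rev}`, { method: 'DELETE' });
    }
  }
}
```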

CouchDB replication ignoring sporadic documents

I've got a CouchDB setup (CouchDB 2.1.1) for my app, which relies heavily on replication integrity. We are using the "one db per user" approach, with an additional layer of "role" DBs that group users.
Recently, while increasing the number of beta testers, we discovered that some documents had not been replicated as they should. We are unable to see any pattern in document size, creation/update time, user or other. The errors seem to happen sporadically, with 2-3 successfully replicated docs followed by 4-6 non-replicated docs.
The server responds with {"error":"not_found","reason":"missing"} on those docs.
Most (but not all) of the user documents have been replicated to the corresponding role DB, but very few made it all the way to the master DB. This never happened when testing with < 100 documents (we're now at 1000-1200 docs in the db).
I discovered a problem with the "max open files" setting mentioned in the Performance chapter in the docs and fixed it, but the non-replicated documents are still not replicating. If I open a document and save it, it will replicate.
This is my current theory:
The replication process tried to copy new documents when the user went online
The write process failed because Linux's "max open files" limit was hit
The master DB still thinks the replication was successful
At a later replication, the master DB ignores those old documents and only tries to replicate new ones
Could this be correct? And can I somehow make the CouchDB server "double check" all documents and the integrity of previous replications?
Thank you for your time and any helpful comments!
I have experienced something similar in the past - when attempting to replicate documents without sufficient permissions, the replication fails, as it should. But when the permissions issue is fixed, the documents you attempted to replicate still cannot be replicated, although an edit/save on the documents fixes the issue. I wonder if this is due to checkpoints? The CouchDB manual says about the "use_checkpoints" flag:
Disabling checkpoints is not recommended as CouchDB will scan the Source database's changes feed from the beginning.
Though scanning from the beginning sounds like it might fix the problem, so perhaps disabling checkpoints could help. I never got back to that issue at the time so I am afraid this is not a proper answer, just a suggestion.
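If you want to experiment with that, a one-off replication with checkpoints disabled can be triggered through the _replicate endpoint. A sketch (server URL and database names are placeholders; credentials omitted):

```javascript
// Trigger a single replication that ignores existing checkpoints, so
// the source's changes feed is scanned from the beginning.
async function replicateWithoutCheckpoints() {
  await fetch('http://localhost:5984/_replicate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      source: 'http://localhost:5984/user_db', // placeholder
      target: 'http://localhost:5984/role_db', // placeholder
      use_checkpoints: false
    })
  });
}
```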

How to keep CouchDB efficient with a lot of DELETEs: purge?

I have a CouchDB database with ~2000 documents (50 MB), but 150K deleted documents after 3 months, and that number will keep increasing.
So, what is the best strategy to keep performance high?
Use purge + compact, or periodically re-create the entire database?
The CouchDB documentation recommends re-creating the database when storing short-term data. That isn't my case, but deletes are constant for some kinds of documents.
DELETE operation
If your use case creates lots of deleted documents (for example, if you are storing short-term data like log entries, message queues, etc), you might want to periodically switch to a new database and delete the old one (once the entries in it have all expired).
Using Apache CouchDB v. 2.1.1
The purge operation is not implemented at the cluster level in the CouchDB 2.x series (from 2.0.0 to 2.2.0), so it doesn't seem to be an option in your case.
It seems this will be supported in the next release, 2.3.0. You can check the related issue here.
The same issue includes a possible workaround based on a database switch approach described here.
In your case, with Apache CouchDB 2.1.1 database switch is the only viable option.
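A sketch of that database-switch approach using filtered replication, so tombstones never reach the new database (URLs and database names are placeholders):

```javascript
const PouchDB = require('pouchdb');

// Placeholders: adjust URLs/names to your setup.
const oldDb = new PouchDB('http://localhost:5984/mydb');
const newDb = new PouchDB('http://localhost:5984/mydb_v2');

async function switchDatabase() {
  // Copy only live documents; deleted docs are filtered out, so the
  // new database starts without the accumulated tombstones.
  await oldDb.replicate.to(newDb, {
    filter: (doc) => !doc._deleted
  });
  // After verifying the copy, point the app at mydb_v2 and delete mydb.
}
```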

Is it possible to get the latest seq number of PouchDB?

I am trying to cover for an issue where CouchDB is rolled back, causing PouchDB to be in the future. I want to find a way to detect this situation and force PouchDB to destroy and reload when this happens.
Is there a way to ask PouchDB for its current pull seq number? I am not able to find any documentation on this at all. My google-foo is not strong enough.
So far my only thought is to watch the sync.on('change') feed and record the seq number on every pull. Then on app reload, request https://server/db/_changes?descending=true&limit=1 and verify that the seq number it returns is higher than the stored one. If the stored seq is higher, then pouchdb.destroy(), purge _pouch_ from IndexedDB, and probably figure out how to delete the WebSQL versions, per this release: https://github.com/pouchdb/pouchdb/releases/tag/6.4.2.
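A sketch of that check (the URL is a placeholder, and lastStoredSeq is assumed to be the numeric prefix recorded from the sync change events; CouchDB 2.x seqs are opaque strings, so only the numeric prefix is comparable):

```javascript
// Returns true if the server's latest seq is behind what we last saw,
// i.e. the server appears to have been rolled back.
async function serverWasRolledBack(lastStoredSeq) {
  const res = await fetch(
    'https://server/db/_changes?descending=true&limit=1' // placeholder
  );
  const { last_seq } = await res.json();
  const serverSeq = parseInt(String(last_seq).split('-')[0], 10);
  return serverSeq < lastStoredSeq;
}
```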
Or is there a better way to solve situations where PouchDB ends up in the future ahead of CouchDB?
The problem seems to be the replication checkpoint documents. When you recover a database from a backup, you are probably also recovering the checkpoint local documents.
You should remove all local docs by finding them with the _local_docs endpoint and then deleting them from the recovered database.
After doing this, PouchDB should try to send its docs to CouchDB again, syncing PouchDB and CouchDB back up.
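A sketch of that cleanup over HTTP (the base URL is a placeholder; the _local_docs endpoint is available in CouchDB 2.2+):

```javascript
const base = 'http://localhost:5984/recovered_db'; // placeholder

// List the _local/... checkpoint documents and delete each one, so the
// next replication negotiates from scratch instead of trusting stale
// checkpoints.
async function removeCheckpoints() {
  const res = await fetch(`${base}/_local_docs?include_docs=true`);
  const { rows } = await res.json();
  for (const row of rows) {
    // row.id already looks like "_local/<id>"
    await fetch(`${base}/${row.id}?rev=${row.doc._rev}`, {
      method: 'DELETE'
    });
  }
}
```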

Design pattern to expire documents on Cloudant

So when a document is deleted, the metadata is actually preserved forever. For a hosted service like Cloudant, where storage costs money every month, I would instead like to completely purge the deleted documents.
I read somewhere about a design pattern where you use dbcopy in a view to put the docs into a 'current' db, then periodically delete the expired dbs. But I can't find the article, and I don't quite understand how the database naming would work. How would the Cloudant clients always know the 'current' database name?
Cloudant does not expose the _purge endpoint (the loose consistency guarantees between the clustered nodes make purging tricky).
The most common solution to this problem is to create a second database and use replication with a validate_doc_update function so that deleted documents with no existing entry in the target database are rejected. When replication is complete (or acceptably up to date, if using continuous replication), switch your application to use the new database and delete the old one. There is currently no way to rename databases, but you could use a virtual host which points to the "current" database.
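A sketch of such a design document on the target database (the ddoc name and URLs are placeholders): any incoming tombstone for a document the target has never seen is rejected, so old deletions don't replicate into the fresh copy.

```javascript
// Placeholder ddoc name; store this on the target before replicating.
const ddoc = {
  _id: '_design/reject_orphan_deletes',
  validate_doc_update: function (newDoc, oldDoc, userCtx) {
    // Reject tombstones for documents the target has never seen.
    if (newDoc._deleted && !oldDoc) {
      throw { forbidden: 'deleted doc with no prior revision here' };
    }
  }.toString()
};

await fetch('http://localhost:5984/mydb_v2/_design/reject_orphan_deletes', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(ddoc)
});
```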
I'd caution that a workload which generates a high ratio of deleted:active documents is generally an anti-pattern in Cloudant. I would first consider whether you can change your document model to avoid it.
Deleted documents are kept forever in CouchDB, even after compaction, though a deleted document is pretty small as it contains only three fields:
{"_id": "234wer", "_rev": "123", "_deleted": true}
The reason for this is to make sure that all replicated databases stay consistent. If a document that is replicated across several databases were simply erased in one location, there would be no way to tell the other replicated stores about the deletion.
There is _purge, but as explained in the wiki it is only to be used in special cases.
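For completeness, a sketch of that _purge call on a single-node CouchDB (the doc id and revision are placeholders; as noted above, use it only in special cases):

```javascript
// Remove a tombstone's metadata completely (unlike DELETE, which
// leaves a tombstone behind).
async function purgeTombstone() {
  await fetch('http://localhost:5984/mydb/_purge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      '234wer': ['3-abc123'] // doc id -> leaf revisions to purge
    })
  });
}
```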
