Is it possible to get the latest seq number of PouchDB? - couchdb

I am trying to cover for an issue where CouchDB is rolled back, causing PouchDB to be in the future. I want to find a way to detect this situation and force PouchDB to destroy and reload when this happens.
Is there a way to ask PouchDB for its current pull seq number? I haven't been able to find any documentation on this at all; my google-fu is not strong enough.
So far my only thought is to watch the sync.on('change') feed and record the seq number on every pull. Then, on app reload, request https://server/db/_changes?descending=true&limit=1 via Ajax and verify that the seq number it returns is higher than the seq number I stored. If the stored seq is higher, then pouchdb.destroy(), purge _pouch_ from IndexedDB, and probably figure out how to delete the WebSQL versions for this release: https://github.com/pouchdb/pouchdb/releases/tag/6.4.2.
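The comparison step above could be sketched like this, assuming CouchDB 2.x seq strings of the form "123-g1AAAA..." (only the numeric prefix is orderable; the stored and server values here are stand-ins for the real feed/HTTP results):

```javascript
// Extract the orderable numeric prefix from a CouchDB 2.x seq string.
// (CouchDB 1.x seqs are plain integers, which this also handles.)
function seqNumber(seq) {
  if (typeof seq === 'number') return seq;
  return parseInt(String(seq).split('-')[0], 10);
}

// True when the locally recorded seq is ahead of what the server reports,
// i.e. the server was likely rolled back and the local DB should be rebuilt.
function isLocalAhead(storedSeq, serverSeq) {
  return seqNumber(storedSeq) > seqNumber(serverSeq);
}
```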
Or is there a better way to solve situations where PouchDB ends up in the future ahead of CouchDB?

The problem is probably in the replication checkpoint documents. When you recover a database from a backup, you are most likely also recovering the checkpoint local documents.
You should remove all local docs by finding them with the _local_docs endpoint and then deleting them from the recovered database.
After doing this, PouchDB should try to send its docs to CouchDB, syncing PouchDB and CouchDB back together.
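The cleanup step could be sketched like this, assuming the JSON shape that GET /db/_local_docs returns; the resulting tombstones would then be posted to _bulk_docs (the response object here is a stand-in for the actual HTTP call):

```javascript
// Given the JSON body of GET /db/_local_docs, build the _bulk_docs
// payload that deletes every local (checkpoint) document.
function buildLocalDocDeletions(localDocsResponse) {
  return localDocsResponse.rows.map(function (row) {
    return { _id: row.id, _rev: row.value.rev, _deleted: true };
  });
}
```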

Related

Why CouchDB generates a conflict when syncing a PouchDB document with a _rev difference higher than the limit of revisions?

This is strange behavior. First, we sync the CouchDB and PouchDB databases. Second, the PouchDB database goes offline. After many modifications to a document, it comes back online and syncs with CouchDB. If the PouchDB document's _rev number is higher than the CouchDB _rev number plus the revs limit, CouchDB generates a 409 "Document update conflict". Why? And what can we do to avoid it?
Unfortunately, this is the expected behaviour of the revs_limit option in PouchDB. The documentation says:
revs_limit: Specify how many old revisions we keep track (not a copy) of. Specifying a low value means Pouch may not be able to figure out whether a new revision received via replication is related to any it currently has which could result in a conflict. Defaults to 1000.
If your PouchDB revs_limit is too low, CouchDB cannot determine whether your local revision actually has the server revision in its history, and it therefore throws a conflict.
The straightforward fix is to increase the local revs_limit. But if you're generating over 1000 revisions between syncs, you should consider changing your data model: split up large JSON documents, store each modification as a new document, and merge the data in the app instead of modifying one full document.
If that's not an option, simply check for conflicts and resolve them by deleting the server version whenever they occur.
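That last option could be sketched like this, assuming the document was fetched with {conflicts: true} so that losing revisions show up in its _conflicts array; the tombstones produced here would then go through bulkDocs:

```javascript
// Given a document fetched with {conflicts: true}, produce the
// deletion stubs that remove every losing (conflicting) revision,
// leaving the winning revision in place.
function deleteConflicts(doc) {
  return (doc._conflicts || []).map(function (rev) {
    return { _id: doc._id, _rev: rev, _deleted: true };
  });
}
```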

CouchDB replication ignoring sporadic documents

I've got a CouchDB setup (CouchDB 2.1.1) for my app, which relies heavily on replication integrity. We are using the "one db per user" approach, with an additional layer of "role" DBs that group users, as in the image below.
Recently, while increasing the number of beta testers, we discovered that some documents had not been replicated as they should. We are unable to see any pattern in document size, creation/update time, user or other. The errors seem to happen sporadically, with 2-3 successfully replicated docs followed by 4-6 non-replicated docs.
The server responds with {"error":"not_found","reason":"missing"} on those docs.
Most (but not all) of the user documents have been replicated to the corresponding Role DB, but very few made it all the way to the Master DB. This never happened when testing with < 100 documents (now we're at 1000-1200 docs in the db).
I discovered a problem with the "max open files" setting mentioned in the Performance chapter in the docs and fixed it, but the non-replicated documents are still not replicating. If I open a document and save it, it will replicate.
This is my current theory:
The replication process tried to copy new documents when the user went online
The write process failed due to Linux's "max_open_files" peaked
The master DB still thinks the replication was successful
At a later replication, the master DB ignores those old documents and only tries to replicate new ones
Could this be correct? And can I somehow make the CouchDB server "double check" all documents and the integrity of previous replications?
Thank you for your time and any helpful comments!
I have experienced something similar in the past: when attempting to replicate documents without sufficient permissions, the replication fails, as it should. But once the permissions issue is fixed, the documents you attempted to replicate still cannot be replicated, although editing and saving them fixes the issue. I wonder if this is due to checkpoints? The CouchDB manual says this about the "use_checkpoints" flag:
Disabling checkpoints is not recommended as CouchDB will scan the source database's changes feed from the beginning.
Though scanning from the beginning sounds like it might fix the problem, so perhaps disabling checkpoints could help. I never got back to that issue at the time so I am afraid this is not a proper answer, just a suggestion.
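For reference, the flag goes into the replication document or a POST /_replicate body like this (the URLs are placeholders, and this is only a fragment of a full replication config):

```json
{
  "source": "https://server/source_db",
  "target": "https://server/target_db",
  "use_checkpoints": false
}
```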

PouchDB/CouchDB Conflict Resolution Server Side

I'm new to pouch/couch and looking for some guidance on handling conflicts. Specifically, I have an extension running pouchdb (distributed to two users). Then the idea is to have a pouchdb-server or couchdb (does it matter for this small a use case?) instance running remotely. The crux of my concern is handling conflicts, the data will be changing frequently and though the extensions won't be doing live sync, they will be syncing very often. I have conflict handling written into the data submission functions, however there could still be conflicts when syncing occurs with multiple users.
I was looking at the pouch-resolve-conflicts plugin and see immediately the author state:
"Conflict resolution should better be done server side to avoid hard to debug loops when multiple clients resolves conflicts on the same documents".
This makes sense to me, but I am unsure how to implement such conflict resolution. The only way I can think of would be to place a REST API layer in front of the remote database that handles all updates/conflicts etc. with custom logic. But then how could I use the Pouch sync functionality? At that point I may as well just use a different database.
I've just been unable to find any resources discussing how to implement conflict resolution server-side, in fact the opposite.
With your use case, you could probably write to a local PouchDB instance and sync it with the master database. Then, you could have a daemon that automatically resolves conflicts on your master database.
Below is my approach to solve a similar problem.
I have made a Node.js daemon that automatically resolves conflicts. It integrates deconflict, a Node.js library that lets you resolve a document in three ways:
Merge all revisions together
Keep the latest revision (based on a custom key, e.g. updated_at)
Pick a certain revision (Here you can use your own logic)
Revision deconflict
The way I use CouchDB, every write is partial. We always take some changes and apply them to the latest document. With this approach, we can easily use the merge-all-revisions strategy.
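The merge-all-revisions idea could be sketched like this (this is an illustration of the strategy, not the actual deconflict API): shallow-merge every losing revision's body over the winner's, so partial writes from different clients are all kept:

```javascript
// Shallow-merge the bodies of losing revisions into the winning one.
// _id and _rev always come from the winner; for overlapping data keys,
// later losers in the list overwrite earlier values.
function mergeRevisions(winner, losers) {
  var merged = Object.assign({}, winner);
  losers.forEach(function (loser) {
    Object.keys(loser).forEach(function (key) {
      if (key !== '_id' && key !== '_rev') merged[key] = loser[key];
    });
  });
  return merged;
}
```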
Conflict scanner
When the daemon boots, two processes are started. One goes through all the changes; if a conflict is detected, the document is added to a conflict queue.
The other process remains active: a continuous changes scanner.
It listens to all new changes and adds conflicted documents to the conflict queue.
Queue processing
Another process is started and keeps polling the queue for new conflicted documents. It fetches conflicted documents in batches and resolves them one by one. If there are no documents, it waits a certain period and then starts polling again.
Having worked a little bit with Redux I realized that the same concept of unidirectional flow would help me avoid the problem of conflicts altogether.
Redux flows in one direction: components dispatch actions, a single store applies them, and the resulting state flows back out.
Following that pattern, my client-side code never writes definitive data to the master database; instead it writes insert/update/delete requests locally, which PouchDB then pushes to the CouchDB master database. On the same server as the master CouchDB, I have PouchDB in Node.js replicating these requests. "Supervisor" software in Node.js examines each new request, changes its status to "processing", writes the requested updates, inserts and deletes, then marks the request "processed". To ensure requests are processed one at a time, the code that receives each request pushes it into a FIFO queue; the processing code pulls from the other end.
I'm not dealing with super high volume, so the latency is not a concern.
I'm also not facing a situation where numerous people might be trying to update exactly the same record at the same time. If that's your situation, your client-side update requests will need to specify the rev number, and your "supervisors" will need to reject change requests that refer to a superseded version. You'll have to figure out how your client code would get and respond to those rejections.
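That supervisor check could be sketched like this, assuming each change request carries the _rev the client saw when it made the request (the request/document shapes here are illustrative, not part of any library API):

```javascript
// Accept a change request only when it was made against the document's
// current revision; any other rev refers to a superseded version and
// the request should be rejected back to the client.
function shouldAccept(request, currentDoc) {
  return request.rev === currentDoc._rev;
}
```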

How to keep CouchDB efficiently with a lot of DELETE, purge?

I have a CouchDB database with ~2000 documents (50 MB), but 150K documents deleted over 3 months, and that number will keep increasing.
So, what is the best strategy to keep performance high?
Use purge + compact, or periodically re-create the entire database?
The CouchDB documentation recommends re-creating the database when storing short-term data. That isn't quite my case, but deletes are constant for some kinds of documents.
DELETE operation
If your use case creates lots of deleted documents (for example, if you are storing short-term data like log entries, message queues, etc), you might want to periodically switch to a new database and delete the old one (once the entries in it have all expired).
Using Apache CouchDB v. 2.1.1
The purge operation is not implemented at the cluster level in the CouchDB 2.x series (from 2.0.0 to 2.2.0), so it doesn't seem to be an option in your case.
It seems this will be supported in the next release, 2.3.0. You can check the related issue here.
The same issue includes a possible workaround based on a database switch approach described here.
In your case, with Apache CouchDB 2.1.1, the database-switch approach is the only viable option.
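One way to decide when a switch is due is from the doc_count and doc_del_count fields that GET /db already returns; a sketch (the threshold is an arbitrary choice, and dbInfo stands in for the parsed HTTP response):

```javascript
// GET /db returns doc_count (live docs) and doc_del_count (tombstones).
// Flag the database for a switch once tombstones exceed the given
// fraction of all documents.
function needsSwitch(dbInfo, maxDeletedRatio) {
  var total = dbInfo.doc_count + dbInfo.doc_del_count;
  if (total === 0) return false;
  return dbInfo.doc_del_count / total > maxDeletedRatio;
}
```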

Does CouchDB have a "bulk get all revisions" feature?

I'm using CouchDB with PouchDB and have noticed that remote-remote replication (or replication to PouchDB) does a lot of
/db/doc?revs=true&open_revs=all&attachments=true&_nonce=...
Do any of CouchDB's bulk APIs fetch the revs and open_revs (revs=true&open_revs=all) of more than one document at a time?
I saw your issue on GitHub as well. This is really something that would be better to ask in the CouchDB mailing list or #couchdb on IRC.
If you do _all_docs with keys, you can actually get the most recent revision information even for deleted documents, but for more than one revision per document, I don't think so.
If what you're really asking is whether we've gotten replication in PouchDB to go about as fast as it can go given the current CouchDB replication protocol, I think the answer is yes. :)