How to re-replicate documents after they have been purged in a remote database - couchdb

I have an application where the documents available in a remote database are a subset of documents available on the server. When the subset required by the user changes, documents that are no longer needed in the remote database are purged (yes, purged, not deleted) and new documents replicated. If the subset required by the user was changed to include documents that have been previously purged, I can't find a way to make the purged documents replicate again to reinstate them on the client.
A simple scenario to consider is:
Create two databases, A and B
Create a document "D" in A
Replicate database A to B
In B, purge D
Replicate A to B again and notice that D is not replicated
I've tried compacting B, to no avail. I can understand that with continuous replication, D will not be sent again because it has not changed. But I can't get D to be re-replicated using one-time replication either. How can I make a replication copy D from A to B once CouchDB is in this state?
I'm using CouchDB 2.3.

When a replication finishes, CouchDB stores a replication log (checkpoint) as a _local document on each node.
On the next run the replicator fetches this log and picks up where it left off, ignoring changes that happened before the last checkpoint (such as the creation of documents that have since been purged).
I can think of two solutions to this:
Manually delete these replication logs by finding the _local/ documents on the source and target and deleting them.
Change the replication parameters, even slightly, so that CouchDB generates a new replication ID and therefore a new log. One way to do this would be to add a filter function (it could be a filter function that filters nothing). Both approaches are sketched below.
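For example (a rough sketch, assuming two local databases a and b on CouchDB 2.3 at http://localhost:5984; the design document name repl and filter name all are placeholders):
# Option 1: find and delete the replication checkpoints on the target (and source)
curl http://localhost:5984/b/_local_docs
curl http://localhost:5984/b/_local/<checkpoint-id>                       # note its _rev
curl -X DELETE "http://localhost:5984/b/_local/<checkpoint-id>?rev=<rev>"
# Option 2: force a new replication ID with a do-nothing filter stored on the source
curl -X PUT http://localhost:5984/a/_design/repl -H "Content-Type: application/json" \
     -d '{"filters": {"all": "function(doc, req) { return true; }"}}'
curl -X POST http://localhost:5984/_replicate -H "Content-Type: application/json" \
     -d '{"source": "http://localhost:5984/a", "target": "http://localhost:5984/b", "filter": "repl/all"}'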

Related

Surprising behavior with replicated, deleted CouchDb documents

We have two CouchDb servers, let's call them A and B. There's one-way replication from A to B, and documents are only created, modified, or deleted on A - basically you can think of B as just a backup. There was a document on A that was deleted. When I tried to retrieve the revision prior to deletion from A, I got {"error":"not_found","reason":"missing"}, but that DB hasn't been compacted (as I understand it, compaction only happens if you start it manually, and that wasn't done). However, while B knew the document had been deleted, the old revision was still available on B.
My understanding is that if we haven't manually run compaction, the old revision should always be available on A. Furthermore, when B replicates, if there were multiple revisions since the last replication, it'll pull metadata for the old revisions but might not pull the document bodies. Thus, in this setup, the set of revisions available on B should always be a proper subset of those available on A. So how could B have a revision that A does not?
We're on CouchDb 2.3.0.

Deleting _deleted documents on CouchDB by date

My CouchDb database is getting bigger, and I would like to remove documents by date; I would also like to remove _deleted documents by date.
I know how to replicate my DB while removing documents by date, but:
Is there a way to do the same with _deleted documents? I mean, remove _deleted documents by date.
There's not really a way to conditionally cause a deletion using filtered replication, nor can you replicate a complete removal of a document.
You have a variety of options:
you can avoid replicating updates on old documents by filtering on date, but if they have already been replicated they won't be deleted
you can make a view to return old documents, and use a script to delete them at the source database (sketched below). The deletions will replicate to any target databases, but all databases will retain at least a {_deleted:true} tombstone of the documents [that's how the deletion gets replicated in the first place]
you can find old documents and _purge them, but you'll have to do that on each replica
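For example, the view-and-delete option might look roughly like this, assuming the documents carry a timestamp field and a database named mydb (all names, dates, and revisions below are placeholders):
# A view that keys documents by their (assumed) timestamp field
curl -X PUT http://localhost:5984/mydb/_design/cleanup -H "Content-Type: application/json" \
     -d '{"views": {"by_date": {"map": "function(doc) { if (doc.timestamp) emit(doc.timestamp, doc._rev); }"}}}'
# Find documents older than a cut-off date
curl 'http://localhost:5984/mydb/_design/cleanup/_view/by_date?endkey="2019-01-01"'
# Delete them in bulk at the source; the _deleted:true tombstones then replicate to the targets
curl -X POST http://localhost:5984/mydb/_bulk_docs -H "Content-Type: application/json" \
     -d '{"docs": [{"_id": "some-old-doc", "_rev": "3-abc123", "_deleted": true}]}'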
What is your main goal?
If you have hundreds of objects and you want to hide the old ones from the UI of all replicas, write a script to find them and DELETE them (or set _deleted:true) in a source/master replica, and the changes will propagate.
If you have bazillions of e.g. log messages and you need to free up space by forgetting old ones, write a script that finds them, _purges them, and finally runs _compact, then run it on every replica. But for a case like that, it might be better to rotate databases instead, e.g. manually "shard" or bin into a different database each week, and every week simply drop the database that is N+1 weeks old on each replica.
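A minimal sketch of the purge-and-compact variant (the database name, document ID, and revision are placeholders; run it against every replica, and note that on clustered CouchDB the _purge endpoint requires 2.3 or later):
curl -X POST http://localhost:5984/mydb/_purge -H "Content-Type: application/json" \
     -d '{"some-old-doc": ["3-abc123"]}'
curl -X POST http://localhost:5984/mydb/_compact -H "Content-Type: application/json"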
If your database is getting bigger, this is probably due to the versioning of your documents. A simple way to free some space is to run database compaction (Documentation).
As for _deleted documents, the only way to REALLY delete them is by purging.
That said, purging is generally not recommended; it should only be done to remove very sensitive documents, such as credentials.

Cloudant / couch db Two way replication - what prevents recursiveness?

We have two cloudant databases say A and B on two separate clusters. We have setup two way replication between these databases, so A->B and B->A.
1) If a document X is updated on A, it gets replicated to B. But this change on B is not replicated back to A again, so it does not get into an indefinite recursive cycle. Is this achieved using the revision numbers? I believe it might be internal to CouchDB.
2) We need to figure out, by looking at a document in both A and B, which database actually received the update through an API call and which one received it through replication. Is there a way to figure this out?
The CouchDB replication protocol is well defined and makes sure that replication is done in a reliable manner: before transferring anything, the replicator asks the target which revisions it is missing (via _revs_diff), so a revision that already exists on the other side is never sent back, and no recursive cycle can occur.
CouchDB has no concept of a master. Once synced, all CouchDB instances are identical so it won't be possible to determine which node received the original request. If you need to do this, you probably should reevaluate whether replication is what you really want.

CouchDB with continuous replication reverts document revision instead of deleting

We have a system that uses CouchDB as its database.
We are using continuous replication to create an always-updated copy of our database.
Recently we have discovered a strange behavior (maybe bug?) that I hope someone here could help me with:
We set the system with normal replication (NOT filtered).
We update the same document several times consecutively (each time waiting for CouchDB to return 200 OK) - this part works fine and the document appears updated correctly in the replicated DB.
However, when we try to delete this document, even minutes after the consecutive updates, it is not deleted in the replicated DB and instead just reverts to a revision from before the consecutive updates.
It is important to note that we delete by adding a _deleted field set to true
I understand there is some problem with deletion using HTTP DELETE combined with filtered replication, but we're not using either.
Also, doing the same updates but waiting a second between each one solves the problem just fine (as does combining them into one update).
However, neither workaround is possible for us, and in any case they just work around the problem.
tl;dr:
1) CouchDB with normal continuous replication
2) Consecutive updates to document
3) _deleted = true set on the document
4) Replicated DB does not delete, instead reverts to _rev before #2
Environment:
CouchDB version is 1.6.1
Windows computer
Using CouchDB-Lucene
Most likely you have introduced some conflicts in the documents. When a document is being edited in several replicas, CouchDB chooses a winning revision when replicating, but also keeps the losing revisions. If you delete the winning revision, the losing revision will be displayed again. You can read an introduction in the (now somewhat outdated) CouchDB Guide: http://guide.couchdb.org/draft/conflicts.html and in the CouchDB Docs: http://docs.couchdb.org/en/1.6.1/replication/conflicts.html
But in short, the replication database might have been edited by someone. It might be that you replicated several databases into one, or somebody edited the documents manually in the target database.
You can delete the target database and recreate an empty db. If you don't edit the target db by hand and don't replicate multiple dbs into one, _deletes will be replicated correctly from then on.
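For example, assuming the replication target is a database named target_db on a local node (name and URL are placeholders):
curl -X DELETE http://localhost:5984/target_db
curl -X PUT http://localhost:5984/target_db
# then restart the continuous replication from the source so the target repopulates cleanly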
Problem solved.
It was the revision limit.
It seems that quickly making more changes than the revision limit allows causes problems for the replication mechanism.
There is an unsolved bug in CouchDB about this issue:
https://issues.apache.org/jira/browse/COUCHDB-1649
Since the revision limit we had was 2, doing 3 consecutive updates to the same document and then deleting it caused this problem.
Setting the revision limit to 5 avoids it.
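For reference, the limit can be checked and raised per database over HTTP (database name and host are placeholders):
curl http://localhost:5984/mydb/_revs_limit
curl -X PUT -d "5" http://localhost:5984/mydb/_revs_limit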

How many document revisions are kept in CouchDB / Cloudant, and for how long?

In CouchDB and Cloudant, when documents are changed, the database holds on to previous versions. What gets kept, and for how long?
Cloudant and CouchDB keep the document's metadata forever (id, rev, deleted and conflict). Document contents are deleted during compaction (automatic in Cloudant, manual in CouchDB), with one exception: in the case of a conflict, we'll keep the document contents until the conflict is resolved.
For each document, we keep the last X revisions, where X is the number returned by {username}.cloudant.com/{db}/_revs_limit, defaulting to 1000. Revisions older than the last 1000 get dropped. You can change _revs_limit by making a PUT request with a new value to that endpoint. For example:
curl -X PUT -d "1500" https://username.cloudant.com/test/_revs_limit
So, if a document is replicated to two nodes, edited 1001 times on node A, and then replicated again to node B, it will generate a conflict on node B (because we've lost the information necessary to join the old and new edit paths together).

Resources