How can I disable auto compaction in couchdb 3.2?
I want to preserve all the history for a specific database, or failing that, completely disable auto compaction.
Note: the CouchDB 3.2 configuration has changed since 2.0.
I got an answer on the GitHub issue page (https://github.com/apache/couchdb-documentation/issues/734). The contents are below.
We do not advise pausing compaction to preserve the history of a database. Once document revisions go past 1000 (_revs_limit) they will start to be removed anyway. Database history should be preserved by the application: when documents are updated, it can create a separate document with the old contents and link the two together with an ID.
Compaction can be disabled if there is an operational issue of some sort. There is a [smoosh.ignore] $dbshard = true configuration value which can be set for individual shards. For instance:
[smoosh.ignore]
shards/80000000-ffffffff/db1.1122445 = true
But you'll have to list all the db shards there.
If you want to disable compaction for all dbs, you can try:
[smoosh]
db_channels = upgrade_dbs
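As a sketch, this [smoosh] setting can also be applied at runtime through the node configuration API instead of editing local.ini; the node name _local and the admin credentials below are placeholders for your own values:

curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/smoosh/db_channels -d '"upgrade_dbs"'
# verify the section afterwards
curl http://admin:password@localhost:5984/_node/_local/_config/smoosh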
Say I have a database with 100 records, each with 1000 revisions, and an additional 100,000 deleted documents each with an extensive history of revisions. In addition we also have a view document and some mango indexes.
For this hypothetical situation let's assume I can't delete and rebuild the database. Also replication safety is not a concern.
If I am required to create some kind of script, using curl, that purges the database of all unused data, so that running the script has exactly the same result as deleting and rebuilding the database with only the 100 records, each with a single revision on file, how should I go about doing this?
For your hypothetical situation, you could do the following:
Make a backup of the 100 required documents
Delete all documents in the DB
Use the Purge API to delete the revision history
Re-Create the 100 required documents
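To expand on steps 1 and 3, here is a rough curl sketch, assuming a database named db1 and admin credentials; the document ID and revision in the purge body are placeholders, and _purge takes a JSON map of doc IDs to the revisions to remove:

# 1. Back up the documents you want to keep
curl 'http://admin:password@localhost:5984/db1/_all_docs?include_docs=true' > backup.json

# 3. Purge the revision history of a (deleted) document
curl -X POST http://admin:password@localhost:5984/db1/_purge \
     -H "Content-Type: application/json" \
     -d '{"some-doc-id": ["2-7051cbe5c8faecd085a3fa619e6e6337"]}'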
A safer approach for saving disk space and BTree size in a real-life scenario would be:
Properly configure CouchDB's compaction settings to not include too many revisions
Only purge documents that won't ever be modified again in the future.
I understand that compaction of a db removes old revisions beyond the limit set in the config. The result is decreased disk usage, with little to no effect on view speed, because old revisions aren't part of the view index.
I recognize that view compaction is different from view cleanup, which removes unused view index files to save space.
However, what happens with a view compaction? I haven't been able to find much documentation on this, just that it is necessary. Does it operate similarly to db compaction in that it removes old revisions from design docs? If so, I don't think there is much of a benefit as design docs are usually small and few.
Views are structured similarly to databases, so when you make changes to documents there will be stale data in your view index until you run a compaction, just like with a database. The documentation doesn't state this explicitly, but it's implied by the statement that views also need compaction, like databases.
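For reference, view compaction and view cleanup are triggered separately; a minimal curl sketch, assuming a database db1 with a design document _design/myview and an admin user:

# Compact the view index of one design document
curl -X POST http://admin:password@localhost:5984/db1/_compact/myview -H "Content-Type: application/json"

# Remove index files that no longer belong to any current design document
curl -X POST http://admin:password@localhost:5984/db1/_view_cleanup -H "Content-Type: application/json"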
We have a system that uses CouchDB as its database.
We are using continuous replication to create an always-updated copy of our database.
Recently we have discovered a strange behavior (maybe bug?) that I hope someone here could help me with:
We set the system with normal replication (NOT filtered).
We update the same document several times consecutively (each time waiting for CouchDB to return 200 OK); this part works fine and the document appears to be updated just fine in the replicated DB.
However, when we try to delete this document, even minutes after the consecutive updates, it is not deleted in the replicated DB and instead just reverts to a revision before the consecutive updates.
It is important to note that we delete by adding a _deleted field set to true.
I understand there is some problem with deletion using HTTP DELETE combined with filtered replication, but we're not using either.
Also, doing the same updates but waiting a second between one and the other solves the problem just fine (as does combining them into one update).
However, neither solution is possible for us, and in any case they merely work around the problem.
tl;dr:
1) CouchDB with normal continuous replication
2) Consecutive updates to document
3) _deleted = true added to the document
4) Replicated DB does not delete, instead reverts to _rev before #2
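A rough curl sketch of the repro sequence above (database name, document ID, revisions and the lack of authentication are placeholders/assumptions):

# 2) consecutive updates, each one sent only after the previous response arrived
curl -X PUT http://localhost:5984/db1/doc1 -d '{"value":1}'
curl -X PUT http://localhost:5984/db1/doc1 -d '{"_rev":"1-...","value":2}'
curl -X PUT http://localhost:5984/db1/doc1 -d '{"_rev":"2-...","value":3}'

# 3) delete by writing a _deleted field rather than using HTTP DELETE
curl -X PUT http://localhost:5984/db1/doc1 -d '{"_rev":"3-...","_deleted":true}'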
Environment:
CouchDB version is 1.6.1
Windows computer
Using CouchDB-Lucene
Most likely you have introduced some conflicts in the documents. When a document is being edited in several replicas, CouchDB chooses a winning revision when replicating, but also keeps the losing revisions. If you delete the winning revision, the losing revision will be displayed again. You can read an introduction in the (now somewhat outdated) CouchDB Guide: http://guide.couchdb.org/draft/conflicts.html and in the CouchDB Docs: http://docs.couchdb.org/en/1.6.1/replication/conflicts.html
But in short, the target database might have been edited by someone. It might be that you replicated several databases into one, or somebody edited the documents manually in the target database.
You can delete the target database and recreate an empty db. If you don't edit the target db by hand and don't replicate multiple dbs into one, deletes will be replicated correctly from then on.
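A quick way to check whether a document in the target database actually carries conflicts (database and document names are placeholders):

curl 'http://localhost:5984/db1/doc1?conflicts=true'
# a "_conflicts": [...] array in the response means losing revisions exist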
Problem solved.
It was the revision limit.
It seems that quickly making changes that exceed the revision limit causes problems for the replication mechanism.
There is an unsolved bug in CouchDB about this issue:
https://issues.apache.org/jira/browse/COUCHDB-1649
Since the revision limit we had was 2, doing 3 consecutive updates to the same document and then deleting it caused this problem.
Setting the revision limit to 5 avoids it.
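For anyone hitting the same thing, the per-database revision limit can be inspected and raised with curl (database name is a placeholder):

# check the current limit
curl http://localhost:5984/db1/_revs_limit

# raise it, e.g. to 5
curl -X PUT http://localhost:5984/db1/_revs_limit -d '5'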
I have an application that pulls from CouchDB, from the first doc to the latest one, batch by batch.
I compacted my database, which went from 1.7GB down to 1.0GB, and /db/_changes seems the same.
Can anyone please clarify if CouchDB compaction affects /db/_changes ?
All compaction does is remove old revisions of documents in a given database. The changes feed deals exclusively with write operations, which are unaffected by compaction (since those writes have already happened).
Now, it should be noted that the changes feed will give you the rev numbers as well. Upon compaction, all but the most recent rev are deleted, so those entries in the changes feed will have "dead" links, so to speak.
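A quick curl check of this (database name is a placeholder): the feed returns the same entries before and after compaction:

curl 'http://localhost:5984/db1/_changes?since=0'
# compact, wait for compact_running to go false, then fetch the feed again
curl -X POST http://localhost:5984/db1/_compact -H "Content-Type: application/json"
curl 'http://localhost:5984/db1/_changes?since=0'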
See the docs for more information about compaction.
Testing a theory with a simple CouchDB database: that CouchDB compaction is totally indifferent to deleted docs.
Deleting a doc from couch via a DELETE method yields the following when trying to retrieve it:
localhost:5984/enq/deleted-doc-id
{"error":"not_found","reason":"deleted"}
Expected.
Now I compact the database:
localhost:5984/enq/_compact
{"ok": true}
And check compaction has finished
"compact_running":false
Now I would expect CouchDB to return not_found, reason "missing" on a simple GET
localhost:5984/enq/deleted-doc-id
{"error":"not_found","reason":"deleted"}
And trying with ?rev=deleted_rev gives me a full doc, yay for worthless data.
So am I correct in thinking that CouchDB compaction gives no special treatment to deleted docs and simply looks at the rev count against the rev limit when deciding what is part of compaction? Is there a special rev_limit we can set for deleted docs?
Surely the only solution can't be a _purge? At the moment we must have thousands of orphaned deleted docs, and whilst we want to maintain some version history for normal docs, we don't want to reduce our rev_limit to 1 to assist in this scenario.
What are the replication issues we should be aware of with purge?
Deleted documents are preserved forever (because it's essential to providing eventual consistency between replicas). So, the behaviour you described is intentional.
To delete a document as efficiently as possible use the DELETE verb, since this stores only _id, _rev and the deleted flag. You can, of course, achieve the same more manually via POST or PUT.
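For illustration, both forms with curl, using the enq database from the question (the revision is a placeholder); the first uses the DELETE verb, the second writes a minimal _deleted document via PUT:

# tombstone via the DELETE verb
curl -X DELETE 'http://localhost:5984/enq/deleted-doc-id?rev=3-abc'

# equivalent tombstone via PUT, keeping only the fields you need
curl -X PUT http://localhost:5984/enq/deleted-doc-id -H "Content-Type: application/json" -d '{"_rev":"3-abc","_deleted":true}'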
Finally, _purge exists only for extreme cases where, for example, you've put an important password into a CouchDB document and need it to be gone from disk. It is not a recommended method for pruning a database: it will typically invalidate any views you have (forcing a full rebuild), and it messes with replication too.
Adding a document, deleting it, and then compacting does not return the CouchDB database to a pristine state. A deleted document is retained through compaction, though in the usual case the resulting document is small (just the _id, _rev and _deleted=true). The reason for this is replication. Imagine the following:
Create document.
Replicate DB to remote DB.
Delete document.
Compact DB.
Replicate DB to remote DB again.
If the document is totally removed after deletion+compaction, then the second replication won't know to tell the remote DB that the document has been deleted. This would result in the two DBs being inconsistent.
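As a sketch, the replications in the list above could be one-shot replications triggered through the _replicate endpoint (the source and target URLs below are placeholders):

curl -X POST http://localhost:5984/_replicate \
     -H "Content-Type: application/json" \
     -d '{"source":"http://localhost:5984/db1","target":"http://remote:5984/db1"}'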
There was an issue reported that could result in the document in the DB not being small; however it did not pertain to the HTTP DELETE method AFAIK (though I could be wrong). The ticket is here:
https://issues.apache.org/jira/browse/COUCHDB-1141
The basic idea is that audit information can be included with the DELETE that will be kept through compaction. Make sure you aren't posting the full doc body with the DELETE method (doing so might explain why the document isn't actually removed).
To clarify... from our experience you have to kick off a DELETE with the id and a compact in order to fully remove the document data.
As pointed out above you will still have the "header data" in your database afterwards.