In the CouchDB _changes response, why are the "changes" elements arrays?

The response from CouchDB to a _changes request comes back in this format:
{"seq":12,"id":"foo","changes":[{"rev":"1-23202479633c2b380f79507a776743d5"}]}
My question: why is the "changes" element an array? What scenario would return more than one item in the changes element? I have never seen an example online with more than one item, and in my own experience I have only ever seen one.
I'm writing code that interacts with changes, and I'd like to understand what to do if, in fact, there were more than one item.
Thanks,
Mike

The changes element is an array in order to reflect all existing revision leaves for the document. As you know, CouchDB doesn't delete a document completely; it sets a tombstone instead, to prevent the document from being accidentally resurrected by replication from a source that still holds an older revision that hasn't yet been deleted. It's also possible to have multiple leaves due to update conflicts that occur after replication. For example:
Mike creates a document in database A and replicates it to database B:
{"results":[
{"seq":1,"id":"thing","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]}
],
"last_seq":1}
John receives the document and updates it in database B:
{"results":[
{"seq":2,"id":"thing","changes":[{"rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}]}
],
"last_seq":2}
But at the same time Mike also makes a few changes to it in database A (he forgot to clean up some data, or needed to add something important):
{"results":[
{"seq":2,"id":"thing","changes":[{"rev":"2-13839535feb250d3d8290998b8af17c3"}]}
],
"last_seq":2}
He then replicates it to database B again. John receives the document in a conflicted state, and by looking at the changes feed with the query parameter style=all_docs he sees the following result:
{"results":[
{"seq":3,"id":"thing","changes":[{"rev":"2-7051cbe5c8faecd085a3fa619e6e6337"},{"rev":"2-13839535feb250d3d8290998b8af17c3"}]}
],
"last_seq":3}
While direct access to the document returns data from the winning revision only (chosen deterministically: roughly, the leaf with the longest revision history, with ties broken by comparing revision IDs), a document can carry many conflicting revisions (imagine concurrent writes to a single document across a dozen databases that all replicate with each other).
Now John decides to resolve this conflict by updating the winning revision, expecting the other one to be dropped:
{"results":[
{"seq":4,"id":"thing","changes":[{"rev":"3-2502757951d6d7f61ccf48fa54b7e13c"},{"rev":"2-13839535feb250d3d8290998b8af17c3"}]}
],
"last_seq":4}
Wait, Mike's revision is still there? Why? John, in a panic, deletes his document:
{"results":[
{"seq":5,"id":"thing","changes":[{"rev":"2-13839535feb250d3d8290998b8af17c3"}{"rev":"4-149c48caacb32c535ee201b6f02b027b"}]}
],
"last_seq":5}
Now his version of the document is deleted, but he can still access Mike's.
Replicating John's changes from database B back to database A brings the tombstones along as well:
{"results":[
{"seq":3,"id":"thing","changes":[{"rev":"3-2adcbbf57013d8634c2362630697aab6"},{"rev":"4-149c48caacb32c535ee201b6f02b027b"}]}
],
"last_seq":3}
Why? Because this is the history of the document's data "evolution": in the real world your documents may have many intermediate leaves distributed across a large number of databases, and to prevent silent data loss during replication CouchDB keeps every leaf to help you resolve such conflicts. You can find more (and probably better) explanation in the CouchDB wiki pages on replication and conflicts; the changes feed query parameters are described there as well.
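As a practical takeaway for code that consumes the feed: a changes array with more than one entry means the document has conflicting leaf revisions. A minimal sketch with curl, assuming a local database named db and the document id thing from the example (both names are illustrative):
# request all leaf revisions per document, not just the winner
curl 'http://localhost:5984/db/_changes?style=all_docs'
# fetch every leaf revision of a conflicted document in one request
curl -H 'Accept: application/json' 'http://localhost:5984/db/thing?open_revs=all'
# or fetch the winner together with the list of losing revisions
curl 'http://localhost:5984/db/thing?conflicts=true'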

Related

How to re-replicate documents after they have been purged in remote database

I have an application where the documents available in a remote database are a subset of documents available on the server. When the subset required by the user changes, documents that are no longer needed in the remote database are purged (yes, purged, not deleted) and new documents replicated. If the subset is later changed to include documents that were previously purged, I can't find a way to make those documents replicate again to reinstate them on the client.
A simple scenario to consider is:
Create two databases, A and B
Create a document "D" in A
Replicate database A to B
In B, purge D
Replicate A to B again and notice that D is not replicated
I've tried compacting B, to no avail. I can understand that with continuous replication, D will not be sent again because it has not changed. But I can't get D to be re-replicated using one-time replication either. How can I make a replication copy D from A to B once CouchDB is in this state?
I'm using CouchDB 2.3.
CouchDB stores a local replication log on each node once a replication has run.
It's probably fetching this log and picking up where it left off, thus ignoring changes that happened before the last replication (such as the creation of documents which have since been purged).
I can think of two solutions to this:
Manually delete these replication logs by looking for _local/ documents and removing them.
Change the replication parameters, even slightly, so that CouchDB generates a new replication ID for the sake of logging. One way to do this would be to add a filter function (it could be a filter function that filters nothing); see the sketch below.
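A minimal sketch of the second option with curl, assuming databases a and b on a local CouchDB; the design document name repl and the filter name all are purely illustrative:
# a no-op filter whose only purpose is to change the replication ID
curl -X PUT http://localhost:5984/a/_design/repl \
  -H 'Content-Type: application/json' \
  -d '{"filters":{"all":"function(doc, req) { return true; }"}}'
# one-time replication that now starts from scratch
curl -X POST http://localhost:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"http://localhost:5984/a","target":"http://localhost:5984/b","filter":"repl/all"}'
Because the filter is part of the replication ID calculation, CouchDB treats this as a brand-new replication and re-reads the source changes feed from the beginning.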

How to account for a failed write or add process in Mongodb

So I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them. For example, a client adds information to the database (a username, say) but quits in the middle of the registration process; now the DB is left with some "hanging" information that isn't associated with anything. How can MongoDB handle that? Or if no one can answer that question, maybe they can point me to a source/example that can? Thanks.
MongoDB does not support multi-document transactions: you can't perform atomic multi-statement transactions to ensure consistency. You can only perform an atomic operation on a single document at a time. When dealing with NoSQL databases you need to validate your data as much as you can; they seldom complain about anything. There are some workarounds and patterns to achieve SQL-like transactions. For example, in your case, you can store the user's information in a temporary collection, check the data's validity, and move it to the users collection afterwards.
This should be straightforward, but things get more complicated when we deal with multiple documents. In that case you need to create a designated collection for transactions. For instance:
transaction collection:
{
  id: ..,
  state: "new_transaction",
  value1: values from document_1 before updating document_1,
  value2: values from document_2 before updating document_2
}
// update document 1
// update document 2
Ooohh! Something went wrong while updating document 1 or 2? No worries: we can still restore the old values from the transaction collection.
This pattern is known as compensation, and it mimics the transactional behavior of SQL.

How many document revisions are kept in CouchDB / Cloudant, and for how long?

In CouchDB and Cloudant, when documents are changed, the database holds on to previous versions. What gets kept, and for how long?
Cloudant and CouchDB keep the document's metadata forever (id, rev, deleted and conflict). Document contents are deleted during compaction (automatic in Cloudant, manual in CouchDB), with one exception: in the case of a conflict, we'll keep the document contents until the conflict is resolved.
For each document, we keep the last X revisions, where X is the number returned by {username}.cloudant.com/{db}/_revs_limit, defaulting to 1000. Revisions older than that limit get dropped. You can change _revs_limit by making a PUT request with a new value to that endpoint. For example:
curl -X PUT -d "1500" https://username.cloudant.com/test/_revs_limit
So, if a document is replicated to two nodes, edited 1001 times on node A, and then replicated again to node B, it will generate a conflict on node B (because we've lost the information necessary to join the old and new edit paths together).
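You can spot such a conflict by asking for it explicitly. A minimal sketch with curl, reusing the test database from the example above and an illustrative document id thing:
# a conflicted document carries a _conflicts array listing the losing leaf revisions
curl 'https://username.cloudant.com/test/thing?conflicts=true'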

Delete all documents in a CouchDB database *except* the design documents

Is it possible to delete all documents in a couchdb database, except design documents, without creating a specific view for that?
My first approach has been to access the _all_docs standard view, and discard those documents starting with _design. This works but, for large databases, is too slow, since the documents need to be requested from the database (in order to get the document revision) one at a time.
If this is the only valid approach, I think it is much more practical to delete the complete database, and create it from scratch inserting the design documents again.
I can think of a couple of ideas.
Use _all_docs
You do not need to fetch all the documents, only the IDs and revisions. By default, that is all _all_docs returns. You can make a pretty big request in a batch (10k or 100k docs at a time should be fine).
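A minimal sketch of that approach with curl, assuming a local database named db; the ids and revision values in the delete batch are placeholders:
# fetch only ids and revisions; no document bodies are returned by default
curl 'http://localhost:5984/db/_all_docs?limit=10000'
# delete a whole batch in one request, skipping any id that starts with _design/
curl -X POST http://localhost:5984/db/_bulk_docs \
  -H 'Content-Type: application/json' \
  -d '{"docs":[{"_id":"doc1","_rev":"1-aaa","_deleted":true},{"_id":"doc2","_rev":"1-bbb","_deleted":true}]}'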
Replicate then delete
You could use an _all_docs query to get the IDs of all design documents.
GET /db/_all_docs?startkey="_design/"&endkey="_design0"
Then replicate them somewhere temporary.
POST /_replicator
{ "source":"db", "target":"db_ddocs", "create_target":true
, "user_ctx": {"roles":["_admin"]}
, "doc_ids": ["_design/ddoc_1", "_design/ddoc_2", "etc..."]
}
Now you can just delete the original database and replicate the temporary one back by swapping the "source" and "target" values.
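Restoring could then look like this (a sketch following the same conventions, expressed with curl):
# remove the original database and recreate it empty
curl -X DELETE http://localhost:5984/db
curl -X PUT http://localhost:5984/db
# replicate the design documents back from the temporary database
curl -X POST http://localhost:5984/_replicator \
  -H 'Content-Type: application/json' \
  -d '{"source":"db_ddocs","target":"db","user_ctx":{"roles":["_admin"]}}'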
Deleting vs "deleting"
Note, these are really apples vs. oranges techniques. By deleting a database, you are wiping out the edit history of all its documents. In other words, you cannot replicate those deletion events to any other database. When you "delete" a document in CouchDB, it stores a record of that deletion. If you replicate that database, those deletions will be reflected in the target. (CouchDB stores "tombstones" indicating the document ID, its revision history, and its deleted state.)
That may or may not be important to you. The first idea is probably considered more "correct"; however, I can see the value of the second. You can visualize the entire program to accomplish this in your head. It's only a few queries and you're done. No looping through _all_docs batches, no headache. Your specific situation will probably make it obvious which is better.
Install couchapp, pull down the design doc to your hard disk, delete the db in futon, push the design doc back up to your recreated database. =)
You could write a shell script that goes through the list of all documents and deletes them all one by one except design docs. Apparently couch-batch can do that. Note that you don't need to fetch the whole docs to do that, just the id and revision.
Other than that, I think filtered replication (or the replication proposed by JasonSmith) is your best bet.

CouchDB Compaction and Doc Deletion - Compaction indifferent?

Putting a simple CouchDB database to the test of the theory that CouchDB compaction is totally indifferent to deleted docs.
Deleting a doc from couch via a DELETE method yields the following when trying to retrieve it:
localhost:5984/enq/deleted-doc-id
{"error":"not_found","reason":"deleted"}
Expected.
Now I compact the database:
localhost:5984/enq/_compact
{"ok": true}
And check compaction has finished
"compact_running":false
Now I would expect CouchDB to return not_found, reason "missing" on a simple GET
localhost:5984/enq/deleted-doc-id
{"error":"not_found","reason":"deleted"}
And trying with ?rev=deleted_rev gives me the full doc. Yay for worthless data!
So am I correct in thinking that CouchDB compaction gives no special treatment to deleted docs, and simply checks the rev count against the rev limit when deciding what is part of compaction? Is there a special rev_limit we can set for deleted docs?
Surely the only solution can't be a _purge? At the moment we must have thousands of orphaned deleted docs, and whilst we want to maintain some version history for normal docs, we don't want to reduce our rev_limit to 1 to assist in this scenario.
What are the replication issues we should be aware of with purge?
Deleted documents are preserved forever (because it's essential to providing eventual consistency between replicas). So, the behaviour you described is intentional.
To delete a document as efficiently as possible, use the DELETE verb, since this stores only the _id, the _rev and the deleted flag. You can, of course, achieve the same thing more manually via POST or PUT.
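For example (a sketch, assuming a local database db; the revision value is a placeholder):
# efficient deletion: only _id, _rev and the _deleted flag survive
curl -X DELETE 'http://localhost:5984/db/deleted-doc-id?rev=3-abc'
# the equivalent manual tombstone via PUT
curl -X PUT http://localhost:5984/db/deleted-doc-id \
  -H 'Content-Type: application/json' \
  -d '{"_rev":"3-abc","_deleted":true}'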
Finally, _purge exists only for extreme cases where, for example, you've put an important password into a couchdb document and need it to be gone from disk. It is not a recommended method for pruning a database; it will typically invalidate any views you have (forcing a full rebuild) and it messes with replication too.
Adding a document, deleting it, and then compacting does not return the CouchDB database to a pristine state. A deleted document is retained through compaction, though in the usual case the resulting document is small (just the _id, _rev and _deleted=true). The reason for this is replication. Imagine the following:
Create document.
Replicate DB to remote DB.
Delete document.
Compact DB.
Replicate DB to remote DB again.
If the document is totally removed after deletion+compaction, then the second replication won't know to tell the remote DB that the document has been deleted. This would result in the two DBs being inconsistent.
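In curl terms, step 5 is what carries the deletion across (a sketch; the remote URL is illustrative):
# the tombstone created in step 3 survives compaction, so this second
# replication still tells the remote DB that the document was deleted
curl -X POST http://localhost:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"db","target":"http://remote.example.com:5984/db"}'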
There was an issue reported that could result in the document in the DB not being small; however, it did not pertain to the HTTP DELETE method AFAIK (though I could be wrong). The ticket is here:
https://issues.apache.org/jira/browse/COUCHDB-1141
The basic idea is that audit information can be included with the DELETE that will be kept through compaction. Make sure you aren't posting the full doc body with the DELETE method (doing so might explain why the document isn't actually removed).
To clarify: from our experience, you have to kick off a DELETE with the id, followed by a compaction, in order to fully remove the document data.
As pointed out above, you will still have the "header data" in your database afterwards.