CouchDB replication not working for documents deleted on the target device

I want to set up a one-way replication system, like a backup system. But in the scenario below, specific documents are not replicated to the target:
1. Set up replication only from device A to B (A -> B, one-way).
2. Some documents are replicated from A to B.
3. Remove the replicator document on B (A !-> B).
4. Remove some of the replicated documents on B.
5. Set up replication from device A to B again (A -> B, one-way).
6. The documents that were deleted on B at step 4 are not re-replicated from A to B.
I tried _purge on device B, but the result is the same.
Is there any way to resolve this? I want to force replication from device A to B.
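For reference, the one-way A -> B replication described in step 1 is typically set up by writing a document into the _replicator database. A minimal sketch in Go; the hostnames, database name, document ID, and credentials are placeholders, not values from the question:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal sketch: define a continuous one-way replication A -> B by
// PUT-ing a document into the _replicator database on B. All hosts,
// names, and credentials below are placeholders.
func main() {
	repl := map[string]interface{}{
		"source":     "http://admin:pass@device-a:5984/mydb",
		"target":     "http://admin:pass@device-b:5984/mydb",
		"continuous": true,
	}
	body, _ := json.Marshal(repl)

	req, _ := http.NewRequest(http.MethodPut,
		"http://admin:pass@device-b:5984/_replicator/a-to-b", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("replicator document created, status:", resp.Status)
}
```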

This is the intended behavior of replicating changes with CouchDB. If you delete the document on B, it will not get re-replicated from A because the deletion that occurred on B is the more recent change.
So if you replicated from B to A instead, the document would be deleted on A. If you wanted to "resolve" this, you'd need to write your own logic, e.g. (the first step is sketched below):
1. Back up all documents in A that have been deleted on B since the last replication.
2. Replicate from B to A.
3. Re-create the documents on A.
4. Replicate from A to B.
_purge does not reset the changes feed, which is the source of information for which documents should be replicated.
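A minimal sketch of that first step: list the documents that show up as deleted in B's changes feed, so their bodies can be backed up from A before replicating B -> A. The host, database name, and credentials are again placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal sketch: find documents that appear as deleted in B's _changes
// feed. Their last bodies would then be fetched from A and backed up
// before running the B -> A replication. Placeholders throughout.
func main() {
	resp, err := http.Get("http://admin:pass@device-b:5984/mydb/_changes")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var changes struct {
		Results []struct {
			ID      string `json:"id"`
			Deleted bool   `json:"deleted"`
		} `json:"results"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&changes); err != nil {
		panic(err)
	}

	for _, c := range changes.Results {
		if c.Deleted {
			fmt.Println("deleted on B, back up from A and re-create:", c.ID)
		}
	}
}
```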

Related

Surprising behavior with replicated, deleted CouchDb documents

We have two CouchDB servers, let's call them A and B. There's one-way replication from A to B, and documents are only created, modified, or deleted on A - basically you can think of B as just a backup. There was a document on A that was deleted. When I tried to retrieve the revision prior to deletion from A, I got {"error":"not_found","reason":"missing"}, even though that DB hasn't been compacted (as I understand it, compaction only happens if you start it manually, and that wasn't done). However, while B knew the document had been deleted, the old revision was still available on B.
My understanding is that if we haven't manually run compaction, the old revision should always be available on A. Furthermore, when B replicates, if there were multiple revisions since the last replication it'll pull metadata for the old revisions but might not pull the documents. Thus, in this setup, the set of revisions available on B should always be a proper subset of those available on A. So how could B have a revision that A does not?
We're on CouchDB 2.3.0.
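For reference, the check described above (asking each server whether the pre-deletion revision is still retrievable) can be expressed with an explicit rev parameter, or with open_revs=all to list the leaf revisions a server still knows about. A minimal sketch; the host, database, document ID, and revision are placeholders, not values from the question:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// Minimal sketch: ask one server for a specific pre-deletion revision of a
// document, and for all leaf revisions it still knows about. The URL parts
// below are placeholders for illustration only.
func main() {
	doc := "http://admin:pass@server-a:5984/mydb/some-doc-id"

	// Try to fetch the body of the revision that existed before the deletion.
	check(doc + "?rev=2-abc123")

	// List all leaf revisions, including the deletion tombstone.
	check(doc + "?open_revs=all")
}

func check(url string) {
	req, _ := http.NewRequest(http.MethodGet, url, nil)
	req.Header.Set("Accept", "application/json") // JSON instead of multipart
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(url, "->", resp.Status, string(body))
}
```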

How to re-replicate documents after they have been purged in a remote database

I have an application where the documents available in a remote database are a subset of documents available on the server. When the subset required by the user changes, documents that are no longer needed in the remote database are purged (yes, purged, not deleted) and new documents replicated. If the subset required by the user was changed to include documents that have been previously purged, I can't find a way to make the purged documents replicate again to reinstate them on the client.
A simple scenario to consider:
1. Create two databases, A and B.
2. Create a document "D" in A.
3. Replicate database A to B.
4. In B, purge D.
5. Replicate A to B again and notice that D is not replicated.
I've tried compacting B, to no avail. I can understand that with continuous replication, D will not be sent again because it has not changed. But I can't get D to be re-replicated using one-time replication either. How can I make a replication copy D from A to B once CouchDB is in this state?
I'm using CouchDB 2.3.
CouchDB stores a local replication log on each node when replication is done.
It's probably fetching this log and picking up where it left off, thus ignoring changes that happened before the last replication (such as the creation of documents which have since been purged).
I can think of two solutions to this (the first is sketched below):
1. Manually delete these replication logs by looking for _local/ documents and deleting them.
2. Change the replication parameters, even slightly, so that CouchDB generates a new replication ID and therefore a new log. One way to do this would be to add a filter function (it could be a filter function that filters nothing).
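A minimal sketch of the first option: list the _local documents on the target (the replication checkpoints live there) and delete them, then repeat for the source, so the next replication starts from sequence zero. The host, database name, and credentials are placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal sketch: delete the _local/ replication checkpoint documents on a
// database so the next replication starts from scratch. Placeholders
// throughout; repeat the same for the source database as well.
func main() {
	db := "http://admin:pass@localhost:5984/b"

	// _local_docs lists local (non-replicated) documents, which is where
	// replication checkpoints are stored.
	resp, err := http.Get(db + "/_local_docs")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var listing struct {
		Rows []struct {
			ID    string `json:"id"` // e.g. "_local/<replication id>"
			Value struct {
				Rev string `json:"rev"`
			} `json:"value"`
		} `json:"rows"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&listing); err != nil {
		panic(err)
	}

	for _, row := range listing.Rows {
		url := fmt.Sprintf("%s/%s?rev=%s", db, row.ID, row.Value.Rev)
		req, _ := http.NewRequest(http.MethodDelete, url, nil)
		if _, err := http.DefaultClient.Do(req); err != nil {
			panic(err)
		}
		fmt.Println("deleted checkpoint", row.ID)
	}
}
```

The second option amounts to re-submitting the same replication with any parameter change that alters the computed replication ID, such as adding a no-op filter function.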

N different versions of a single key pointing to a JSON value of size M, or N different keys with values of size M / N?

I want to ask which approach is better: to have N different versions of a single key that points to a JSON value of size M, or to have N different keys with values of size M / N?
I'm using CouchDB as a state database.
Example:
Single key with many versions (each value is inserted by a different chaincode invocation):
"singleKey:1" -> {"values":[v1]}
"singleKey:2" -> {"values":[v1, v2]}
"singleKey:3" -> {"values":[v1, v2, v3]}
...
"singleKey:m" -> {"values":[v1, v2, v3, ..., vm]}
Multiple keys with one version:
"key1:1" -> {"value":"v1"}
"key2:1" -> {"value":"v2"}
"key3:1" -> {"value":"v3"}
...
"keym:1" -> {"value":"vm"}
Are there any optimizations for persisting arrays in the ledger? For example, keeping only the changes without copying everything.
I don't know if I understood your question correctly, but you generally have two approaches for doing this. Before going into the details: storing a single key with an array as a value, with each version getting appended, is a strict no-no.
This is because, when you modify the same key concurrently or in different transactions in the same block, you will surely end up with an MVCC_READ_CONFLICT error, since Fabric uses optimistic locking for committing read/write sets.
Coming back to the approaches (both are state-DB agnostic; you can use CouchDB or goLevelDB):
Approach 1:
If you need to use the version while fetching the value, store each key as a composite key (a minimal sketch follows after the links below):
key1-ver1 -> val1
key1-ver2 -> val2
.. and so on
https://github.com/hyperledger/fabric/blob/release-1.2/core/chaincode/shim/interfaces.go#L128
https://github.com/hyperledger/fabric/blob/release-1.2/core/chaincode/shim/interfaces.go#L121
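Here is a minimal sketch of approach 1 using the composite-key APIs linked above. The object type "versioned" and the function names are illustrative assumptions, not part of the marbles example:

```go
package main

// Sketch of approach 1: store each version of a value under its own
// composite key so it can be read back by (key, version). The object type
// and function names are illustrative only.
import (
	"fmt"

	"github.com/hyperledger/fabric/core/chaincode/shim"
)

func putVersioned(stub shim.ChaincodeStubInterface, key, version string, value []byte) error {
	// Builds one state key out of the object type and the attributes.
	ck, err := stub.CreateCompositeKey("versioned", []string{key, version})
	if err != nil {
		return fmt.Errorf("creating composite key: %v", err)
	}
	return stub.PutState(ck, value)
}

func getVersioned(stub shim.ChaincodeStubInterface, key, version string) ([]byte, error) {
	ck, err := stub.CreateCompositeKey("versioned", []string{key, version})
	if err != nil {
		return nil, err
	}
	return stub.GetState(ck)
}
```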
Approach 2:
If you do not need the version while fetching, but only need to be able to fetch previous versions, then note that Fabric internally stores the history of modifications of a key using its own mechanism. You can query this history using the chaincode APIs (a sketch follows below):
https://godoc.org/github.com/hyperledger/fabric/core/chaincode/shim#ChaincodeStub.GetHistoryForKey
https://github.com/hyperledger/fabric/blob/release-1.2/core/chaincode/shim/interfaces.go#L161
You can have a look at the marbles example for an idea of both approaches:
https://github.com/hyperledger/fabric-samples/blob/release-1.2/chaincode/marbles02/go/marbles_chaincode.go
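And a minimal sketch of approach 2, walking the history of a single key with GetHistoryForKey (this requires the history database to be enabled in the peer configuration; the function name below is an illustrative assumption):

```go
package main

// Sketch of approach 2: iterate over the committed modifications of a
// single (non-versioned) key. See the marbles example linked above for a
// fuller version.
import (
	"fmt"

	"github.com/hyperledger/fabric/core/chaincode/shim"
)

func printHistory(stub shim.ChaincodeStubInterface, key string) error {
	iter, err := stub.GetHistoryForKey(key)
	if err != nil {
		return err
	}
	defer iter.Close()

	for iter.HasNext() {
		mod, err := iter.Next() // one committed modification of the key
		if err != nil {
			return err
		}
		fmt.Printf("tx=%s deleted=%v value=%s\n", mod.TxId, mod.IsDelete, string(mod.Value))
	}
	return nil
}
```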

How Cassandra ensures consistency when adding a new node

I am a little confused about how Cassandra ensures consistency when adding a new node to the cluster. I know Cassandra will do the range movements and stream the data to the newly added node. The question is: does Cassandra also stream the secondary replicas' data to the newly added node?
For example, we have 4 nodes in the cluster with RF=3 (A, B, C, D):
A(x=1, y=2), B(x=1, y=3), C(x=1), D(y=2). Partition key "x" is held by A, B, C, while partition key "y" is held by D, A, B. If I add a new node A' between A and B, I think it will stream partition "x" from A. But does it also stream partition "y" from B or D?
If it does stream partition "y", which node will Cassandra choose to stream from? From the official documentation, it will stream from the primary replica, which is D. If that's the case, when D has stale data (which is OK before adding the new node, as both A and B have the latest data, which meets the quorum), then after streaming it would be possible to read stale data from D and A'. Am I right?
Cassandra will stream information from the node that is giving up ownership of the token.
That is, in your example with RF=3 and nodes (A, B, C, D) holding A(x=1, y=2), B(x=1, y=3), C(x=1), D(y=2): if E is added between A and B, then A will give up ownership of x to E and B will give up ownership of y. A will send its value of x to E and B will send its value of y to E, so the end result will be A(y=2), E(x=1, y=3), B(x=1), C(x=1), D(y=2).
Please note that after adding the node, A has a stale copy of x and B has a stale copy of y, and they should run 'nodetool cleanup' to get rid of those.
You are probably right. Running nodetool repair is recommended before adding a new node so that there is no inconsistency in the cluster.

Cloudant / CouchDB two-way replication - what prevents recursion?

We have two Cloudant databases, say A and B, on two separate clusters. We have set up two-way replication between these databases, so A->B and B->A.
1) If a document X is updated on A, it gets replicated to B. But this change on B is not replicated back to A again, so it does not get into an indefinite recursive cycle. Is this achieved using the revision numbers? I believe it might be internal to CouchDB.
2) We need to figure out, by looking at a document in both A and B, which database actually received the update through an API call and which one received it through replication. Is there a way to figure this out?
The CouchDB replication protocol is well defined and makes sure that the replication is done in a reliable manner.
CouchDB has no concept of a master. Once synced, all CouchDB instances are identical so it won't be possible to determine which node received the original request. If you need to do this, you probably should reevaluate whether replication is what you really want.

Resources