Why is the timestamp not used for conflict resolution? - couchdb

The docs state that the "winner" is chosen in an arbitrary and consistent fashion. Why is the "later" revision not always chosen? I am sure that the timestamp is present in the payload.

CouchDB cannot simply pick the "later" revision: replicas resolve conflicts independently, without coordinating, so the winner must be computable deterministically from the revision history alone, and wall-clock timestamps are not reliable across nodes that may be offline or skewed.
There are a couple of add-ons that implement this kind of resolution, though they are unfortunately not well advertised:
https://github.com/automerge/automerge
https://github.com/redgeoff/delta-pouch
This issue has some discussion:
https://github.com/pouchdb/pouchdb/issues/8163
Looks like pouchdb folks do not read SO.
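Nothing stops an application from layering "latest timestamp wins" on top, though: CouchDB keeps the losing revisions around as conflicts, so you can resolve them yourself. Below is a minimal Python sketch against the plain HTTP API using requests; the URL, database name, and the updated_at field are assumptions about your setup and payload, not anything CouchDB provides.

    import requests

    COUCH = "http://admin:password@localhost:5984"  # placeholder URL/credentials
    DB = "mydb"

    def resolve_latest_wins(doc_id):
        # Fetch the current winner together with the conflicting leaf revisions.
        doc = requests.get(f"{COUCH}/{DB}/{doc_id}",
                           params={"conflicts": "true"}).json()
        candidates = [doc]
        for rev in doc.get("_conflicts", []):
            candidates.append(requests.get(f"{COUCH}/{DB}/{doc_id}",
                                           params={"rev": rev}).json())
        # Assumed payload field: pick the revision with the latest timestamp.
        winner = max(candidates, key=lambda d: d["updated_at"])
        # Deleting every other leaf makes the chosen revision the winner.
        for d in candidates:
            if d["_rev"] != winner["_rev"]:
                requests.delete(f"{COUCH}/{DB}/{doc_id}",
                                params={"rev": d["_rev"]})

The deletions replicate like any other edit, so every node eventually converges on the revision you picked.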

Mongodb change streams getting previous values?

Recently I learned about change streams in MongoDB and how powerful they are, but I need to get the previous values before an update. From my research it seems that getting the previous values is impossible, so this got me thinking: what alternatives exist for retrieving them?
What I want to achieve is a logging system such as, "Record A field has changed from {old_value} to {new_value}." I'm using socket.io to push these updates to a React front-end client. The updates to records happen from a completely different system, not on the same backend server where the change streams would be listening, so I won't be able to query the document before updating.
So I started to think of a different solution: maybe I could have two databases? One contains the old records and the other the updated records, but this sounds like duplicating data, and I can't imagine doing that with thousands of records.
I need some guidance, as I really don't know what the best option is. Is there really no way to use change streams and get the previous values? Is it possible to somehow query the document before a change stream event? Thank you.
Not sure how I missed this, but the solution is to version the data.
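A minimal sketch of the versioning idea with pymongo: every write inserts a new immutable version instead of updating in place, so the change stream's insert event carries the full new document and the previous values can be read back at version - 1. The collection and field names (records, record_id, version) are illustrative assumptions.

    from pymongo import MongoClient, DESCENDING

    records = MongoClient("mongodb://localhost:27017").mydb.records

    def save_version(record_id, fields):
        # Insert a new immutable version instead of updating in place.
        # (Real code should add a unique index on (record_id, version)
        # to guard against races between concurrent writers.)
        last = records.find_one({"record_id": record_id},
                                sort=[("version", DESCENDING)])
        version = last["version"] + 1 if last else 1
        records.insert_one({"record_id": record_id, "version": version, **fields})

    # Because every change is an insert, the change stream event contains
    # the full new document; the previous values live at version - 1.
    for change in records.watch([{"$match": {"operationType": "insert"}}]):
        new = change["fullDocument"]
        old = records.find_one({"record_id": new["record_id"],
                                "version": new["version"] - 1})
        if old:
            for key, value in new.items():
                if key not in ("_id", "record_id", "version") and old.get(key) != value:
                    print(f"Record {new['record_id']} field {key} changed "
                          f"from {old.get(key)} to {value}")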

Scratch couchdb document

Is it possible to "scratch" a couchdb document? By that I mean to delete a document, and make sure that the document and its history is completely removed from the database.
I do not want to perform a database compaction, I just want to fully wipe out a single document. And I am looking for a solution that guarantees that there is no trace of the document in the database, without needing to wait for internal database processes to eventually remove the document.
(A Python solution is appreciated.)
When you delete a document in CouchDB, generally only the _id, _rev, and a deleted flag are preserved. These are preserved to allow for eventual consistency through replication. Forcing an immediate delete across an entire group of nodes isn't really consistent with the architecture.
The closest thing would be to use purge; once you do that, all traces of the doc will be gone after the next compaction. I realize this isn't exactly what you're asking for, but it's the closest thing off the top of my head.
Here's a nice article explaining the basis behind the various delete methods available.
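For completeness, here is a hedged Python sketch of the purge route using requests against the plain HTTP API; the URL, credentials, and names are placeholders, and POST /{db}/_purge is available out of the box in CouchDB 2.3+. Purge removes the revisions from the database immediately, but the bytes on disk are only reclaimed by compaction.

    import requests

    COUCH = "http://admin:password@localhost:5984"  # placeholder URL/credentials
    DB = "mydb"
    DOC_ID = "doc-to-scratch"

    # Purge operates on leaf revisions, so fetch the winning revision
    # together with any conflicting leaves.
    doc = requests.get(f"{COUCH}/{DB}/{DOC_ID}",
                       params={"conflicts": "true"}).json()
    leaf_revs = [doc["_rev"]] + doc.get("_conflicts", [])

    # Purge every leaf of the document...
    requests.post(f"{COUCH}/{DB}/_purge",
                  json={DOC_ID: leaf_revs}).raise_for_status()

    # ...then kick off compaction so the purged data is physically removed
    # from disk (compaction runs in the background; this just starts it).
    requests.post(f"{COUCH}/{DB}/_compact",
                  headers={"Content-Type": "application/json"}).raise_for_status()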
Deleting anything from a file system "for sure" is a difficult, and usually quite expensive, problem, even more so with databases in general. Depending on what "for sure" means to you, you may end up with a custom database, a custom OS, and custom hardware. It is a bit like saying you want a fault-tolerant system: everyone would like one, but only a few can afford it, and most can settle for less. The same goes for deleting "for sure". I assume you are trying to address some security or privacy issue, so try to see if there is some other way to get what you need, perhaps by encrypting the document or its sensitive parts.

Replicating pre-calculated views in CouchDB/Couchbase

When I first query a CouchDB/Couchbase view, it needs to be calculated. This can take a good while if there is a large number of docs, and that happens for each single view.
Is there any way to replicate an already calculated view from one Couch to another?
Not directly through CouchDB replication, no; there are all sorts of practical complexities in how that would have to be implemented, which I'm afraid make it impractical.
For starters, it would mean the CouchDBs have to carefully keep replication of view calculations exactly in sync with replication of the actual data (so you never get newer view calculations than data). That is further complicated by the fact that views only get updated when requested, so the view data on either end could be out of date (and if users are querying with stale=ok, it might even be required to stay out of date).
I believe you can do it by directly copying the view index files (in /var/lib/couchdb/.DBNAME_design/SOMEHASH.view by default, I think) if you just need a one-off view sync. I'd recommend against doing that frequently as a general solution, though, since it's not officially supported AFAIK and is likely to be pretty fragile.
This isn't directly the answer to your question, although as PimTerry pointed out, replicating the view index is not supported, especially between different implementations.
What you can do instead is follow the procedure described here:
http://wiki.apache.org/couchdb/How_to_deploy_view_changes_in_a_live_environment
This way you can have your CouchDB calculate the new index in the background without blocking the usage of your application.
Hope this helps.
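In case the wiki page moves, here is a rough Python sketch of that procedure over the HTTP API with requests. The design doc names, view code, and URL are placeholders, and it assumes _design/myapp already exists. The trick is that CouchDB names view index files after a hash of the view definitions, so copying the design doc to its live name reuses the already built index.

    import requests

    COUCH = "http://admin:password@localhost:5984"  # placeholder URL/credentials
    DB = "mydb"

    # 1. Upload the changed design doc under a temporary name.
    new_design = {
        "language": "javascript",
        "views": {"by_date": {
            "map": "function (doc) { if (doc.date) emit(doc.date, null); }"
        }},
    }
    requests.put(f"{COUCH}/{DB}/_design/myapp_new",
                 json=new_design).raise_for_status()

    # 2. Query any of its views to trigger the index build; this call
    #    blocks until the new index is fully built, while the old design
    #    doc keeps serving requests in the meantime.
    requests.get(f"{COUCH}/{DB}/_design/myapp_new/_view/by_date",
                 params={"limit": 1}).raise_for_status()

    # 3. COPY it over the live name; the view-definition hash is unchanged,
    #    so the freshly built index files are reused and readers never block.
    live = requests.get(f"{COUCH}/{DB}/_design/myapp").json()
    requests.request(
        "COPY", f"{COUCH}/{DB}/_design/myapp_new",
        headers={"Destination": f"_design/myapp?rev={live['_rev']}"},
    ).raise_for_status()

    # 4. Clean up the temporary design doc.
    tmp = requests.get(f"{COUCH}/{DB}/_design/myapp_new").json()
    requests.delete(f"{COUCH}/{DB}/_design/myapp_new",
                    params={"rev": tmp["_rev"]}).raise_for_status()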

How can I alter the incoming documents on replication in CouchDB

I need to replicate data in CouchDB from one database to another, but in the process I want to alter the documents being replicated,
mostly stripping out particular fields (other applications were mentioned in the comments).
The replication would always be 100% one way (though the other applications mentioned in the comments could use bi-directional sync).
I would prefer it if this process did not increment their revision IDs, but that might be asking for too much.
But I don't see any design document function that does what I am trying to do.
Since CouchDB apparently doesn't do this, what plans are there for adding it? And in the meantime, what workarounds are there?
No, there is no out-of-the-box solution, as this would defy the whole purpose and logic of CouchDB's multi-master, MVCC replication.
The only option I can see here is to create your own solution, though I would not call it replication but rather ETL (Extract, Transform, Load). For ETL there are tools available that will let you do the trick, for example (mixing open source and commercial here):
Scriptella
CloverETL
Pentaho Data Integration, or to be more specific Kettle
Jaspersoft ETL
Talend has some tools as well
There are plenty more ETL tools on the market.
I believe the best approach here would be to break the fields you want to strip out into separate documents and then filter those documents out during replication.
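A small sketch of that split-and-filter approach over the HTTP API with requests; the database names, credentials, and the type field marking the split-out docs are assumptions. A JavaScript filter function in a design doc keeps the private docs out of the one-way replication:

    import requests

    COUCH = "http://admin:password@localhost:5984"  # placeholder URL/credentials

    # Filter function stored in the source database's design doc.
    requests.put(f"{COUCH}/source_db/_design/repl", json={
        "filters": {
            "public_only": "function (doc, req) { return doc.type !== 'private'; }"
        }
    }).raise_for_status()

    # One-way replication that skips the private docs.
    requests.post(f"{COUCH}/_replicate", json={
        "source": f"{COUCH}/source_db",
        "target": f"{COUCH}/target_db",
        "filter": "repl/public_only",
        "create_target": True,
    }).raise_for_status()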
Of course the best way would be built-in support for this, but a workaround that occurs to me is, instead of using the built-in replication, to write a custom replication that performs the needed alterations/transformations while still building on the other built-in APIs rather than going beneath them. With good code, in many situations (especially if each master can push to its slaves), this could be nearly as efficient.
This requires an efficient trigger on each source/master to detect any changes, which I believe CouchDB does offer (or at least PouchDB appears to); the custom replicator would then copy the changes to the other location, applying the full alterations along the way.
If the source of the change cannot push the change to the final destination, the transformed copies may need to be staged in a store local to the source, from which the destination can pull. That could get pretty expensive, especially in multi-master setups, as each location has to store and maintain not only its own data but also the data being sent to everyone it sends to.
This replicator would also place each source document's revision ID in the document's copy (ideally always, and essentially so if the copy is itself to be updated, i.e. act as a master), in the form of either:
ideally, the normal "_rev" property. This looks quite possible, since preserving revision IDs is already done by the normal replication algorithm via the built-in Bulk Docs API, which our variant would seemingly use too;
otherwise, a new copy document (with its own "_rev") plus another field such as "_rev_original" recording the original rev. But would that work?
Clearly such a copy could be created without problems.
It's probably no big deal if the destination only reads the data.
It seems hairy if the destination also writes the data, as we'd then have to merge with these non-standard revisions. But doable.
Relevant to this (coding a custom/improved replication to provide this apparently missing functionality, ideally without altering the PouchDB and especially CouchDB source code), as starter/basis material there is the standard method: the normal CouchDB replication algorithm, which unfortunately doesn't clearly state that it only uses built-in operations, though it looks like it does, plus the official overview of what replication does. I suspect PouchDB implements this as well, likely in PouchDB's replicate.js (latest release as of 2014.07).
Further implementation particulars? Those who would know, please add them here.
This is a "community wiki" answer so please extend it.
Also please comment links & details of anyone/system already doing or trying to do this or similar.
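As a concrete starting point for such a custom replicator, here is a rough, hedged Python sketch using requests: it follows the source's _changes feed, strips fields, and writes to the target via the Bulk Docs API with new_edits=false so the original revision IDs are preserved (the first option above). The URLs and field names are assumptions; checkpointing, revision histories, attachments, and error handling are all omitted.

    import requests

    SOURCE = "http://admin:password@localhost:5984/source_db"  # placeholders
    TARGET = "http://admin:password@localhost:5984/target_db"
    STRIP = {"ssn", "internal_notes"}  # illustrative fields to remove in transit

    since = 0
    while True:
        # Long-poll the changes feed; returns as soon as there are changes.
        feed = requests.get(f"{SOURCE}/_changes", params={
            "since": since, "include_docs": "true",
            "feed": "longpoll", "limit": 100,
        }).json()

        docs = [{k: v for k, v in row["doc"].items() if k not in STRIP}
                for row in feed["results"] if "doc" in row]

        if docs:
            # new_edits=false stores the docs under their existing _rev values,
            # as the real replicator does via the Bulk Docs API. Caveat: the
            # stripped content no longer matches what the rev was computed
            # from, which is exactly the "would that work?" question above.
            requests.post(f"{TARGET}/_bulk_docs",
                          json={"docs": docs, "new_edits": False}).raise_for_status()

        since = feed["last_seq"]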

Alternative Data Access pattern to Repository

I have certain objects in my domain which are not aggregate roots/entities, yet I still need to retrieve them from a database. I don't want to confuse things by creating repositories for these things. So, what are alternative data access patterns? Would you simply create a DAO for them, while still of course separating the interface?
Edit:
Some more detail on what I'm doing. I need to create a code. This code has certain rules as to its format. One of the rules is that the final character must be a unique number incremented by one from the last code generated. For example:
ABCD1
ABCD2
ABCD3
So, I'm keeping a table with one row and one column to store the number in question. Now, I don't want to consider this number an entity and create a repository for it - that's overkill. I just need a way of retrieving the number, adding 1 to it, and saving it. I know there are myriad ways I could do it, but I'm wondering if there's a customary way.
There are several data access patterns that could apply, in theory. You'd need to provide more detail though if you want us to suggest a specific pattern.
Without more detail, all I can suggest is to consider looking into Martin Fowler's Patterns of Enterprise Application Architecture book.
Edit: Customary way? No, not that I can think of - it really depends on where and how you're using this unique code in your domain. If I were doing this, I'd probably create a small service that speaks directly to the database to perform this function - not as heavy-weight as a repository, and very focused on the problem at hand.
Based on the edit: I would look first at the context in which you need to create that code. Perhaps there are some related entities or something that you are missing.
BTW, I find the question really interesting, as it comes up from time to time while coding specific features. I usually end up finding I was missing something in the scenario, and it ends up fitting well with the normal repository pattern.
After surveying the options I'm going with the Table Gateway pattern.
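For what it's worth, a minimal sketch of what that Table Data Gateway might look like in Python with the standard library's sqlite3; the table, column, and class names are made up for illustration, so swap in your own database driver and schema:

    import sqlite3

    class CodeSequenceGateway:
        """Table Data Gateway for the one-row table holding the last number."""

        def __init__(self, path="codes.db"):
            # isolation_level=None means autocommit; transactions are managed
            # explicitly below so the read-increment-write stays atomic.
            self.conn = sqlite3.connect(path, isolation_level=None)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS code_sequence (last_number INTEGER)")
            if self.conn.execute(
                    "SELECT count(*) FROM code_sequence").fetchone()[0] == 0:
                self.conn.execute("INSERT INTO code_sequence VALUES (0)")

        def next_number(self):
            # BEGIN IMMEDIATE takes the write lock up front, so two callers
            # can never read the same value and hand out duplicate codes.
            self.conn.execute("BEGIN IMMEDIATE")
            try:
                n = self.conn.execute(
                    "SELECT last_number FROM code_sequence").fetchone()[0] + 1
                self.conn.execute(
                    "UPDATE code_sequence SET last_number = ?", (n,))
                self.conn.execute("COMMIT")
            except Exception:
                self.conn.execute("ROLLBACK")
                raise
            return n

    # Usage: codes like ABCD1, ABCD2, ABCD3, ...
    gateway = CodeSequenceGateway()
    print(f"ABCD{gateway.next_number()}")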
