CouchDB _changes, view related

Simple question: I would like to react to some changes in a database, but only to those changes which cause modifications in a certain view, view1. That is, I am not interested in all changes in the database, just those changes which affect view1. I am not talking about filters here, just about view + changes. Something like this (although this is probably not correct):
http://localhost:5984/db/_design/doc1/_view/view1/_changes
Is this at all supported by CouchDB? Does this make sense at all?

It's possible, but in a slightly different way. Since the 1.1.0 release CouchDB can use map functions as filters for the changes feed. This works like regular filters: if a key-value pair was emitted at least once for a changed document, the document passes the filter and _changes yields a record for it. If you only need new updates for a specific view, you need to specify a starting since sequence number - it can easily be retrieved from the _design/ddoc-name/_info resource, in the view_index/update_seq field. Since the 1.3 release you may also specify since=now to listen for updates from the current point in time.
Note that this view filter doesn't use the view index and doesn't update it when new changes occur. Also, there is a set of patches that improves view filters in ways you may find interesting.
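For example, reusing the design document and view names from the question, you would first read the view's current update sequence from _info and then follow the changes feed filtered by that view (the since value is whatever view_index/update_seq returned):

http://localhost:5984/db/_design/doc1/_info
http://localhost:5984/db/_changes?filter=_view&view=doc1/view1&since=<update_seq>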

Related

DDD: How to save the order of aggregates?

I have the two Aggregates 'notebook' and 'note'.
When I follow the rule 'aggregates reference each other only by their ids', I think I have two options:
Notebook(List<NoteId>, [other properties])
Note([other properties])
or
Notebook([other properties])
Note(NotebookId, [other properties])
With the first option, I need two DB calls to show all notes of a notebook (one to get the list and the second to load the notes).
So my current favorite is the second option. Now I have a few options in mind for saving the order of the notes, each of which has some disadvantages.
What is a good approach to solve my problem? Or is the first option better and the two DB calls negligible?
Can anybody help?
Big THX
It looks like the order of the Notes is important, at least in relation to the Notebook, so maybe it should be part of the domain. If so, I would suggest storing it together with the Note. Or use some other information of the Note to derive an ordering when a list is loaded.
If not, why is the order relevant? I mean, the two entities have related but separate lifecycles, or at least so it looks: one aggregate - the Notebook - has a list that only references the other - the Note. Hence no direct interaction is planned. But, given that the domain is correctly modelled (there's not enough information to say much about it), somewhere you need an ordered list of Notes. The only way to have it as you need it is to store that information (or use something already stored); otherwise the hypothesis (that the order is relevant) is no longer valid.
Update after info about the number of Notes and their size
It looks like your domain is organized this way:
a root entity, the Notebook, where the order of each Note (referenced only by its ID) is also stored: any change in the order will be applied here, not on the Note
another root entity, the Note, with its own lifecycle and its own 'actions' (operations that trigger a change in the entity)
Whenever you load the Notebook, you must also load the Notes and their order to show them correctly ordered. On the other side, when you change the order, this structure allows you to have a single action (or operation) on the Notebook, for example changeOrder(NoteId), that updates the order of the given Note and, if needed, changes the order of all the others. The trick here is that when you persist the Notebook you work just with the IDs of the Notes, so you don't have to load the whole entity, just a part of it, update it and save it again. So how big the Note entity is doesn't matter, because you don't use all of it. Hence, on every change you could trigger an update of all the (NoteId, order) pairs for that Notebook; you can't do it differently. To support this you need a single function in the repository that loads the IDs of the Notes with their order and saves them again; that should not be too expensive.
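A minimal sketch of that idea in JavaScript (class and method names are just illustrative, not from the question): the Notebook aggregate owns only the note IDs and their order, so reordering never needs the full Note entities.

class Notebook {
  constructor(id, noteIds) {
    this.id = id;
    this.noteIds = noteIds; // array of NoteIds; the array index is the order
  }
  changeOrder(noteId, newIndex) {
    const current = this.noteIds.indexOf(noteId);
    if (current === -1) throw new Error("note does not belong to this notebook");
    this.noteIds.splice(current, 1);          // remove from the old position
    this.noteIds.splice(newIndex, 0, noteId); // insert at the new one
  }
}

The repository then only has to persist the (NoteId, order) pairs of this aggregate, never the Note contents.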
On the other side, all the actions that operate directly on a Note have to load it, so there you do load the whole entity. But in that case loading and saving it all is required, because you are changing the Note itself.
Anyway, how you persist the order is entirely up to the persistence layer, which is built over the domain. I mean, the domain just has a Notebook and a set of Notes with order 1, 2, 3, etc.
Even if I don't think this needs such a complex solution, you could use a totally different way to store the order: for example, use steps of 100 (so 100, 200, 300, etc.): each new Note is put in the middle of the two neighbouring ones and is the only one that has to be saved each time. Every once in a while you run a job, or something else, that just normalizes all the values, restoring the 100 steps (or whatever you use to persist the order). As I said, this looks like an overcomplicated solution to the problem, but it also shows that the entities of the domain can be totally different from the persistence ones.
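For completeness, a small sketch of that "steps of 100" variant (function names invented): a new Note gets a position halfway between its neighbours, and a periodic job renumbers everything back to multiples of 100.

function positionBetween(prev, next) {
  // prev / next are the positions of the neighbouring notes, or null at the ends
  if (prev == null && next == null) return 100;
  if (prev == null) return Math.floor(next / 2);
  if (next == null) return prev + 100;
  return Math.floor((prev + next) / 2);
}

function normalize(notes) {
  // notes: [{ noteId, position }]; returns them renumbered as 100, 200, 300, ...
  return notes
    .slice()
    .sort((a, b) => a.position - b.position)
    .map((n, i) => ({ noteId: n.noteId, position: (i + 1) * 100 }));
}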

How to track model changes nodejs/postgresql

I have an app persisting data in PostgreSQL/Express/Knex/Objection. I am looking for a way to track changes in my models, so that I can manage and revert versions, similar to paper_trail in Rails or this port for Sequelize: https://github.com/nielsgl/sequelize-paper-trail
Is there something I could use for this in Knex/Objection, or at the DB level, to track changes?
Answer: There is no generic way to do it in Objection or Knex.
Random rambling:
You need to decide what kind of changes you would like to track and write some code, for example in Objection Model hooks, to track them.
One way to implement it would be, for example, to add a separate table where all the tracked changes are written, e.g. as a JSONB object where the updated fields or old values are stored and indexed, or something like that. I'm pretty sure you don't want to track all the data in the database, since that will blow up the DB size very fast.
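As a rough sketch of that (not a drop-in solution; the audit table, column names and id property are all made up here), an Objection instance hook could write the old and new values into a separate table:

const { Model } = require('objection');

class Note extends Model {
  static get tableName() { return 'notes'; }

  async $afterUpdate(opt, queryContext) {
    await super.$afterUpdate(opt, queryContext);
    // opt.old is only populated for instance updates (e.g. note.$query().patch(...));
    // queryContext.transaction is the knex instance/transaction running the query
    await queryContext.transaction('note_changes').insert({
      note_id: this.id,
      changed_at: new Date(),
      old_values: JSON.stringify(opt.old || {}),
      new_values: JSON.stringify(this.toJSON())
    });
  }
}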
Anyway, the implementation depends on why you actually want or need to track the data and what the actual use cases are that you need to support.
Also this might work for you: https://wiki.postgresql.org/wiki/Audit_trigger

CouchDB, how to get document changes only

Using /_changes?filter=_design I can get all the changes for design documents.
How do I get all the changes for documents only?
Is there such a thing as /_changes?filter=_docs_only?
There is no built-in filter for this. You will need to write your own filter function (http://couchdb.readthedocs.org/en/latest/couchapp/ddocs.html#filterfun) that excludes design documents from the feed (check whether the doc's _id starts with "_design/", etc.). You then reference this filter function when you query the changes feed (http://couchdb.readthedocs.org/en/latest/api/database/changes.html?highlight=changes). However, most applications don't run into this very often, since design documents are typically only updated when there is an application change.
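A minimal example of such a filter function (the design document and filter names here are just illustrative), stored under filters in e.g. _design/app:

function (doc, req) {
  // let everything through except design documents
  return doc._id.indexOf("_design/") !== 0;
}

It is then referenced when querying the feed, e.g. /_changes?filter=app/docs_only.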
It would probably be more efficient to implement this filter on the client side instead of streaming all your changes through the couchjs process (always inefficient): as your application loops through the changes, simply check whether each one is a design doc and skip it.
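A rough client-side equivalent (assuming you already have a parsed changes response) would simply be:

for (const change of changes.results) {
  if (change.id.indexOf("_design/") === 0) continue; // skip design documents
  // handle the change here
}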
Cheers.

Replicating pre-calculated views in CouchDB/Couchbase

When I first query a CouchDB/Couchbase view it needs to be calculated. This can take a good while if there is a large number of docs, and that happens for each single view.
Is there any way to replicate an already calculated view from one Couch to another?
Not directly through CouchDB replication, no. There are all sorts of practical complexities in how that would have to be implemented that make it impractical, I'm afraid.
For starters it means that CouchDBs would have to carefully manage replication of view calculations exactly in sync with the actual data (so you never get newer view calculations than data), and that gets further complicated by the fact that views only get updated when requested, so the view data on either end could be out of date (and if users are querying with stale=ok, it might even be required to stay out of date).
I believe you can do it by directly copying the view index files (in /var/lib/couchdb/.DBNAME_design/SOMEHASH.view by default, I think), if you just need a one-off view sync. I'd recommend against doing that frequently as a general solution though, since it's not officially supported AFAIK and is likely to be pretty fragile.
This isn't directly the answer to your question, but as PimTerry pointed out, replicating the view index is not supported, especially between different implementations.
What you can do instead is follow the procedure described here:
http://wiki.apache.org/couchdb/How_to_deploy_view_changes_in_a_live_environment
This way you can have your CouchDB calculate the new index "in the background" without blocking the usage of your application.
Hope this helps.

How can I alter the incoming documents on replication in CouchDB

I need to replicate data in CouchDB from one database to another, but in the process I want to alter the documents being replicated,
mostly stripping out particular fields (but other applications are mentioned in the comments).
The replication would always be 100% one-way (but the other applications mentioned in the comments could use bi-directional sync).
I would prefer if this process did not increment their revision ID but that might be asking for too much.
But I don't see any design document function that does what I am trying to do.
As CouchDB apparently doesn't do this, what plans are there for adding it? And meanwhile, what workarounds are there?
No, there is no out-of-the-box solution, as this would defy the whole purpose and logic of multi-master MVCC replication.
The only option I can see here is to create your own solution, but I would not call this replication, rather ETL (Extract, Transform, Load). And for ETL there are tools available that will let you do the trick, for example (mixing open source and commercial here):
Scriptella
CloverETL
Pentaho Data Integration, or to be more specific Kettle
Jaspersoft ETL
Talend have some tools as well
There are plenty more ETL tools on the market.
I believe the best approach here would be to break the fields you want to filter out into a separate document and then filter out that document during replication.
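As a sketch of that idea (the document type and names are invented here): if the fields to strip live in their own documents, say with type "note_secret", a filter function on the source can keep them out of the replication:

function (doc, req) {
  // replicate everything except the split-out "secret" documents
  return doc.type !== "note_secret";
}

The replication is then started with the filter parameter, e.g. "filter": "app/no_secrets" in the _replicate request or the _replicator document.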
Of course the best way would be to have built-in support for this, but a workaround that occurs to me would be, instead of using the built-in replication, to code and use a custom replication that does the additional alterations/transformations needed, still building on the other built-ins rather than going beneath them. With good coding, in many situations (especially if each master can push to its slaves), this could be nearly as efficient.
This requires efficient triggers on each source/master to detect any changes, which I believe CouchDB does offer (or at least PouchDB appears to); these would then copy the changes to the other location while applying the full alterations.
If the source of the change is unable to push the change to the final destination, this altered copy may have to be stored locally so the destination can pull from it -- which could get pretty expensive, especially in multi-master, as each location has to store and maintain not only its own data but also the data (being sent) of everyone it sends to.
This replication would also place each source document's revision ID in the document's copy...
...ideally in any case, and essentially so if the copy is itself going to be updated (i.e. also act as a master)...
...in the form of either:
ideally the normal "_rev" property. Indeed this looks quite possible, since preserving revision IDs is already done by the normal replication algorithm using the built-in Bulk Docs API, which our variant would seemingly use too
otherwise a new copy object (with its own _rev) plus another field such as "_rev_original" holding the original rev. But would that work?
Clearly such a copy could be created without problems.
Probably no big deal if the destination is just reading the data.
It seems hairy if the destination is also writing the data, as we'd then have to merge with these non-standard revisions. But doable.
Relevant to this (coding a custom/improved replication, to provide this apparently missing functionality, ideally without altering Pouch and especially Couch source code), as starter/basis material (the standard method): here's the normal Couch replication algorithm, which unfortunately doesn't clearly say it only uses built-in ops but looks like it does, and also the official overview of what it does; I suspect Pouch implements this, likely in Pouch's replicate.js (latest release as of 2014.07).
Further implementation particulars? - those who would know, please put them here.
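In that spirit, here is a very rough sketch (not a vetted implementation) of the pull variant in JavaScript: read the source's changes feed, strip the unwanted fields, and write the transformed documents to the target via the built-in bulk docs API. The URLs and the stripped field name are placeholders, and re-updating documents that already exist on the target would additionally need the target's current _rev, which is left out here.

const SOURCE = 'http://localhost:5984/source_db'; // placeholder
const TARGET = 'http://localhost:5984/target_db'; // placeholder

// uses the global fetch (Node 18+ / browsers)
async function replicateOnce(since) {
  const res = await fetch(`${SOURCE}/_changes?include_docs=true&since=${since}`);
  const { results, last_seq } = await res.json();

  const docs = results
    .filter(row => !row.deleted)
    .map(row => {
      const { _rev, secret_field, ...rest } = row.doc; // secret_field: whatever is being stripped
      return { ...rest, _rev_original: _rev };         // keep the source rev in a side field
    });

  if (docs.length) {
    await fetch(`${TARGET}/_bulk_docs`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ docs })
    });
  }
  return last_seq; // persist this to resume from later
}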
This is a "community wiki" answer so please extend it.
Also please add, in the comments, links & details of anyone or any system already doing or trying to do this or something similar.
