CouchDB document replication (updating specific attributes of a document)

I have a replication issue and need your help with it. During CouchDB replication, I want to reset/update some specific attributes of a document, and these edited documents should then be saved in the replicated database without affecting the originals. For example:
A document named Student has the attributes id, name, class, etc.
I want to replicate this document in such a way that its name and class are reset/updated.
How can I achieve this?
Thanks.

You can't update docs during replication.
But you can exclude docs from being replicated with the help of a CouchDB filter (e.g. preventing all docs with a revision higher than 1 from being replicated).
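A minimal sketch of such a filter function (the design-doc and filter names here are illustrative, not given in the question):

    // Stored in a design doc, e.g. _design/replication, under "filters".
    // Lets a doc through only while it is still on its first revision.
    function (doc, req) {
      // _rev has the form "N-hash", where N is the revision position
      return parseInt(doc._rev.split("-")[0], 10) === 1;
    }

You would then reference it when starting the replication, e.g. by passing "filter": "replication/first_rev_only" in the body of a POST to /_replicate.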
If you want to have multiple versions of the same dataset (e.g. to keep dataset revisions), you have to store them as separate docs that each have a unique id and a reference property like original: "UUID_of_the_original". (I use the term "dataset" instead of "doc" to make clear that CouchDB's internal doc revision handling is not involved.)
You can't use the CouchDB doc revision handling for that purpose (that's what many people assume when they see the _rev property in the docs).
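For illustration, two docs representing versions of one dataset might look like this (ids and field names are made up):

    // the original dataset
    { "_id": "8f4c...", "type": "student", "name": "Ali", "class": "5" }

    // an edited copy, stored as a separate doc pointing back to the original
    { "_id": "2b9e...", "type": "student", "name": "Ali", "class": "6",
      "original": "8f4c..." }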

Related

CouchDB check if a document exists in a validation function

I would like to check whether a document whose "name" field is set to "a name" already exists in the database before allowing a new document to be added.
Is this possible in CouchDB using update handlers (inside design documents)?
It seems you are looking for a unique constraint in CouchDB. The only unique constraint supported by CouchDB is based on the document ID.
You should include your "name" attribute value in the document ID if you want document uniqueness based on it.
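A quick sketch of what that looks like over HTTP (Node 18+ fetch; the database URL and id scheme are assumptions for illustration):

    const db = "http://localhost:5984/mydb";
    const name = "a name";

    // Embed the unique field in the _id; CouchDB enforces _id uniqueness.
    const res = await fetch(db + "/name:" + encodeURIComponent(name), {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name: name }),
    });
    // 201 Created on the first write, 409 Conflict if the name is taken.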
Validate-document-update functions defined in design documents can only use the data of the document being created/updated/deleted; they cannot use data from other documents in the database.
You can find a similar question here.
This is not widely known, but the _update endpoint is allowed to return a doc with an _id prop different from the requested one. It means that, in your case, you need a unique document, say _id: "doc-name", which will serve as a constraint.
Then you call something like POST _design/whatever/_update/saveDependentDoc/doc-name, providing the new doc with a different _id as the request body.
Your _update function will effectively receive two docs as input (or null and the new doc if the constraint doc is missing). The function then decides what to do: return a doc to persist it, or return nothing.
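A rough sketch of such an update function (the design-doc and handler names follow the example above; refusing the write when the constraint doc is missing is just one possible choice):

    // In _design/whatever, under "updates": { "saveDependentDoc": ... }
    function (doc, req) {
      var newDoc = JSON.parse(req.body); // the doc we actually want to save
      if (doc === null) {
        // The constraint doc ("doc-name") is missing: refuse the write.
        return [null, '{"ok": false, "reason": "constraint doc missing"}'];
      }
      // The constraint doc exists: persist the new doc, even though its
      // _id differs from the id in the request path.
      return [newDoc, '{"ok": true}'];
    }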
This isn't a full answer to your question, but it might be helpful in some cases.
Note that for updating existing docs, this trick of course only works if you know the revision.

Updating dbcopy database when parent MapReduce View Changes

I have a database called "development-records" that has a MapReduce view with a "dbcopy" declaration that creates a view in a new database called "development-chained".
When we update the view in "development-records", we follow the usual steps (a code sketch of steps 3 and 4 follows the list):
1. Create a duplicate copy of the design document that we want to change, for example by adding _OLD to its name: _design/fetch_OLD.
2. Put the new or 'incoming' design document into the database, using a name with the suffix _NEW: _design/fetch_NEW.
3. Query the fetch_NEW view, to ensure that it starts to build.
4. Poll the _active_tasks endpoint and wait until the index has finished building.
5. Put a duplicate copy of the new design document into _design/fetch.
6. Delete the design document _design/fetch_NEW.
7. Delete the design document _design/fetch_OLD.
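For reference, a minimal sketch of steps 3 and 4 (Node 18+ fetch; host, database, and view names are assumptions for illustration):

    const base = "http://localhost:5984";
    const db = "development-records";

    // Step 3: touch the new view so the index starts building without
    // blocking on the result (stale=update_after returns immediately).
    await fetch(base + "/" + db +
      "/_design/fetch_NEW/_view/by_id?limit=0&stale=update_after");

    // Step 4: poll _active_tasks until no indexer task remains for it.
    for (;;) {
      const tasks = await (await fetch(base + "/_active_tasks")).json();
      const building = tasks.some(function (t) {
        return t.type === "indexer" &&
               t.design_document === "_design/fetch_NEW";
      });
      if (!building) break;
      await new Promise(function (r) { setTimeout(r, 1000); });
    }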
The problem is that the documents in the dbcopy database "development-chained" don't seem to be updated: all the old records stay. Is there a way to trigger the dbcopy database to perform the MapReduce again?
Unfortunately, according to the official Cloudant documentation, "The dbcopy feature can cause problems under some circumstances." Use of this feature is strongly discouraged, and it has otherwise been removed from the documentation. I hope knowing that helps a little; the new documentation is hard to find.

Can data in Solr be extended with manually defined meta data?

I have several documents in a Solr collection that I want to be able to search through. Most of the data comes from web sites I can easily crawl; however, I also need to add some attributes manually.
So as an example I get the following info from a site (all attributes returned from crawled site):
Name: Porsche Boxter
Year: 1996
...
I want to add additional fields through a web interface (info not present on crawled sites):
Cool: yes
foo: bar
My questions:
Does it make sense at all to store additional information along with the indexed data within Solr (inside the documents), or would it be best practice to keep only the crawled data in Solr and merge it with an externally managed database at query time? To me it makes more sense to have all the data that is eventually queried in Solr, as some of the manually added attributes are required search criteria (e.g. look only for cool cars from the 90s).
Is it possible to use Solr to store additional information about indexed documents? I know the entire schema in advance, perhaps this is useful?
If I store my data exclusively in Solr, how can I ensure that during the next crawl the manually added data is not overwritten? Would partial update be required?
Since I am new to Solr, it would also be very helpful if someone could point out what to look for in the parts of the documentation that describe my use case.
That depends on how often the external data changes: the more often it changes, the less sense it makes. Generally it is a good idea to store such data along with the indexed data, because you get it back without an additional database query.
Yes. Use indexed:false and stored:true. If you did not know all of these fields in advance, you could use a dynamicField like <dynamicField name="*_stored" type="string" indexed="false" stored="true" />.
Yes, you have to use partial updates. This is no problem in your case, because the fields that are not updated have stored:true.
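For example, a partial (atomic) update that sets only the manually managed fields could look like this (Node 18+ fetch; the core name, document id, and field names are made up for illustration):

    const solr = "http://localhost:8983/solr/cars";

    // "set" replaces just these fields; Solr rebuilds the rest of the
    // document from its stored fields, so the crawled data survives.
    await fetch(solr + "/update?commit=true", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify([
        { id: "porsche-boxter-1996",
          cool_stored: { set: "yes" },
          foo_stored: { set: "bar" } },
      ]),
    });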

How can I "undelete" a set of documents in CouchDB?

I have a large set of documents in a CouchDB database that were just accidentally bulk deleted using _deleted:true. I also have a backup for this set of data that includes their last known good revision and metadata. I need to maintain the same _id, so simple restore with a new _id is not an option.
Compaction has not been run and I can access any of these documents via the &rev= url parameter as well as their attachments (which are needed).
What I need to do is "restore" these documents to the revision I have on file. Surprisingly, I have come up empty with any queries on how to achieve this. Tips or hacks appreciated.
If you just PUT the whole document, including the attachment stubs, back into the DB with the deleted rev, but without the _deleted:true field, then all will be well.
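A sketch of scripting the restore (Node 18+ fetch; the database URL is an assumption). Note that this variant inlines the attachments as base64 instead of sending stubs, which sidesteps stub/revision matching:

    const db = "http://localhost:5984/mydb";

    async function undelete(id, lastGoodRev) {
      // 1. Read the last known good revision, attachments inlined
      //    (still readable before compaction runs).
      const res = await fetch(
        db + "/" + id + "?rev=" + lastGoodRev + "&attachments=true",
        { headers: { Accept: "application/json" } });
      const doc = await res.json();

      // 2. Drop the old _rev (and _deleted, if present) so this is a
      //    fresh write; CouchDB allows recreating a deleted doc under
      //    the same _id.
      delete doc._rev;
      delete doc._deleted;

      // 3. PUT it back; the doc reappears with a new revision.
      const put = await fetch(db + "/" + id, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(doc),
      });
      console.log(put.status, await put.json()); // expect 201 Created
    }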

CouchDB views - Multiple join... Can it be done?

I have three document types MainCategory, Category, SubCategory... each have a parentid which relates to the id of their parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I have currently set up a view which produces the following output:
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query string parameters would I use to get, say, the rows whose key starts with ["22945"..., when all I have at query time is the id "11810" (at query time I have no knowledge of the id "22945")?
If any of that makes sense.
Thanks
The way you store your categories seems suboptimal for the query you are trying to perform.
MongoDB.org has a page on various strategies for implementing tree structures (they should apply to CouchDB and other document databases as well); you should consider the Array of Ancestors pattern, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
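With an array of ancestors, a map function could look like the following sketch (it assumes each doc stores its ancestor ids in doc.ancestors, which is not in the original documents):

    // Emit one row per ancestor, so every SubCategory anywhere below a
    // given ancestor can be fetched with a single key.
    function (doc) {
      if (doc.type === "SubCategory" && doc.ancestors) {
        for (var i = 0; i < doc.ancestors.length; i++) {
          emit(doc.ancestors[i], doc.name);
        }
      }
    }

Querying with ?key="11810" would then return all SubCategories below that MainCategory, without knowing the intermediate id "22945".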
