Find differences between current document and previous revision - couchdb

Is there a way to determine what changes were made in a document? Here's a document and a revision of it
{
"_id": "panel100000",
"_rev": "1-b4f55d0e03fbfaef0822a0607d5d6ad0",
"name": "Maya Jambalaya",
"maritalstatus": "Married",
"employed": "Full time",
"education": "College graduate"
}
{
"_id": "panel100000",
"_rev": "2-caab684a341da5185546a028cfb5b0d9",
"name": "Maya Papaya",
"maritalstatus": "Married",
"employed": "Full time",
"education": "College graduate"
}
In this example, name has changed. Is there a way to find changes between a document and its previous revisions?
1) Is there anything built-in that does or could track such changes?
2) Is it possible to access a document's revision via a design document?
3) If the answer to #2 is "yes", does anyone have a template of a design document with which to compare them?

1) No. If you want to track changes, you would probably need a data model designed for that purpose. CouchDB does keep old revisions of a document, and you can fetch them to calculate the diff manually. However, there is no guarantee that old revisions survive: compaction removes them.
2) No. Design document functions such as views are built from the latest revision of each document.
3) ...
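The manual diff mentioned in point 1 could look like the sketch below. It assumes both revision bodies have already been fetched (for example with GET /{db}/panel100000?rev=<revision>, which only works while the old revision has not been compacted away). `diffRevisions` is a hypothetical helper name, not part of CouchDB.

```javascript
// Shallow, field-by-field comparison of two revisions of one document.
// Nested values are compared via JSON.stringify, which is sensitive to key order.
function diffRevisions(oldDoc, newDoc) {
  const changes = {};
  const keys = new Set([...Object.keys(oldDoc), ...Object.keys(newDoc)]);
  for (const key of keys) {
    if (key === "_rev") continue; // differs on every revision by definition
    const before = oldDoc[key];
    const after = newDoc[key];
    if (JSON.stringify(before) !== JSON.stringify(after)) {
      changes[key] = { before, after };
    }
  }
  return changes;
}

const rev1 = {
  _id: "panel100000",
  _rev: "1-b4f55d0e03fbfaef0822a0607d5d6ad0",
  name: "Maya Jambalaya",
  maritalstatus: "Married",
  employed: "Full time",
  education: "College graduate",
};
const rev2 = {
  _id: "panel100000",
  _rev: "2-caab684a341da5185546a028cfb5b0d9",
  name: "Maya Papaya",
  maritalstatus: "Married",
  employed: "Full time",
  education: "College graduate",
};

const changes = diffRevisions(rev1, rev2);
console.log(changes);
```

For the two revisions from the question, the only reported field is name, with its old and new value.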
If you want to be sure to keep every change to a document, you would need to create a document for each change. Those changes could be grouped by a unique ID, and you could use a map/reduce view to get the latest value of a document. The diff would still need to be computed manually. The advantage is that you can easily get the state of the document at a given time.
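A rough sketch of that change-per-document model follows. The field names (`type`, `entityId`, `at`, `state`) and the timestamps are assumptions made up for illustration, and `emit` is replaced by a local stand-in so the map and reduce functions can be run outside CouchDB.

```javascript
// Hypothetical change documents: one per edit, grouped by `entityId`.
const changeDocs = [
  { _id: "c1", type: "change", entityId: "panel100000", at: 1657614546263, state: { name: "Maya Jambalaya" } },
  { _id: "c2", type: "change", entityId: "panel100000", at: 1657700946263, state: { name: "Maya Papaya" } },
  { _id: "c3", type: "change", entityId: "panel100001", at: 1657614600000, state: { name: "Bob" } },
];

// Local stand-in for CouchDB's built-in emit(), so the sketch runs outside the server.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: key by entity, keep the value small (timestamp plus change id).
function map(doc) {
  if (doc.type === "change") {
    emit(doc.entityId, { at: doc.at, changeId: doc._id });
  }
}

// Reduce function: keep the newest entry. Its output has the same shape as
// its input values, so the same logic also covers the rereduce case.
function reduce(keys, values, rereduce) {
  return values.reduce((latest, v) => (v.at > latest.at ? v : latest));
}

changeDocs.forEach(map);

// Emulate a grouped query (?group=true): reduce the rows of each key separately.
const latestByEntity = {};
for (const key of new Set(rows.map((r) => r.key))) {
  const values = rows.filter((r) => r.key === key).map((r) => r.value);
  latestByEntity[key] = reduce(null, values, false);
}
console.log(latestByEntity);
```

The reduce result only points at the newest change document; the full state would then be read from that document. Getting the state at an earlier time amounts to ignoring change documents whose `at` is later than the time of interest.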

Related

Couchdb how to get the timestamp of the last modified database?

Does anyone know how to get the last timestamp that a specific database was modified?
The API _changes does not provide that information. Thank you.
UPDATE
How can I retrieve the last date/time that the database had a new document inserted or an existing one modified?
CouchDB does not record the time that each change occurred, so if you need this functionality you need to add a timestamp to the document, e.g.
{
"_id": "myid",
"name": "Bob",
"email": "bob@aol.com",
"timestamp": 1657614546263
}
Then a MapReduce view will allow you to query documents by timestamp:
function (doc) {
  // Index only documents that carry a timestamp
  if (doc.timestamp) {
    emit(doc.timestamp, null);
  }
}
To get the latest change, query the resulting view with ?descending=true&limit=1, which returns the most recently modified document:
GET /mydb/_design/myview/_view/myview?descending=true&limit=1&include_docs=true
Alternatively, you can use a document id that has a timestamp encoded within it. See this blog post which shows how documents with time-sortable ids allow easy querying of the latest documents to be added.
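As a rough illustration of a time-sortable id, the sketch below prefixes the id with a zero-padded millisecond timestamp. This is an assumed scheme, not necessarily the one from the blog post, and `timeSortableId` is a hypothetical helper; the suffix is passed in explicitly here so the example stays deterministic, whereas in practice it would be random.

```javascript
// Build an id whose lexicographic order matches creation order.
// Assumption: a zero-padded millisecond timestamp followed by a short suffix.
function timeSortableId(nowMs, suffix) {
  return String(nowMs).padStart(15, "0") + "-" + suffix;
}

const ids = [
  timeSortableId(1657614546263, "b7"),
  timeSortableId(999, "a1"),
  timeSortableId(1657614546264, "03"),
];

// Plain string sorting is enough to order the ids by time.
const newestFirst = [...ids].sort().reverse();
console.log(newestFirst[0]);
```

Because the padding keeps every timestamp the same width, sorting the ids as strings is the same as sorting by creation time, so no separate timestamp view is needed to find the newest documents.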

CouchDB searching linked documents

I'm very new to CouchDB and I'm hoping someone can help me with a solution to this problem.
Say I have an address document that contains various keys, but importantly a singleLineAddress and a persons array:
{
"_id": "002cb726bfe69a79ed9b897931000ec6",
"_rev": "2-6af6d8896703e9db6f5ba97abb1ca5d7",
"type": "address",
...
"singleLineAddress": "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
...
"persons":["d506d09a1c46e32f6632e6d99a0062bd","002cb726bfe69a79ed9b897931001c80"]
}
Then I have a person document with a number of keys, crucially firstName and lastName:
{
"_id": "d506d09a1c46e32f6632e6d99a0062bd",
"_rev": "4-98fae966a92d5c6c359cb8ddfaa487e1",
"type": "person",
...
"firstName": "Joe",
"lastName": "Bloggs"
...
}
I understand I can create a linked document view and emit all the person IDs linked to an address, then use include_docs=true to see all the person data. But from what I'm reading, using include_docs=true is not advised because it can be expensive.
Ultimately, I'd like to use couchdb-lucene to run a full-text search (FTS) against person @ address using the name & address. Is that even possible using linked documents?
Using ?include_docs=true is more expensive than not using it - for every row of the index returned, the database has to fetch the related document body. But sometimes needs must :) You can avoid using ?include_docs=true by "projecting" more data into the index which is returned to you at query time. See https://blog.cloudant.com/2021/11/12/Projection.html
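A minimal sketch of such a projection, using the person fields from the question: the map function copies the name fields into the index value, so a query can return them without include_docs. The choice of key is an assumption, and `emit` is a local stand-in so the function can be tried outside CouchDB.

```javascript
// Local stand-in for CouchDB's built-in emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: project the fields needed at query time into the index value.
function map(doc) {
  if (doc.type === "person") {
    emit([doc.lastName, doc.firstName], { firstName: doc.firstName, lastName: doc.lastName });
  }
}

map({ _id: "d506d09a1c46e32f6632e6d99a0062bd", type: "person", firstName: "Joe", lastName: "Bloggs" });
map({ _id: "002cb726bfe69a79ed9b897931000ec6", type: "address", singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG" });
console.log(rows);
```

The trade-off is index size: every projected field is stored again in the view, so it pays to project only what the query result actually needs.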
As for Lucene full-text searching, you can certainly search across document types in the same collection but your search results would consist of a mixture of address and people documents - full text searching can't do the "join" between an address and its occupants - you'd have to do that yourself later.
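Doing the "join" yourself could look roughly like the sketch below. It assumes the search hits are already available as an array of full documents; `joinHits` is a hypothetical helper, not something couchdb-lucene provides.

```javascript
// Group person hits under the address hits that reference them.
function joinHits(hits) {
  const people = new Map(
    hits.filter((d) => d.type === "person").map((d) => [d._id, d])
  );
  return hits
    .filter((d) => d.type === "address")
    .map((address) => ({
      address: address.singleLineAddress,
      // Person ids that were not among the hits are skipped here.
      occupants: (address.persons || []).map((id) => people.get(id)).filter(Boolean),
    }));
}

const hits = [
  {
    _id: "002cb726bfe69a79ed9b897931000ec6",
    type: "address",
    singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
    persons: ["d506d09a1c46e32f6632e6d99a0062bd", "002cb726bfe69a79ed9b897931001c80"],
  },
  { _id: "d506d09a1c46e32f6632e6d99a0062bd", type: "person", firstName: "Joe", lastName: "Bloggs" },
];

const joined = joinHits(hits);
console.log(JSON.stringify(joined, null, 2));
```

Occupants that did not match the search themselves are missing from the hits, so a real implementation would probably fetch those person documents by id in a follow-up request.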
If you desperately need to return address and people objects together, then consider combining the two: your address document would contain an array of the people objects that reside there. There is a trade-off between combining objects so that data that belongs together is stored together, and keeping every micro object separate for ease of updating.

How to update fields by query in Azure search?

I'm trying to update a record in a search index via the API, which works fine when I provide the index key, e.g.
{
"value": [
{
"@search.action": "merge",
"hotelid": "4618416",
"HotelName":"Gacc Capital"
}
]
}
However, because the index is built from several different databases, the index's primary key is not present in all of them.
See the example below, where the field "ContactName" is stored in a different database:
"value": [
{
"@search.score": 1,
"HotelId": "124",
"HotelName": "Gacc Capital",
"Description": "Chic hotel near the city. High-rise hotel in downtown, walking distance to theaters, restaurants and shops, complete with wellness programs.",
"Category": "Paid",
"Amount": "£123456",
"ContactId": "456",
"ContactName": "Mr David Koh"
}
]
The issue I'm having is updating a particular field whenever there's a change. For instance, if someone changes their name from "Mr David Koh" to "Mr David Warner Koh", I need a way to update all the records where ContactId is 456.
Is there a way to tackle this problem, or am I missing a piece of the puzzle beforehand?
I'm not sure if this is possible in the Azure Search SDK (C#), but I'm happy to give it a go if it works better than the API.
I assume you have two different types of records with relations. It’s not clear from your question.
To keep relational data updated you could do the data maintenance in an actual database that has a view that resembles what your index looks like. Then index that view.
Alternatively, you could implement the logic yourself: query for the IDs of all records that contain a contact with a specific ID, then update each of those records as you did above.
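The second approach might be sketched as follows. It assumes the matching records were already retrieved by a search with a filter such as ContactId eq '456' (which requires ContactId to be a filterable field), and that HotelId is the index key. `buildMergeBatch` is a hypothetical helper, and the second hit is made-up sample data.

```javascript
// Turn search hits into a merge batch that patches the same fields on every hit.
function buildMergeBatch(hits, keyField, patch) {
  return {
    value: hits.map((hit) => ({
      "@search.action": "merge",
      [keyField]: hit[keyField],
      ...patch,
    })),
  };
}

// Hits as they might come back from the filtered search (second one is invented).
const hits = [
  { HotelId: "124", ContactId: "456", ContactName: "Mr David Koh" },
  { HotelId: "987", ContactId: "456", ContactName: "Mr David Koh" },
];

const batch = buildMergeBatch(hits, "HotelId", { ContactName: "Mr David Warner Koh" });
console.log(JSON.stringify(batch, null, 2));
```

The resulting body has the same shape as the merge request in the question, just with one entry per matching record, so it can be posted the same way.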

Why are there two ways to update documents?

As a CouchDB beginner I'm having a hard time understanding how to update documents.
When I read the docs I find this which is quite confusing for me:
1) Updating an Existing Document
To update an existing document you must specify the current revision number within the _rev parameter.
Source: Chapter 10.4.1 /db/doc
2) Update Functions
Update handlers are functions that clients can request to invoke server-side logic that will create or update a document.
Source: Chapter 6.1.4 Design Documents
Could you please tell me which way do you prefer to update your documents?
Edit 1:
Let's say the data structure is just a simple car document with some basic fields.
{
"_id": "123",
"name": "911",
"brand": "Porsche",
"maxHP": "100",
"owner": "Lorna"
}
Now the owner changes; would you still use option 1? Option 1 has quite a downside, because I can't just edit one field. I need to retrieve every field first, edit just the owner field, and then send back the whole document. I just tried it and I find it quite long-winded. Hmmm...
Most of the time you want to choose option 1, "Updating an Existing Document"; this operates on a standard document that stores data in the database. The other option concerns update handlers, which are functions stored in design documents (design documents also hold views and are themselves documents, which is definitely confusing to new CouchDB users). That is a separate, server-side mechanism.
Stick with option 1, and good luck :)
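Option 1 need not be as long-winded as it sounds. The sketch below builds the body for the PUT from the fetched document plus the changed fields; `withChanges` is a hypothetical helper, and the `_rev` value and the new owner name are placeholders.

```javascript
// Build the body for PUT /{db}/{docid}: the fetched document with some
// fields replaced. _id and _rev are carried over unchanged.
function withChanges(currentDoc, changes) {
  return { ...currentDoc, ...changes, _id: currentDoc._id, _rev: currentDoc._rev };
}

// Document as returned by GET /{db}/123 (the _rev value is a placeholder).
const fetched = {
  _id: "123",
  _rev: "1-abc",
  name: "911",
  brand: "Porsche",
  maxHP: "100",
  owner: "Lorna",
};

const body = withChanges(fetched, { owner: "Maya" });
console.log(body);
```

If someone else saved the document in the meantime, the `_rev` sent back is stale and CouchDB rejects the write with a 409 conflict; the usual response is to fetch the document again and retry.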

couchdb match multiple inconsistent keys

Considering the following two documents:
{
"_id": "a6b8d3d7e2d61c97f4285220c103abca",
"_rev": "7-ad8c3eaaab2d4abfa01abe36a74da171",
"File":"/store/document/scan_bgd123.jpg",
"Comment": "Describes a person",
"DateAdded": "2014-07-17T14:13:00Z",
"Name": "Joe",
"LastName": "Soap",
"Height": "192cm",
"Age": "25"
}
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File":"/store/document/scan_adf123.jpg",
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
How would I find a document based on multiple criteria, say for example "Make"="Ford" and "Color"="Blue"? I realize I need a view for this, but I don't know what the key is going to be, and as you can see from the two documents, the key/value pairs aren't consistent. The only consistent item will be the "File" key.
I'm attempting to create a CouchDB database that will store the location of files, tagged with key/value pairs.
EDIT:
Perhaps I should reconsider my data structure and modify it slightly?
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File": "/store/document/scan_adf123.jpg",
"Tags": {
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
}
So, I need to find by the Key>Value pair in the tag, or by any number of Key>Value pairs, to filter which document I want. The problem here is that I want to tag objects with a Key>Value pair. These tags could be very different per view, so the next document will have a completely different set of Key>Value pairs.
CouchDB supports a flexible schema. There is no need for the documents to be consistent for them to be queryable. The view for your scenario is pretty straightforward. Here is a map function that should do the trick.
function(doc){
if(doc.Make&&doc.Color)
emit([doc.Make,doc.Color],null);
}
This gives you a view which you can then query like
/view-name?key=["Ford","Blue"]&include_docs=true
This should give you the desired result.
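When the query is built in code, the key has to be JSON-encoded and then URL-encoded. The sketch below spells out the full view path; the database, design document and view names (`files`, `tags`, `by_make_color`) are made up for illustration.

```javascript
// Build the query URL for a view lookup by exact key.
function viewUrl(db, designDoc, view, key) {
  const query = "key=" + encodeURIComponent(JSON.stringify(key)) + "&include_docs=true";
  return `/${db}/_design/${designDoc}/_view/${view}?${query}`;
}

const url = viewUrl("files", "tags", "by_make_color", ["Ford", "Blue"]);
console.log(url);
// /files/_design/tags/_view/by_make_color?key=%5B%22Ford%22%2C%22Blue%22%5D&include_docs=true
```

The key must match what the map function emitted, element for element and in the same order, so ["Blue","Ford"] would return nothing.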
Edit based on comment
For that you will need two separate views. Every view in CouchDB is designed to fulfil a specific query need. This means that you have to think about the access strategy for your data. It is more work on your part initially, but in return you get data that is indexed and has very fast access times.
So to answer your question directly: create two views, one for Make as we have already done and another for Name, like
function (doc) {
  if (doc.Name && doc.LastName) {
    emit([doc.Name, doc.LastName], null);
  }
}
Now the Name view will index only those documents that have a name in them, whereas the Make view will index only those documents that have a make in them.
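To see which document ends up in which view, both map functions can be run locally against the two sample documents. In this sketch `emit` and `runView` are stand-ins written for the example (CouchDB provides `emit` itself), and the name view is keyed on first and last name.

```javascript
// Local stand-in for CouchDB's built-in emit().
let rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

function byMakeColor(doc) {
  if (doc.Make && doc.Color) emit([doc.Make, doc.Color], null);
}

function byName(doc) {
  if (doc.Name && doc.LastName) emit([doc.Name, doc.LastName], null);
}

// Run one map function over all documents and return the emitted keys.
function runView(mapFn, docs) {
  rows = [];
  docs.forEach((doc) => mapFn(doc));
  return rows.map((r) => r.key);
}

const docs = [
  { File: "/store/document/scan_bgd123.jpg", Name: "Joe", LastName: "Soap", Height: "192cm", Age: "25" },
  { File: "/store/document/scan_adf123.jpg", Make: "Ford", Year: "2011", Model: "Focus", Color: "Blue" },
];

const makeKeys = runView(byMakeColor, docs);
const nameKeys = runView(byName, docs);
console.log(makeKeys, nameKeys);
```

Each view ends up with a single row: the car document in the Make view and the person document in the Name view.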
What happens when a requirement comes in future for which you don't have a query?
You can try a few things.
1) This is probably the easiest solution: use couchdb-lucene for your dynamic queries. Your architecture would then be CouchDB views for the queries you know your application needs, and a Lucene index for the queries you don't yet know you might need. For instance, say you have indexed name and last name in a CouchDB view, and then a requirement arises to query by age: simply add the age field to the Lucene index and it will take care of the rest.
2) Another approach is the PPP technique, which exploits the fact that building a view is a one-time cost: you can build views during less active hours and deploy them to the production service once they are built.
3) Combine steps 1 and 2: let Lucene handle ad hoc requests while you build views using the PPP technique.

Resources