Designing a relationship between documents in Couchbase/CouchDB?

I am learning how to build relationships between documents in Couchbase. The question is probably easier to ask with an example.
Let's say there are hotels and guests: many hotels and many guests.
{
"_id": "hotel1",
"type": "hotel",
"name": "Hilton",
...
}
{
"_id": "hotel2",
"type": "hotel",
"name": "Hampton",
...
}
{
"_id": "guest1",
"type": "guest",
"name": "John",
...
}
{
"_id": "guest2",
"type": "guest",
"name": "Erin",
...
}
One way to build the relationship is to embed guest IDs within hotel documents, but these lists will grow very large over time.
The other way is to embed hotel IDs within guest documents and create a view for each hotel to list its guests. Since hotels are added over time, these views would have to be added dynamically whenever a hotel document is created. If there are 500 hotels, there will be 500 views.
What is a cleaner way to model this relationship and retrieve the guests for a hotel?

I am assuming Couchbase views work the same way as CouchDB views.
Put the hotel ID in the guest doc. Have just one view that emits a complex key for each hotel visited by the guest, e.g.
// guest.hotels is assumed to be an array of visits: {hotel_id, date}
guest.hotels.forEach(function (visit) {
  emit([visit.hotel_id, visit.date], guest._id);
});
You can then get a hotel's guest list by querying the view with
startkey = ['hotel1'] and endkey = ['hotel1', {}]
You can also get the guest list between specific dates:
startkey = ['hotel1', date1] and endkey = ['hotel1', date2]
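As a rough sketch of how those range queries could be issued over a CouchDB-style HTTP API: view keys are passed as JSON-encoded query parameters. The database URL, the design document name `guests`, and the view name `by_hotel` are made up for illustration.

```javascript
// Hypothetical names: design doc "guests" and view "by_hotel" are assumptions.
function hotelGuestsUrl(dbUrl, hotelId, fromDate, toDate) {
  // Without dates, [hotelId] .. [hotelId, {}] spans every visit to the hotel,
  // because an object sorts after any string or number in view collation.
  const startkey = fromDate ? [hotelId, fromDate] : [hotelId];
  const endkey = toDate ? [hotelId, toDate] : [hotelId, {}];
  const params = new URLSearchParams({
    startkey: JSON.stringify(startkey),
    endkey: JSON.stringify(endkey),
  });
  return dbUrl + "/_design/guests/_view/by_hotel?" + params.toString();
}

console.log(hotelGuestsUrl("http://localhost:5984/travel", "hotel1"));
```

The keys have to be valid JSON, which is why they are serialised with `JSON.stringify` before being URL-encoded.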

I like the other answer posted here; however, I don't think it explains enough to be educational. If I understand your question correctly, there is a flaw in the design that the question is trying to work around.
You rightly point out that it is not feasible to embed a list of guests within each Hotel document. Likewise, it is cumbersome, though not quite as unrealistic, to embed a list of hotels within a Guest document.
What is missing here is that three entities are needed to model a hotel stay:
Hotel
Guest
Reservation
The Reservation entity should reference both the Guest and the Hotel, and hold any details specific to the particular stay. You can't put this information into either the Hotel or the Guest documents, because you will eventually have objects that are too big to perform well (though admittedly it will take a while). When in doubt, always err on the side of good performance.
As the other answer points out, it is then straightforward to create views that pull Guests by Hotel and Hotels by Guest.
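A minimal sketch of what a Reservation document and a "guests by hotel" map function might look like. The field names (`hotel_id`, `guest_id`, `check_in`) are assumptions, not something the question specifies. In CouchDB, emitting `{_id: ...}` as the value lets a query with include_docs=true return the linked guest document instead of the reservation.

```javascript
// Hypothetical reservation document; the field names are assumptions.
const reservation = {
  _id: "reservation1",
  type: "reservation",
  hotel_id: "hotel1",
  guest_id: "guest1",
  check_in: "2020-01-05",
};

// Map function: one row per reservation, keyed by hotel and check-in date.
function guestsByHotel(doc) {
  if (doc.type === "reservation") {
    emit([doc.hotel_id, doc.check_in], { _id: doc.guest_id });
  }
}

// Local stand-in for the emit() the database provides, so the sketch runs anywhere.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

guestsByHotel(reservation);
console.log(JSON.stringify(rows));
```

A "hotels by guest" view would be the mirror image: key on `doc.guest_id` and emit `{ _id: doc.hotel_id }`.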

I think in your case you can create a relationship map document:
{
"hotel_id": "hotel1",
"guest_id": "guest1"
}
So it will store hotel/guest IDs. Then you can create two general-purpose views, "ListHotelGuests" and "ListGuestHotels". You can filter, group, etc. these views as your application's business logic requires.
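The two views could be sketched roughly as follows, assuming link documents carry only `hotel_id` and `guest_id` as shown. The `queryView` helper is a local stand-in for the view engine, not a database API, and exists only so the example can run on its own.

```javascript
// Map functions for the two views named above.
function listHotelGuests(doc) {
  if (doc.hotel_id && doc.guest_id) emit(doc.hotel_id, doc.guest_id);
}
function listGuestHotels(doc) {
  if (doc.hotel_id && doc.guest_id) emit(doc.guest_id, doc.hotel_id);
}

// Hypothetical helper: run a map function over some documents and return
// the values of the rows whose key equals the requested key.
function queryView(mapFn, docs, key) {
  const rows = [];
  globalThis.emit = (k, v) => rows.push({ key: k, value: v });
  docs.forEach((doc) => mapFn(doc));
  return rows.filter((r) => r.key === key).map((r) => r.value);
}

const links = [
  { hotel_id: "hotel1", guest_id: "guest1" },
  { hotel_id: "hotel1", guest_id: "guest2" },
  { hotel_id: "hotel2", guest_id: "guest1" },
];
console.log(queryView(listHotelGuests, links, "hotel1")); // [ 'guest1', 'guest2' ]
console.log(queryView(listGuestHotels, links, "guest1")); // [ 'hotel1', 'hotel2' ]
```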

Related

Can you query the DB inside a validate_doc_update function?

I am using validate_doc_update functions to do basic validation on the object to be stored. This is great to ensure that certain fields are present for example. But is there a way to do a validation based on a query from within the validate_doc_update function? For example, I want people to be able to sign up to bring items to a potluck. So each object will have fields for name, phone, and food (e.g. soda, salad, chips). So my validation function will check for each of those fields. No problem. But I also want to make sure that no more than two people sign up for the same food (not just a basic unique constraint). So if a new object with food value "chips" is being validated, and there are already 2 objects with food value "chips" in the DB, the validation should fail. Is there a way to do this with validation docs?
There is no facility to run a query in validate_doc_update.
One way to solve this issue is to decouple food items from user documents; instead have a document that represents the potluck:
{
_id: "potluck",
chips: {
needed: 2,
providers: ["user_id_1"]
},
soda: {
needed: 5,
providers: ["user_id_2","user_id_3"]
}
}
Here it is quite easy to validate sign-ups for items. The document also carries a lot of information: for example, the number still needed for any item is always needed - providers.length, and the user IDs link each food item to the users who have signed up to provide it.
It would be easy to generate a potluck report using a view or two with this approach.
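A sketch of a validate_doc_update function for this single-document layout. It follows the standard (newDoc, oldDoc, userCtx, secObj) signature and rejects an update by throwing an object with a `forbidden` property. The function is given a name here only so it can be called locally, and the error message is made up.

```javascript
// Reject any update of the "potluck" document in which an item
// has more providers than it needs.
function validatePotluck(newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._id !== "potluck" || newDoc._deleted) return;
  Object.keys(newDoc).forEach(function (item) {
    var entry = newDoc[item];
    // Skip _id, _rev and anything that is not a food entry.
    if (item.charAt(0) === "_" || !entry || !Array.isArray(entry.providers)) return;
    if (entry.providers.length > entry.needed) {
      throw { forbidden: "Too many sign-ups for " + item };
    }
  });
}

try {
  validatePotluck(
    { _id: "potluck", chips: { needed: 2, providers: ["a", "b", "c"] } },
    null, {}, {}
  );
} catch (e) {
  console.log(e.forbidden); // Too many sign-ups for chips
}
```

Because every sign-up updates the same document, two simultaneous sign-ups would normally collide on the document revision, so the second writer has to re-read the latest state before retrying.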

CouchDB searching linked documents

I'm very new to CouchDB and I'm hoping someone can help me with a solution to this problem.
Say I have an address document that contains various keys, most importantly a singleLineAddress and a persons array:
{
"_id": "002cb726bfe69a79ed9b897931000ec6",
"_rev": "2-6af6d8896703e9db6f5ba97abb1ca5d7",
"type": "address",
...
"singleLineAddress": "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
...
"persons":["d506d09a1c46e32f6632e6d99a0062bd","002cb726bfe69a79ed9b897931001c80"]
}
Then I have a person document with a number of keys, most importantly firstName and lastName:
{
"_id": "d506d09a1c46e32f6632e6d99a0062bd",
"_rev": "4-98fae966a92d5c6c359cb8ddfaa487e1",
"type": "person",
...
"firstName": "Joe",
"lastName": "Bloggs"
...
}
I understand I can create a linked-document view that emits all the person IDs linked to an address, and then use include_docs=true to see all the person data. But from what I'm reading, using include_docs=true is not advised because it can be expensive.
Ultimately, I'd like to use couchdb-lucene to run a full-text search across persons and addresses, using the name and address. Is that even possible using linked documents?
Using ?include_docs=true is more expensive than not using it - for every row of the index returned, the database has to fetch the related document body. But sometimes needs must :) You can avoid using ?include_docs=true by "projecting" more data into the index which is returned to you at query time. See https://blog.cloudant.com/2021/11/12/Projection.html
As for Lucene full-text searching, you can certainly search across document types in the same database, but your search results would be a mixture of address and person documents. Full-text search can't do the "join" between an address and its occupants; you'd have to do that yourself afterwards.
If you desperately need to return address and people objects together, then consider combining the two: your address document would contain an array of the people objects that reside there. There is a trade-off between combining objects so that data that belongs together is stored together, and keeping every micro-object separate for ease of updating.
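To illustrate the "projection" idea mentioned above, here is a sketch of a map function that copies the fields a query needs into the index value, so the rows can be used without include_docs=true. Which fields to project is an assumption; the field names follow the person document in the question.

```javascript
// Map function: index people by last name and project their names into the
// row value, so the document body does not have to be fetched at query time.
function personsByLastName(doc) {
  if (doc.type === "person") {
    emit(doc.lastName, { firstName: doc.firstName, lastName: doc.lastName });
  }
}

// Local stand-in for the emit() the database provides, so the sketch runs anywhere.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

personsByLastName({
  _id: "d506d09a1c46e32f6632e6d99a0062bd",
  type: "person",
  firstName: "Joe",
  lastName: "Bloggs",
});
console.log(JSON.stringify(rows));
```

The cost is a larger index, since the projected values are stored alongside every key.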

How to update fields by query in Azure search?

I'm trying to update a record in a search index via the API, which works fine when I provide the index key, e.g.
{
"value": [
{
"@search.action": "merge",
"hotelid": "4618416",
"HotelName":"Gacc Capital"
}
]
}
However, the index is built from several different databases, and the primary key of the index is not present in all of them.
See the example below, where the field "ContactName" is stored in a different database:
"value": [
{
"@search.score": 1,
"HotelId": "124",
"HotelName": "Gacc Capital",
"Description": "Chic hotel near the city. High-rise hotel in downtown, walking distance to theaters, restaurants and shops, complete with wellness programs.",
"Category": "Paid",
"Amount": "£123456",
"ContactId": "456",
"ContactName": "Mr David Koh"
}
]
The issue I'm having is updating a particular field whenever there's a change. For instance, if someone changes their name from "Mr David Koh" to "Mr David Warner Koh", I need a way to update all the records where ContactId is 456.
Is there a way to tackle this problem, or am I missing a piece of the puzzle beforehand?
I'm not sure if this is possible with the Azure Search SDK (C#), but I'm happy to give it a go if it works better than the API.
I assume you have two different types of records with relations. It’s not clear from your question.
To keep relational data updated you could do the data maintenance in an actual database that has a view that resembles what your index looks like. Then index that view.
Alternatively, you could implement the logic yourself: query for the IDs of all records that contain a contact with a specific ID, then update each of those records as you did above.
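The second approach might look roughly like this. The sketch assumes the index key is `HotelId` and that `ContactId` is marked filterable, so a query with a filter such as `ContactId eq '456'` can return the matching documents. The function only builds the merge batch; posting it to the index's document endpoint is left out.

```javascript
// Build one merge action per search hit. "hits" is assumed to be the "value"
// array returned by a search filtered on ContactId.
function buildContactNameBatch(hits, newName) {
  return {
    value: hits.map(function (hit) {
      return {
        "@search.action": "merge",
        HotelId: hit.HotelId, // the index key identifies the document to update
        ContactName: newName, // a merge only changes the fields that are listed
      };
    }),
  };
}

const hits = [
  { HotelId: "124", ContactId: "456", ContactName: "Mr David Koh" },
  { HotelId: "125", ContactId: "456", ContactName: "Mr David Koh" },
];
console.log(JSON.stringify(buildContactNameBatch(hits, "Mr David Warner Koh"), null, 2));
```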

Cloudant/Couchdb Architecture

I'm building an address-book app that uses a back-end Cloudant database. The database stores 3 types of documents:
-> User Profile document
-> Group document
-> User-to-Group Link document
As the names suggest, there are users in my database, there are groups of users (like WhatsApp), and there is a link document for each user-to-group membership (the link document also stores the settings/privileges of that user in that group).
On login, my client-side app queries Cloudant for the user document and for each group document, using view collation over the link documents of that user.
Then using the groups that I have identified above, I find all the other users of that group.
Now, the challenge is that I need to monitor any changes to the group and user documents. I am using PouchDB on the app side, and can invoke the 'changes' API against the IDs of all the group and user documents. But at scale there may be 500 users in each group, with a logged-in user being part of 10-50 groups. Multiplied by thousands of users, that will become a nightmare for the back-end to support.
Is my scalability concern warranted? Or is this normal for cloudant?
If I understand your schema correctly, you have documents of this form:
{
_id: "user:glynn",
type: "user",
name: "Glynn Bird"
}
{
_id: "group:Developers",
type: "group",
name: "Software Developers"
}
{
_id: "user:glynn:developers"
}
In the above example, the sort order of the primary key allows a user and all of its memberships to be retrieved by passing startkey and endkey parameters to the database's _all_docs endpoint.
This is "scalable" in the sense that it is efficient for Cloudant to retrieve data from a primary or secondary index, because the index is held in a b-tree, so data with adjacent keys is stored next to each other. A limit parameter can be used to paginate through larger data sets.
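A sketch of the corresponding _all_docs request, assuming IDs of the form "user:<name>" and "user:<name>:<group>" as in the documents above. The "\ufff0" suffix is a commonly used high code point that serves as an upper bound for everything sharing the prefix.

```javascript
// Build an _all_docs query for one user plus that user's membership documents.
function userWithMembershipsUrl(dbUrl, userName, limit) {
  const prefix = "user:" + userName;
  const params = new URLSearchParams({
    startkey: JSON.stringify(prefix),
    endkey: JSON.stringify(prefix + ":\ufff0"),
    include_docs: "true",
  });
  // An optional limit allows paging through larger result sets.
  if (limit) params.set("limit", String(limit));
  return dbUrl + "/_all_docs?" + params.toString();
}

console.log(userWithMembershipsUrl("https://example.cloudant.com/addressbook", "glynn", 50));
```

The host and database name in the usage line are placeholders.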
Yes, the documents are more or less as you've specified.
Link documents are as follows:
{
"_id": <AutoGeneratedID>,
"type": "link",
"user": user_id,
"group": group_id
}
I've written the following view map function:
function (doc) {
  if (doc.type == "link") {
    emit(doc.user, {"_id": doc.user});
    emit([doc.user, doc.group], {"_id": doc.group});
    emit([doc.group, doc.user], {"_id": doc.user});
  }
}
Using the above three emits with include_docs=true, the first lets me get my logged-in user document, the second lets me get all group documents for my logged-in user (using start and end keys), and the third lets me get all other user documents for a group (again using start and end keys).
Fetching the documents is done, but now I need to monitor changes to the users of each group. For this, don't I need to query the changes API with an array of user IDs? Is there any other way?
"...efficient for Cloudant to retrieve data from a primary or secondary index because the index is held in a b-tree so data with adjacent keys is stored next to each other"
Sorry, I did not understand this statement.
Thanks.
Part 1.
I recommend getting rid of the "link" type here: it works well in the SQL world, but not in CouchDB.
Instead, it is better to take advantage of document storage, i.e. store a user's groups in a "Groups" property on the User document, and a group's users in a "Users" property on the Group document.
With this approach you can set up filtered replication to process only the changes of specific groups, and these changes will already contain all the users of the group.
Note that I am assuming the number of groups per user and the number of users per group are reasonable (hundreds at most) and don't change frequently.
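A sketch of a filter function that could back such filtered replication or a filtered changes feed. Filter functions receive the document and the request, and return true for changes that should be passed on. The `groups` query parameter (a comma-separated list of group IDs) is a made-up convention for this example.

```javascript
// Filter function as it might be stored under "filters" in a design document.
// It is given a name here only so it can be called locally.
function byGroup(doc, req) {
  if (doc._deleted) return true; // pass deletions through so clients can clean up
  if (doc.type !== "group") return false;
  var wanted = (req.query.groups || "").split(",");
  return wanted.indexOf(doc._id) !== -1;
}

var req = { query: { groups: "group:Developers,group:Testers" } };
console.log(byGroup({ _id: "group:Developers", type: "group" }, req)); // true
console.log(byGroup({ _id: "group:Sales", type: "group" }, req));      // false
```

With a filter like this, a client would name the filter and pass its own group list as a query parameter on the changes or replication request, instead of listing every user ID.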
Part 2.
You can store just the IDs in these properties and then use views to "join" the other data. I was also thinking about another approach (for my use case, but yours is similar):
1) A Group contains only the IDs of its users; no views are needed.
2) You create a view of each user's contacts, i.e. for each user, all the users with whom they share a group.
3) Replicate this view to the client app.
When a user opens a group, values such as the names and pictures of contacts are taken from this local "dictionary".
This approach can save some traffic.
Please let me know what you think, because I'm currently designing the architecture of my own solution. Thank you!

Can I create multiple collections per database?

Switching from Mongo to PouchDB (with Cloudant), I like the "one database per user" concept, but is there a way to create multiple collections/tables per database?
Example
- Peter
  - History
  - Settings
  - Friends
- John
  - History
  - Settings
  - Friends
etc...
CouchDB does not have the concept of collections. However, you can achieve similar results using type identifiers on your documents in conjunction with CouchDB views.
Type Identifiers
When you save a document in CouchDB, add a field that specifies its type. For example, you would store a friend like so:
{
_id: "XXXX",
type: "Friend",
first_name: "John",
...
}
And you would store history like this:
{
_id: "XXXX",
type: "History",
url: "http://www.google.com",
...
}
Both of these documents would be in the same database, and if you queried all documents on that database then you would receive both.
Views
You can create views that filter on type and then query those views directly. For example, create a view to retrieve friends like so (in Cloudant you can go to add new Design Document and you can copy and paste this directly):
{
"_id" : "_design/friends",
"views" : {
"all" : {
"map" : "function(doc){ if (doc.type && doc.type == 'Friend') { emit(doc._id, doc._rev)}}"
}
}
}
Let's expand the map function:
function(doc) {
if (doc.type && doc.type == "Friend") {
emit(doc._id, doc._rev);
}
}
Essentially, this map function only includes documents in the view if they have type == "Friend". Now we can query this view and only friends will be returned:
http://SERVER/DATABASE/_design/friends/_view/all
Where friends = name of the design document and all = name of the view. Replace SERVER with your server and DATABASE with your database name.
You can find more information about views here:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
You could look into relational-pouch for something like this. Else you could do "3 databases per user." ;)
I may not fully understand what you need here, but in general you can achieve what you describe in three different ways in CouchDB/Cloudant/PouchDB.
1) Single document per person (Peter, John). This works if the collections are not enormous and, more importantly, if they are not updated concurrently by different users (or worse, in different database instances), which would lead to conflicts. In the JSON, each collection is just an element holding an array, and you can manipulate everything with one document. This makes access a breeze.
2) Single document per collection (Peter History, Peter Settings, etc.). Similar constraints apply, but you create a document to hold each of these collections. Provided they will not often be modified concurrently, you would then have one document for Peter's History and another for Peter's Settings.
3) Single document per item. This is the finest-grained approach: lots of small, simple documents, each containing one element (say, a single History entry for Peter). The code gets slightly simpler because removing an item becomes a delete and many clients can update items simultaneously, but now you depend on views to bring all the items into a list. A view with keys [person, listName, item], for example, would let you access what you want.
Generally your data schema decisions come down to concurrency. You mention PouchDB, so it may be that you have a single-threaded client, in which case option 1 is nice and easy.
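For the one-document-per-item layout, the view could be sketched like this. The field names `person` and `list` are assumptions about how each item document is tagged, and the document ID stands in for the "item" part of the key.

```javascript
// Map function: key every item by [person, listName, item id].
function itemsByList(doc) {
  if (doc.person && doc.list) {
    emit([doc.person, doc.list, doc._id], null);
  }
}

// Local stand-in for the emit() the database provides, so the sketch runs anywhere.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

const docs = [
  { _id: "h1", person: "Peter", list: "History", url: "http://www.google.com" },
  { _id: "f1", person: "Peter", list: "Friends", first_name: "John" },
];
docs.forEach((doc) => itemsByList(doc));
console.log(rows.map((r) => r.key.join("/")));
```

Querying such a view with startkey = ["Peter", "History"] and endkey = ["Peter", "History", {}] would then return only Peter's History items.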
