CouchDB searching linked documents - couchdb

I'm very new to CouchDB and I'm hoping someone can help me with a solution to this problem.
Say I have an address document that contains various keys, but importantly a singleLineAddress and a persons array:
{
"_id": "002cb726bfe69a79ed9b897931000ec6",
"_rev": "2-6af6d8896703e9db6f5ba97abb1ca5d7",
"type": "address",
...
"singleLineAddress": "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
...
"persons":["d506d09a1c46e32f6632e6d99a0062bd","002cb726bfe69a79ed9b897931001c80"]
}
Then I have a person document with a number of keys, crucially firstName & lastName:
{
"_id": "d506d09a1c46e32f6632e6d99a0062bd",
"_rev": "4-98fae966a92d5c6c359cb8ddfaa487e1",
"type": "person",
...
"firstName": "Joe",
"lastName": "Bloggs"
...
}
I understand I can create a linked-document view that emits the IDs of all persons linked to an address, and then use include_docs=true to see all the person data. But from what I'm reading, using include_docs=true is not advised because it can be expensive.
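For reference, a minimal sketch of such a linked-document view, assuming the address document shown above. The `emit` stub and the `rows` array are not part of CouchDB; they are only there so the map function can be run in plain Node.js. In a design document, only the `map` function itself would be stored.

```javascript
// Stand-in for CouchDB's emit(), so the map function can run outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: one row per person linked from an address.
// A value of the form { _id: ... } tells CouchDB which document to return
// when the view is queried with include_docs=true.
function map(doc) {
  if (doc.type === "address" && Array.isArray(doc.persons)) {
    doc.persons.forEach(function (personId, i) {
      emit([doc._id, i], { _id: personId });
    });
  }
}

map({
  _id: "002cb726bfe69a79ed9b897931000ec6",
  type: "address",
  singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
  persons: ["d506d09a1c46e32f6632e6d99a0062bd", "002cb726bfe69a79ed9b897931001c80"]
});
console.log(JSON.stringify(rows, null, 2));
```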
Ultimately, I'd like to use couchdb-lucene to run a full-text search (FTS) against person @ address using the name & address. Is that even possible using linked documents?

Using ?include_docs=true is more expensive than not using it - for every row of the index returned, the database has to fetch the related document body. But sometimes needs must :) You can avoid using ?include_docs=true by "projecting" more data into the index which is returned to you at query time. See https://blog.cloudant.com/2021/11/12/Projection.html
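As a rough sketch of what "projecting" could look like for the person documents in the question: the map function puts the fields you need into the row value, so the query no longer needs include_docs=true. The `emit` stub and `rows` array are assumptions for running this outside CouchDB, and keying the view by lastName is just one illustrative choice.

```javascript
// Stand-in for CouchDB's emit(), so the map function can run in plain Node.js.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: index people by last name and project the name fields
// into the row value instead of fetching the whole document at query time.
function map(doc) {
  if (doc.type === "person") {
    emit(doc.lastName, { firstName: doc.firstName, lastName: doc.lastName });
  }
}

[
  { _id: "d506d09a1c46e32f6632e6d99a0062bd", type: "person", firstName: "Joe", lastName: "Bloggs" },
  { _id: "002cb726bfe69a79ed9b897931000ec6", type: "address", persons: [] }
].forEach(map);
console.log(rows);
```

The trade-off is index size: every projected field is stored again in the view.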
As for Lucene full-text searching, you can certainly search across document types in the same collection but your search results would consist of a mixture of address and people documents - full text searching can't do the "join" between an address and its occupants - you'd have to do that yourself later.
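Doing the "join" yourself might look something like the sketch below. It assumes you already have the mixed search hits as full documents in an array; `joinAddressesWithPeople` is a hypothetical helper name, not part of couchdb-lucene.

```javascript
// Attach each matching person to the address that lists them.
// `hits` is assumed to be an array of full documents from the search.
function joinAddressesWithPeople(hits) {
  const peopleById = new Map();
  for (const doc of hits) {
    if (doc.type === "person") peopleById.set(doc._id, doc);
  }
  return hits
    .filter((doc) => doc.type === "address")
    .map((address) => ({
      address: address.singleLineAddress,
      // Only people that were part of the search results are attached.
      people: (address.persons || [])
        .filter((id) => peopleById.has(id))
        .map((id) => peopleById.get(id))
    }));
}

const joined = joinAddressesWithPeople([
  {
    _id: "002cb726bfe69a79ed9b897931000ec6",
    type: "address",
    singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
    persons: ["d506d09a1c46e32f6632e6d99a0062bd", "002cb726bfe69a79ed9b897931001c80"]
  },
  { _id: "d506d09a1c46e32f6632e6d99a0062bd", type: "person", firstName: "Joe", lastName: "Bloggs" }
]);
console.log(JSON.stringify(joined, null, 2));
```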
If you desperately need to return address and people objects together, then consider combining the two: your address document would contain an array of the people objects that reside there. There is a trade-off between combining objects so that data that belongs together is stored together, and keeping every micro object separate for ease of updating.
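To make the combined shape concrete, here is a small sketch that builds such a document from the two document types in the question. `embedPeople` is a hypothetical helper, and keeping only `_id`, `firstName` and `lastName` for each person is an assumption; you would copy whatever fields you need to search on.

```javascript
// Replace the array of person ids with an array of embedded person objects.
// The original address object is left untouched.
function embedPeople(address, people) {
  const byId = new Map(people.map((p) => [p._id, p]));
  const residents = address.persons
    .filter((id) => byId.has(id))
    .map((id) => ({
      _id: id,
      firstName: byId.get(id).firstName,
      lastName: byId.get(id).lastName
    }));
  return Object.assign({}, address, { persons: residents });
}

const address = {
  _id: "002cb726bfe69a79ed9b897931000ec6",
  type: "address",
  singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
  persons: ["d506d09a1c46e32f6632e6d99a0062bd"]
};
const combined = embedPeople(address, [
  { _id: "d506d09a1c46e32f6632e6d99a0062bd", type: "person", firstName: "Joe", lastName: "Bloggs" }
]);
console.log(JSON.stringify(combined, null, 2));
```

A single document like this can be indexed for full-text search on both name and address, at the cost of rewriting the address whenever a resident changes.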

Related

How to update fields by query in Azure search?

So I'm trying to update a record in a search index via the API, which works fine when I provide the index key, e.g.
{
"value": [
{
"@search.action": "merge",
"hotelid": "4618416",
"HotelName":"Gacc Capital"
}
]
}
However, because the index is built from several different databases, the primary key of the index is not present in all of them.
See the example below, where the field "ContactName" is stored in a different database:
"value": [
{
"@search.score": 1,
"HotelId": "124",
"HotelName": "Gacc Capital",
"Description": "Chic hotel near the city. High-rise hotel in downtown, walking distance to theaters, restaurants and shops, complete with wellness programs.",
"Category": "Paid",
"Amount": "£123456",
"ContactId": "456",
"ContactName": "Mr David Koh"
}
]
The issue I'm having is updating a particular field whenever there's a change. For instance, if someone changes their name from "Mr David Koh" to "Mr David Warner Koh", I need a way to update all the records where ContactId is 456.
Is there a way to tackle this problem, or am I missing a piece of the puzzle beforehand?
Not sure if this is possible in the Azure Search SDK (C#), but happy to give it a go if that works better than the API.
I assume you have two different types of records with relations. It’s not clear from your question.
To keep relational data updated you could do the data maintenance in an actual database that has a view that resembles what your index looks like. Then index that view.
Alternatively, you could implement the logic yourself. Just query for all record ids that contain a contact with a specific ID and then update each of those records like you did above.
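The second option could be sketched roughly as below. Both helpers (`contactFilter`, `buildMergeBatch`) are hypothetical names, the HTTP calls themselves are left out, and the sketch assumes ContactId is a filterable string field and that "hotelid" is the index key, as in the question's first example.

```javascript
// Build an OData filter that selects all records referencing one contact.
// Single quotes inside OData string literals are escaped by doubling them.
function contactFilter(contactId) {
  return "ContactId eq '" + String(contactId).replace(/'/g, "''") + "'";
}

// Turn the keys returned by that query into one "merge" batch body.
// keyField is the name of the index key; changes holds the fields to update.
function buildMergeBatch(keys, keyField, changes) {
  return {
    value: keys.map((key) =>
      Object.assign({ "@search.action": "merge", [keyField]: key }, changes)
    )
  };
}

const filter = contactFilter("456");
const batch = buildMergeBatch(["124", "4618416"], "hotelid", {
  ContactName: "Mr David Warner Koh"
});
console.log(filter);
console.log(JSON.stringify(batch, null, 2));
```

You would send the filter with a search request that selects only the key field, then post the batch to the same indexing endpoint you already use for single updates.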

Elasticsearch js with Node.js: How to return aggregated results from multiple indexes?

We have two indexes: posts and users. We'd like to make queries on these two indexes, search for a post in the index "posts" and then go to the index "users" to get the user info, to eventually return an aggregated result of both the user info and the post we found.
Let me clarify it a bit with an example:
posts:
[
{
post: "this is a post about stack overflow",
username: "james_bond",
user_id: "007"
},
{...}
]
users:
[
{
username: "james_bond",
user_id: "007",
bio: "My name's James. James Bond.",
nb_posts: "7"
},
{...}
]
I want to search for all the posts which contain "stack overflow", and then display all the users who are talking about it and their info (from the "users" index), it could look something like this:
result: {
username: "james_bond",
user_id: "007",
post: "this is a post about stack overflow",
bio: "My name's James. James Bond"
}
I hope this is clear enough, I'm sorry if this question has already been answered but I honestly didn't find any answer anywhere.
So is it possible to do so with only ES js?
I don't believe it is possible to do exactly what you are asking, as it would be very costly to join across two indexes which are potentially sharded across different nodes (this is not a main use case for Elasticsearch). But if you have control of the data within Elasticsearch, you could structure the data so that you can achieve a different type of joining.
You can either use:
nested query
Where documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
has_child and has_parent queries
A join field relationship can exist between documents within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.
Denormalisation
Alternatively, you could store the user denormalised within the post document when you insert the document into the index. This becomes a balancing act between the time spent doing multiple reads every time a post is viewed (fully normalised) and the cost of updating all posts from user 007 every time his details change (denormalised). There is a trade-off here: you don't need to denormalise everything, and as you have it, you have already denormalised the username from users to posts.
Here is a Question/Answer that gives more details on the options.
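If you stay fully normalised, the "multiple reads" end up as an application-side join. A minimal sketch, assuming you have already run one search against "posts" and one lookup against "users" and have both result sets as plain arrays (`mergePostsWithUsers` is a hypothetical helper, not part of the Elasticsearch client):

```javascript
// Step 1 (not shown): search the "posts" index and collect the matching posts.
// Step 2 (not shown): fetch the users whose user_id appears in those posts.
// Step 3: merge both result sets in the application.
function mergePostsWithUsers(posts, users) {
  const usersById = new Map(users.map((u) => [u.user_id, u]));
  return posts.map((post) => {
    const user = usersById.get(post.user_id) || {};
    return {
      username: post.username,
      user_id: post.user_id,
      post: post.post,
      bio: user.bio
    };
  });
}

const results = mergePostsWithUsers(
  [{ post: "this is a post about stack overflow", username: "james_bond", user_id: "007" }],
  [{ username: "james_bond", user_id: "007", bio: "My name's James. James Bond.", nb_posts: "7" }]
);
console.log(results);
```

The same function could be run once at index time instead, which is the denormalised variant: you store the merged object in "posts" and accept the update cost.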

How to build search with facetting over unknown/unspecified set of attributes/properties?

I'm working on a product search engine with a big set of undefined products which is constantly growing. Each product has different attributes and at this time they're saved in an array of string key-value pairs like this:
"attributes": [
{
"key": "Producttype",
"value": "Headphones - 3.5 mm plug"
},
{
"key": "Weight",
"value": "280 g"
},
{
"key": "Soundmode",
"value": "Stereo"
},
....
]
Each product also has a category. I'm using Elasticsearch 2.4.x to persist the data that I want to search on via spring-data-elasticsearch. It's possible to upgrade to the newest Elasticsearch version if needed.
As you can see, the attributes are really generic. Nested objects are also needed to be able to search on these attributes. I'm also thinking about preprocessing these attributes into a standardized format. For example, the "Weight" key might be written in different forms like "Productweight" or "Weight of product". Because there are a lot of attributes and I wouldn't like to create a custom property/field for each one, I thought about mapping only the important ones (like weight) to a custom field of their own and mapping the other attributes as described above.
Now if someone searches for, say, "iphone", I would like to show some facets on the left of the search result page. The facets should differ if someone searches "Adidas shoes". Is this possible with the given format above using nested objects? Is it possible to build the facets dynamically based on the result set Elasticsearch is returning? E.g. the most common properties which all result products contain should be used to create facets. Or do I have to persist some predefined filters/facets on each category? I think that would be too much work and also wouldn't work on search results where products can have different categories. What's the best practice for building a search feature with faceting on entities with n different properties that can grow in the future?
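One direction that seems to fit this layout is a nested aggregation with a terms aggregation on the attribute key and a sub-aggregation on the value, so the facets come out of whatever the current query matched. The sketch below only builds the request body. The `name` field in the query is hypothetical, `buildFacetRequest` is an illustrative helper, and it assumes `attributes` is mapped as a nested type with `attributes.key` and `attributes.value` indexed as exact (not analyzed) strings.

```javascript
// Build a search body whose aggregations return the most common attribute
// keys in the result set, and the most common values under each key.
function buildFacetRequest(queryText, maxKeys, maxValues) {
  return {
    query: { match: { name: queryText } }, // "name" is a hypothetical field
    aggs: {
      attributes: {
        nested: { path: "attributes" },
        aggs: {
          keys: {
            terms: { field: "attributes.key", size: maxKeys },
            aggs: {
              values: { terms: { field: "attributes.value", size: maxValues } }
            }
          }
        }
      }
    }
  };
}

const body = buildFacetRequest("iphone", 10, 5);
console.log(JSON.stringify(body, null, 2));
```

Each bucket under `keys` would become one facet, with its `values` buckets as the selectable options.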

Designing a relationship between documents in Couchbase/CouchDB?

I'm learning how to build a relationship between documents in Couchbase. Maybe this question is better asked with an example.
Let's say there are hotels and guests. Many hotels and many guests.
{
"_id": "hotel1",
"type": "hotel",
"name": "Hilton",
...
}
{
"_id": "hotel2",
"type": "hotel",
"name": "Hampton",
...
}
{
"_id": "guest1",
"type": "guest",
"name": "John",
...
}
{
"_id": "guest2",
"type": "guest",
"name": "Erin",
...
}
One way to build a relationship is to embed guest IDs within hotel documents but this is going to get very big over time.
The other way is to embed hotel IDs within guest documents and create a view for each hotel to list its guests. Since hotels are added over time, these views need to be added dynamically whenever a hotel document is created. If there are 500 hotels, there will be 500 views.
What is the cleaner way to build a relationship for such data and retrieve guest data for a hotel?
I am assuming Couchbase views work the same way as CouchDB views.
Put the hotel id in the guest doc. Have just one view that emits a complex key for each hotel visited by the guest. E.g.
// guest.hotels is assumed to be an array of { hotel_id, date } visits
guest.hotels.forEach(function (visit) {
  emit([visit.hotel_id, visit.date], guest._id);
});
You can then get a hotel's guest list by querying the view with
startkey = ['hotel1'] and endkey = ['hotel1', {}]
You can also get guests list between specific dates thus:
startkey = ['hotel1', date1] and endkey = ['hotel1', date2]
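These key ranges have to be sent as JSON in the query string. A small sketch of building them, where `guestListQuery` is a hypothetical helper and the dates are assumed to be stored as sortable strings such as "2020-01-31". The `{}` in the end key works because objects sort after every other value in view collation.

```javascript
// Build the startkey/endkey part of a view query for one hotel,
// optionally limited to a date range.
function guestListQuery(hotelId, fromDate, toDate) {
  const startkey = fromDate ? [hotelId, fromDate] : [hotelId];
  const endkey = toDate ? [hotelId, toDate] : [hotelId, {}];
  return (
    "startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey))
  );
}

console.log(guestListQuery("hotel1"));
console.log(guestListQuery("hotel1", "2020-01-01", "2020-01-31"));
```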
I like the other answer posted here; however, I don't believe there is enough explanation in it for this to be educational. If I understand your question correctly, there is a bit of a flaw in the design that the question is trying to overcome.
You importantly point out that it is not feasible to embed a list of guests within each Hotel document. Likewise, it is cumbersome, but not quite as unrealistic, to embed a list of hotels within a guest document.
What I am missing here is the fact that there are three entities needed to have a hotel stay:
Hotel
Guest
Reservation
The Reservation entity should contain the Guest, the Hotel, and any relevant details specific to the particular stay. You can't put this information into either the Hotel or Guest documents, because you will eventually have objects that are too big to perform well (though admittedly it will take a while). When in doubt, always err on the side of good performance.
Like the other answer points out, it is then quite trivial to create views that pull Guests by Hotel and Hotels by Guest.
I think in your case you can create a relationship map document:
{
"hotel_id": "hotel1",
"guest_id": "guest1"
}
So it will store hotel/guest ids. Then you can have 2 universal views created: "ListHotelGuests" and "ListGuestHotels". You can filter, group, etc. these views as per your application's business logic.
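A rough sketch of what those two views' map functions might look like for the relationship document above. The `emit` stub and `runView` helper are assumptions so the functions can be tried in plain Node.js, and the `.filter` on the rows stands in for querying the view with a single key.

```javascript
// Minimal stand-ins for the view engine: collect emitted rows per run.
let sink = [];
function emit(key, value) {
  sink.push({ key: key, value: value });
}
function runView(mapFn, docs) {
  sink = [];
  docs.forEach(mapFn);
  return sink;
}

// "ListHotelGuests": key by hotel, value is the guest id.
function listHotelGuests(doc) {
  if (doc.hotel_id && doc.guest_id) emit(doc.hotel_id, doc.guest_id);
}

// "ListGuestHotels": key by guest, value is the hotel id.
function listGuestHotels(doc) {
  if (doc.hotel_id && doc.guest_id) emit(doc.guest_id, doc.hotel_id);
}

const docs = [
  { hotel_id: "hotel1", guest_id: "guest1" },
  { hotel_id: "hotel1", guest_id: "guest2" },
  { hotel_id: "hotel2", guest_id: "guest1" },
  { _id: "hotel1", type: "hotel", name: "Hilton" } // ignored by both views
];

const guestsOfHotel1 = runView(listHotelGuests, docs)
  .filter((row) => row.key === "hotel1")
  .map((row) => row.value);
const hotelsOfGuest1 = runView(listGuestHotels, docs)
  .filter((row) => row.key === "guest1")
  .map((row) => row.value);
console.log(guestsOfHotel1, hotelsOfGuest1);
```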

couchdb match multiple inconsistent keys

Considering the following two documents:
{
"_id": "a6b8d3d7e2d61c97f4285220c103abca",
"_rev": "7-ad8c3eaaab2d4abfa01abe36a74da171",
"File":"/store/document/scan_bgd123.jpg",
"Commend": "Describes a person",
"DateAdded": "2014-07-17T14:13:00Z",
"Name": "Joe",
"LastName": "Soap",
"Height": "192cm",
"Age": "25"
}
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File":"/store/document/scan_adf123.jpg",
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
How would I find a document based on multiple criteria, say for example "Make"="Ford" and "Color"="Blue". I realize I need a view for this, but I don't know what the key is going to be, and as you can see from the two documents, the key/value pairs aren't consistent. The only consistent item will be the "File" key.
I'm attempting to create a CouchDB database that will store the location of files, tagged with key/value pairs.
EDIT:
Perhaps I should reconsider my data structure and modify it slightly?
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File": "/store/document/scan_adf123.jpg",
"Tags": {
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
}
So I need to find documents by a key/value pair in the tags, or by any number of key/value pairs, to filter which document I want. The problem here is that I want to tag objects with key/value pairs. These tags could be very different per view, so the next document will have a whole different set of key/value pairs.
CouchDB supports a flexible schema. There is no need for the documents to be consistent for them to be queryable. The view for your scenario is pretty straightforward. Here is the map function that should do the trick.
function(doc){
if(doc.Make&&doc.Color)
emit([doc.Make,doc.Color],null);
}
This gives you a view which you can then query like
/view-name?key=["Ford","Blue"]&include_docs=true
This should give you the desired result.
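To see why inconsistent documents are not a problem, here is the same map function run over cut-down versions of the two sample documents from the question. The `emit` stub and `rows` array are assumptions for running it in plain Node.js; CouchDB provides its own `emit`.

```javascript
// Stand-in for CouchDB's emit(), so the map function can run outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as the map function above: only documents that have both
// Make and Color end up in the index.
function map(doc) {
  if (doc.Make && doc.Color) {
    emit([doc.Make, doc.Color], null);
  }
}

[
  { _id: "a6b8d3d7e2d61c97f4285220c103abca", File: "/store/document/scan_bgd123.jpg", Name: "Joe", LastName: "Soap" },
  { _id: "a6b8d3d7e2d61c97f4285220c103c4a9", File: "/store/document/scan_adf123.jpg", Make: "Ford", Color: "Blue" }
].forEach(map);
console.log(JSON.stringify(rows));
```

The person document simply emits nothing, so it never appears in this view.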
Edit based on comment
For that you will need two separate views. Every view in couchdb is designed to fulfil a specific query need. This means that you have to think about access strategy of your data. It is a lot more work on your part initially but for the trouble you are rewarded with data that is indexed and has very fast access times.
So to answer your question directly. Create two views. One for Make like we have already done and other for Name like
function(doc){
if(doc.Name&&doc.LastName)
emit([doc.Name,doc.LastName],null);
}
Now the Name view will index only those documents that have a name in them, whereas the Make view will index those documents that have a make in them.
What happens when a requirement comes in future for which you don't have a query?
You can try a few things.
1. This is probably the easiest solution: use couchdb-lucene for your dynamic queries. In this case your architecture would be CouchDB views for the queries that you know your application needs, and a Lucene index for the queries that you don't know you might need. For instance, say you have indexed name and last name in a CouchDB view, but a requirement arises to query by age: simply dump the age field into Lucene and it will take care of the rest.
2. Another approach is the PPP technique, where you exploit the fact that creating views is a one-time cost: you can build views during less active hours and deploy them to a production service once they are built.
3. Combine steps 1 and 2! Use Lucene to handle ad hoc requests while you are building views using the PPP technique.

Resources