How to update fields by query in Azure Search?

I'm trying to update a record in a search index via the API, which works fine when I provide the index key, e.g.
{
"value": [
{
"@search.action": "merge",
"hotelid": "4618416",
"HotelName":"Gacc Capital"
}
]
}
However, the index is built from several different databases, and the index's primary key is not present in all of them.
See the example below, where the field "ContactName" is stored in a different database:
{
"value": [
{
"@search.score": 1,
"HotelId": "124",
"HotelName": "Gacc Capital",
"Description": "Chic hotel near the city. High-rise hotel in downtown, walking distance to theaters, restaurants and shops, complete with wellness programs.",
"Category": "Paid",
"Amount": "£123456",
"ContactId": "456",
"ContactName": "Mr David Koh"
}
]
}
The issue is updating a particular field whenever it changes. For instance, if someone changes their name from "Mr David Koh" to "Mr David Warner Koh", I need a way to update all the records where ContactId is 456.
Is there a way to tackle this problem, or am I missing a piece of the puzzle beforehand?
I'm not sure if this is possible in the Azure Search SDK (C#), but I'm happy to give it a go if it works better than the API.

I assume you have two different types of records with relations. It’s not clear from your question.
To keep relational data updated you could do the data maintenance in an actual database that has a view that resembles what your index looks like. Then index that view.
Alternatively, you could implement the logic yourself: query for the IDs of all records that contain a contact with a specific ID, and then update each of those records like you did above.
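The "query, then update each record" approach could be sketched roughly as below. This is only an illustration under assumptions: `buildMergeBatch` is a hypothetical helper, HotelId is assumed to be the index key, and the hits are assumed to come from a search request filtered with something like `ContactId eq '456'` (which needs ContactId to be a filterable field). The HTTP calls themselves are not shown.

```javascript
// Sketch: turn the hits of a filter query into one "merge" batch.
// Assumption: HotelId is the index key; buildMergeBatch is a made-up helper name.
function buildMergeBatch(hits, keyField, changes) {
  return {
    value: hits.map((hit) => ({
      "@search.action": "merge",
      [keyField]: hit[keyField], // the key identifies the document to patch
      ...changes,                // only the changed fields are sent
    })),
  };
}

// Hypothetical hits, as they might come back from the filtered search.
const hits = [{ HotelId: "124" }, { HotelId: "305" }];
const batch = buildMergeBatch(hits, "HotelId", { ContactName: "Mr David Warner Koh" });
console.log(JSON.stringify(batch, null, 2));
```

The resulting object has the same shape as the request body in the question, just with one entry per matching record, so it can be sent in a single indexing request.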

Related

CouchDB searching linked documents

I'm very new to CouchDB and I'm hoping someone can help me with a solution to this problem.
Say I have an address document that contains various keys, but importantly a singleLineAddress and a persons array:
{
"_id": "002cb726bfe69a79ed9b897931000ec6",
"_rev": "2-6af6d8896703e9db6f5ba97abb1ca5d7",
"type": "address",
...
"singleLineAddress": "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
...
"persons":["d506d09a1c46e32f6632e6d99a0062bd","002cb726bfe69a79ed9b897931001c80"]
}
Then I have a person document with a number of keys, crucially firstName and lastName:
{
"_id": "d506d09a1c46e32f6632e6d99a0062bd",
"_rev": "4-98fae966a92d5c6c359cb8ddfaa487e1",
"type": "person",
...
"firstName": "Joe",
"lastName": "Bloggs"
...
}
I understand I can create a linked-document view that emits all the person ids linked to an address, and then use include_docs=true to see all the person data. But from what I'm reading, using include_docs=true is not advised because it can be expensive.
Ultimately, I'd like to use couchdb-lucene to run a full-text search (FTS) for a person at an address, using the name and address. Is that even possible using linked documents?
Using ?include_docs=true is more expensive than not using it - for every row of the index returned, the database has to fetch the related document body. But sometimes needs must :) You can avoid using ?include_docs=true by "projecting" more data into the index which is returned to you at query time. See https://blog.cloudant.com/2021/11/12/Projection.html
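As a rough illustration of projecting data into the index, the map function below emits the address line as the row value, so a query keyed by person id returns it without include_docs=true. The view name `addressByPerson` and the small `emit` stand-in are assumptions made so the sketch runs outside CouchDB.

```javascript
// Map function for a CouchDB view: one row per person linked from an address.
// The emitted value "projects" the address line into the index itself.
function addressByPerson(doc) {
  if (doc.type === "address" && Array.isArray(doc.persons)) {
    doc.persons.forEach(function (personId) {
      emit(personId, { singleLineAddress: doc.singleLineAddress });
    });
  }
}

// Tiny stand-in for CouchDB's emit(), only so the sketch runs under Node.js.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

addressByPerson({
  _id: "002cb726bfe69a79ed9b897931000ec6",
  type: "address",
  singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
  persons: ["d506d09a1c46e32f6632e6d99a0062bd", "002cb726bfe69a79ed9b897931001c80"],
});
console.log(rows);
```

Only the first function would go into a design document; the rest is scaffolding for the example.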
As for Lucene full-text searching, you can certainly search across document types in the same collection but your search results would consist of a mixture of address and people documents - full text searching can't do the "join" between an address and its occupants - you'd have to do that yourself later.
If you desperately need to return address and people objects together, then consider combining the two: your address document would contain an array of the people objects that reside there. There is a trade-off between combining objects so that data that belongs together is stored together, and keeping every micro-object separate for ease of updating.
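If the documents stay separate, the "join" between an address and its occupants has to happen in application code. A minimal sketch, assuming both sets of documents have already been fetched (for example from search hits); `joinOccupants` is a hypothetical helper name:

```javascript
// Join address documents to their occupants in application code.
// Both arrays are assumed to have been fetched already.
function joinOccupants(addresses, people) {
  const peopleById = new Map(people.map((p) => [p._id, p]));
  return addresses.map((address) => ({
    ...address,
    // Replace each linked id with the person document, skipping missing ones.
    occupants: (address.persons || [])
      .map((id) => peopleById.get(id))
      .filter(Boolean),
  }));
}

const joined = joinOccupants(
  [{ _id: "addr1", type: "address", persons: ["p1", "p2"] }],
  [{ _id: "p1", type: "person", firstName: "Joe", lastName: "Bloggs" }]
);
console.log(joined[0].occupants);
```

The output of such a join is also roughly what a combined address document would look like if the people were stored inside it.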

Suggestions for my data structure/schema with Pouchdb - Couchdb

Good morning!
I want to use CouchDB/PouchDB for the PWA that I'm currently working on.
In my project I want to store "Projects". In a "Project" I want to store the project title and "Chapters"; in a "Chapter" I want to store the chapter title and "Scenes"; a "Scene" contains text.
What schema would make the most sense and perform best?
Right now I think about a plan like this:
Project 1
title: string
Chapter 1
Scene 1
text: string
Scene 2
text: string
Scene 3
text: string
Chapter 2
...
Project 2
title: string
Chapter 1
Scene 1
...
Since I only have SQL experience and have never used document-based databases before, I don't really know how to design a structure that makes sense.
Do I store documents inside documents to get a schema that looks exactly like the one above, or do I create a database for each component (Projects, Chapters, Scenes)?
You have several options:
1. Each project is a document with a list of chapters, each with a list of scenes.
2. Projects, chapters and scenes are three different kinds of documents in the same database.
Which one is best depends on the likely total size and on how each of these components changes. CouchDB works best with small documents (kilobytes). As you can only update whole documents, changing bits inside lists or objects in larger documents quickly becomes inefficient and can generate update conflicts.
The second suggestion above will scale better, but (currently; see link below) lacks the convenience of being able to pull out everything about a project with a single API call. You can use the id field to great effect:
{
"_id": "project1:toplevel",
"type": "project",
"title": "Project 1"
}
{
"_id": "project1:chapter1",
"type": "chapter",
"title": "Project 1, chapter 1"
}
{
"_id": "project1:chapter1#scene1",
"type": "scene",
"title": "Project 1, chapter 1, scene 1"
}
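One way such prefixed ids can be put to use is a key-range query over all documents. The sketch below only builds the range options; `prefixRange` is a hypothetical helper, and the local filter uses plain string comparison to illustrate the idea, whereas CouchDB applies its own collation on the server.

```javascript
// Build range options that select every document whose _id starts with the
// given prefix. "\ufff0" is a high Unicode character commonly used as the
// upper bound for such prefix ranges.
function prefixRange(prefix) {
  return { startkey: prefix, endkey: prefix + "\ufff0", include_docs: true };
}

const range = prefixRange("project1:");
console.log(range);

// Local illustration of which ids fall inside the range.
const ids = [
  "project1:chapter1",
  "project1:chapter1#scene1",
  "project1:toplevel",
  "project2:toplevel",
];
const selected = ids.filter((id) => id >= range.startkey && id <= range.endkey);
console.log(selected);
```

With PouchDB, options of this shape could be passed to `db.allDocs(...)` to load a whole project's documents in one call.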
In a "landing soon" version of CouchDB this id format can be used to leverage so-called partitioned databases that would be a great fit here. You can read blog posts about it here:
https://blog.cloudant.com/2019/03/05/Partition-Databases-Introduction.html

How to build search with faceting over an unknown/unspecified set of attributes/properties?

I'm working on a product search engine with a big set of undefined products which is constantly growing. Each product has different attributes and at this time they're saved in an array of string key-value pairs like this:
"attributes": [
{
"key": "Producttype",
"value": "Headphones - 3.5 mm plug"
},
{
"key": "Weight",
"value": "280 g"
},
{
"key": "Soundmode",
"value": "Stereo"
},
....
]
Each product also has a category. I'm using Elasticsearch 2.4.x to persist the data that I want to search on via spring-data-elasticsearch. It's possible to upgrade to the newest Elasticsearch version if needed.
As you can see, the attributes are really generic. Nested objects are also needed to be able to search on these attributes. I'm also thinking about preprocessing these attributes into a standardized format. For example, the "Weight" key might be written in different forms such as "Productweight" or "Weight of product". Because there are a lot of attributes and I wouldn't like to create a custom property/field for each one, I thought about mapping only the important ones (like weight) to a custom field of their own and mapping the other attributes as described above.
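The preprocessing step described here could be sketched as a simple lookup from known spellings to one canonical key. The synonym table and the name `normalizeAttributes` are assumptions for illustration; a real table would have to be maintained by hand or derived from the data.

```javascript
// Hypothetical synonym table: every known spelling maps to one canonical key.
const CANONICAL_KEYS = new Map([
  ["weight", "Weight"],
  ["productweight", "Weight"],
  ["weight of product", "Weight"],
]);

// Return the attributes with known key variants replaced by the canonical key.
// Unknown keys are kept unchanged.
function normalizeAttributes(attributes) {
  return attributes.map(({ key, value }) => ({
    key: CANONICAL_KEYS.get(key.trim().toLowerCase()) || key,
    value,
  }));
}

const normalized = normalizeAttributes([
  { key: "Productweight", value: "280 g" },
  { key: "Soundmode", value: "Stereo" },
]);
console.log(normalized);
```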
Now, if someone searches for "iphone", for example, I would like to show some facets on the left of the search result page. The facets should differ if someone searches for "Adidas shoes". Is this possible with the format above using nested objects? Is it possible to build the facets dynamically from the result set Elasticsearch returns? E.g. the most common properties that all result products contain should be used to create facets. Or do I have to persist some predefined filters/facets on each category? I think that would be too much work, and it also doesn't work for search results where products can have different categories. What's the best practice for building a search feature with faceting on entities with n different properties that can grow in the future?
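One possible direction, offered only as a sketch and not as a confirmed answer, is a nested aggregation with a terms aggregation on the attribute keys and a sub-aggregation on the values. It assumes that "attributes" is mapped as a nested type, that its key and value sub-fields are not analyzed, and that products have a searchable "name" field; `facetedSearchBody` is a hypothetical helper that only builds the request body.

```javascript
// Build a search body whose aggregations are computed from the matching
// products only, so the facets change with the query.
// Assumptions: nested "attributes" mapping, non-analyzed key/value fields,
// and a "name" field to search on.
function facetedSearchBody(text, maxFacets) {
  return {
    query: { match: { name: text } },
    aggs: {
      attributes: {
        nested: { path: "attributes" },
        aggs: {
          keys: {
            // Most common attribute keys among the hits...
            terms: { field: "attributes.key", size: maxFacets },
            // ...and, per key, the most common values.
            aggs: { values: { terms: { field: "attributes.value", size: 10 } } },
          },
        },
      },
    },
  };
}

const body = facetedSearchBody("iphone", 5);
console.log(JSON.stringify(body, null, 2));
```

Each bucket under `keys` would then correspond to one facet, with its `values` buckets as the selectable options.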

Designing a relationship between documents in Couchbase/CouchDB?

I'm learning how to build a relationship between documents in Couchbase. Maybe this question is better asked with an example.
Let's say there are hotels and guests: many hotels and many guests.
{
"_id": "hotel1",
"type": "hotel",
"name": "Hilton",
...
}
{
"_id": "hotel2",
"type": "hotel",
"name": "Hampton",
...
}
{
"_id": "guest1",
"type": "guest",
"name": "John",
...
}
{
"_id": "guest2",
"type": "guest",
"name": "Erin",
...
}
One way to build a relationship is to embed guest IDs within hotel documents but this is going to get very big over time.
The other way is to embed hotel IDs within guest documents and create a view for each hotel to list its guests. Since hotels are added over time, these views need to be added dynamically whenever a hotel document is created. If there are 500 hotels, there will be 500 views.
What is the cleaner way to build a relationship for such data and retrieve the guest data for a hotel?
I am assuming Couchbase views work the same way as CouchDB views.
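Put the hotel ids in the guest doc. Have just one view that emits a complex key for each hotel visited by the guest, e.g.
// Map function; assumes each guest doc holds a hotels array of { hotel_id, date } visits.
function (guest) {
  (guest.hotels || []).forEach(function (visit) {
    emit([visit.hotel_id, visit.date], guest._id);
  });
}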
You can then get a hotel's guest list by querying the view with
startkey = ['hotel1'] and endkey = ['hotel1', {}]
You can also get the guest list between specific dates like this:
startkey = ['hotel1', date1] and endkey = ['hotel1', date2]
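Because view keys are JSON values, the bounds have to be JSON-encoded and URL-encoded when they go into the query string. A small sketch of that step; `guestListQuery` is a hypothetical helper and the dates are made-up values:

```javascript
// Build the query string for a view whose keys are [hotel_id, date].
function guestListQuery(hotelId, fromDate, toDate) {
  const startkey = fromDate ? [hotelId, fromDate] : [hotelId];
  // {} collates after every string, so [hotelId, {}] closes the range for all dates.
  const endkey = toDate ? [hotelId, toDate] : [hotelId, {}];
  const params = new URLSearchParams({
    startkey: JSON.stringify(startkey),
    endkey: JSON.stringify(endkey),
  });
  return "?" + params.toString();
}

const allGuests = guestListQuery("hotel1");
const januaryGuests = guestListQuery("hotel1", "2014-01-01", "2014-01-31");
console.log(allGuests);
console.log(januaryGuests);
```

The returned string would be appended to the view's URL; the view name and database path depend on the deployment and are not shown.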
I like the other answer posted here; however, I don't believe there is enough explanation in it for this to be educational. If I understand your question correctly, there is a bit of a flaw in the design that the question is trying to overcome.
You importantly point out that it is not feasible to embed a list of guests within each Hotel document. Likewise, it is cumbersome, but not quite as unrealistic, to embed a list of hotels within a guest document.
What I am missing here is the fact that there are three entities needed to have a hotel stay:
Hotel
Guest
Reservation
The Reservation entity should reference the Guest and the Hotel, and hold any relevant details specific to the particular stay. You can't put this information into either the Hotel or Guest documents, because you will eventually have objects that are too big to perform well (though admittedly it will take a while). When in doubt, always err on the side of good performance.
Like the other answer points out, it is then quite trivial to create views that pull Guests by Hotel and Hotels by Guest.
I think in your case you can create a relationship map document:
{
"hotel_id": "hotel1",
"guest_id": "guest1"
}
So it will store hotel/guest ids. Then you can create two universal views: "ListHotelGuests" and "ListGuestHotels". You can filter, group, etc. these views as per your application's business logic.
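The two views over such relationship documents might look roughly like the map functions below. The `emit` stand-in and the `buildView` helper are assumptions that only simulate a view index so the sketch can run under Node.js.

```javascript
// Map functions over relationship documents { hotel_id, guest_id }.
function listHotelGuests(doc) {
  if (doc.hotel_id && doc.guest_id) emit(doc.hotel_id, doc.guest_id);
}
function listGuestHotels(doc) {
  if (doc.hotel_id && doc.guest_id) emit(doc.guest_id, doc.hotel_id);
}

// Minimal stand-in for the view engine: run a map function over the docs
// and return the emitted rows sorted by key.
let viewRows = [];
function emit(key, value) {
  viewRows.push({ key, value });
}
function buildView(mapFn, docs) {
  viewRows = [];
  docs.forEach((doc) => mapFn(doc));
  return viewRows.slice().sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const stays = [
  { hotel_id: "hotel1", guest_id: "guest1" },
  { hotel_id: "hotel1", guest_id: "guest2" },
  { hotel_id: "hotel2", guest_id: "guest1" },
];
const guestsOfHotel1 = buildView(listHotelGuests, stays)
  .filter((row) => row.key === "hotel1")
  .map((row) => row.value);
const hotelsOfGuest1 = buildView(listGuestHotels, stays)
  .filter((row) => row.key === "guest1")
  .map((row) => row.value);
console.log(guestsOfHotel1, hotelsOfGuest1);
```

In a real database the filtering would be done by querying each view with `key="hotel1"` or `key="guest1"` rather than in application code.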

couchdb match multiple inconsistent keys

Considering the following two documents:
{
"_id": "a6b8d3d7e2d61c97f4285220c103abca",
"_rev": "7-ad8c3eaaab2d4abfa01abe36a74da171",
"File":"/store/document/scan_bgd123.jpg",
"Comment": "Describes a person",
"DateAdded": "2014-07-17T14:13:00Z",
"Name": "Joe",
"LastName": "Soap",
"Height": "192cm",
"Age": "25"
}
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File":"/store/document/scan_adf123.jpg",
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
How would I find a document based on multiple criteria, for example "Make"="Ford" and "Color"="Blue"? I realize I need a view for this, but I don't know what the key is going to be, and as you can see from the two documents, the key/value pairs aren't consistent. The only consistent item will be the "File" key.
I'm attempting to create a CouchDB database that will store the location of files, tagged with key/value pairs.
EDIT:
Perhaps I should reconsider my data structure and modify it slightly?
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File": "/store/document/scan_adf123.jpg",
"Tags": {
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
}
So I need to find documents by a key/value pair in the tags, or by any number of key/value pairs, to filter which document I want. The problem here is that I want to tag objects with key/value pairs. These tags could be very different per view, so the next document will have a whole different set of key/value pairs.
CouchDB supports a flexible schema. There is no need for the documents to be consistent for them to be queryable. The view for your scenario is pretty straightforward. Here is a map function that should do the trick.
function (doc) {
  // Index only documents that have both fields.
  if (doc.Make && doc.Color) {
    emit([doc.Make, doc.Color], null);
  }
}
This gives you a view which you can then query like
/view-name?key=["Ford","Blue"]&include_docs=true
This should give you the desired result.
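To see why inconsistent documents are not a problem, the sketch below runs such a map function over both example documents and looks up one key. The `emit` stand-in, the `currentId` bookkeeping, and the name `byMakeAndColor` are assumptions that only imitate what the view engine does.

```javascript
// The map function: documents without Make and Color simply emit nothing.
function byMakeAndColor(doc) {
  if (doc.Make && doc.Color) {
    emit([doc.Make, doc.Color], null);
  }
}

// Stand-in for the view engine: remember which document emitted each row.
const indexRows = [];
let currentId = null;
function emit(key, value) {
  indexRows.push({ id: currentId, key, value });
}

const docs = [
  { _id: "a6b8d3d7e2d61c97f4285220c103abca", Name: "Joe", LastName: "Soap", Age: "25" },
  { _id: "a6b8d3d7e2d61c97f4285220c103c4a9", Make: "Ford", Model: "Focus", Color: "Blue" },
];
docs.forEach((doc) => {
  currentId = doc._id;
  byMakeAndColor(doc);
});

// Equivalent of querying the view with key=["Ford","Blue"].
const wanted = JSON.stringify(["Ford", "Blue"]);
const matches = indexRows.filter((row) => JSON.stringify(row.key) === wanted);
console.log(matches);
```

Only the car document ends up in the index; the person document is skipped by the guard in the map function.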
Edit based on comment
For that you will need two separate views. Every view in CouchDB is designed to fulfil a specific query need. This means that you have to think about the access strategy for your data. It is a lot more work on your part initially, but for the trouble you are rewarded with data that is indexed and has very fast access times.
So, to answer your question directly: create two views, one for Make like we have already done and another for Name, like
function (doc) {
  if (doc.Name && doc.LastName) {
    emit([doc.Name, doc.LastName], null);
  }
}
Now the Name view will index only those documents that have a name in them, whereas the Make view will index those documents that have a make in them.
What happens when a requirement comes in the future for which you don't have a query?
You can try a few things.
1. This is probably the easiest solution. Use couchdb-lucene for your dynamic queries. In this case your architecture will use CouchDB views for the queries that you know your application will need, and a Lucene index for the queries that you don't know you might need. For instance, say you have indexed name and last name in a CouchDB view, but a requirement arises to query by age: simply dump the age field into Lucene and it will take care of the rest.
2. Another approach is the PPP technique, where you exploit the fact that creating views is a one-time cost: you can build views during less active hours and deploy them to the production service once they are built.
3. Combine steps 1 and 2! Use Lucene to handle ad hoc requests while you are building views using the PPP technique.
