Azure Search match against two properties of the same object - azure

I would like to run a query that matches against two properties of the same item in a sub-collection.
Example:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "person.1#xpto.org" },
{ "type": "phone", "value": "555-12345" }
]
}
]
I would like to search for emails that contain xpto.org, but
something like the following doesn't work:
search.ismatchscoring('email','contacts/type,','full','all') and search.ismatchscoring('/.*xpto.org/','contacts/value,','full','all')
Instead, each condition is evaluated in the context of the main object, so objects like the following also match:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "555-12345" },
{ "type": "phone", "value": "person.1#xpto.org" }
]
}
]
Is there any way around this without having an additional field that concatenates type and value?

I just checked the official documentation. At the moment, there is no support for correlated search:
This happens because each clause applies to all values of its field in
the entire document, so there's no concept of a "current sub-document".
See https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types
and https://learn.microsoft.com/en-us/azure/search/search-query-understand-collection-filters

The solution I implemented was to create a separate collection per contact type.
This way I can search directly in, let's say, the email collection without needing correlated search. It might not be the solution for all cases, but it works well in this one.
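The reshaping described above could be done before the document is sent to the index. A minimal sketch, assuming hypothetical target field names built from each contact's `type` plus "s" (`emails`, `phones`); neither the helper name nor the field names come from the original answer:

```javascript
// Sketch: split a person's contacts into one collection per contact type
// before indexing. The target field naming (type + "s", e.g. "emails")
// is an assumption for illustration.
function splitContactsByType(person) {
  const result = { name: person.name };
  for (const contact of person.contacts || []) {
    const field = contact.type + "s";
    if (!result[field]) {
      result[field] = [];
    }
    result[field].push(contact.value);
  }
  return result;
}

const reshaped = splitContactsByType({
  name: "Person 1",
  contacts: [
    { type: "email", value: "person.1#xpto.org" },
    { type: "phone", value: "555-12345" },
  ],
});
console.log(JSON.stringify(reshaped));
```

A full-text query restricted to the `emails` field then only ever sees email values, so no correlation between `type` and `value` is needed.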

Related

Power Automate Filter Array with Array Object as Attribute

I have an array of objects (Object-Array1) in which some attributes are themselves arrays of objects (Object-Array2). I want to filter Object-Array1 down to the elements that contain a particular value in Object-Array2. How do I do this? Example:
{
"value": [
{
"title": "aaa",
"ID": 1,
"Responsible": [
{
"EMail": "abc#def.de",
"Id": 1756
},
{
"EMail": "xyz#xyz.com",
"Id": 289
}
]
},
{
"title": "bbbb",
"ID": 2,
"Responsible": [
{
"EMail": "tzu#iop.de",
"Id": 1756
}
]
}
]
}
I want to filter Object-Array1 (with title and ID) down to the elements that contain abc#def.de.
How do I do this in Power Automate with the "Filter Array" action? I tried it this way, but it didn't work:
Firstly, you haven't entered an expression, you've entered text. That will never work.
Secondly, even if you did set that as an expression, I don't think you'll be able to make it work over an array, at least, not without specifying more properties and making it a little more complex.
I think the easiest way is to use a contains statement after turning the item into a string ...
The expression I am using on the left hand side is ...
string(item()?['Responsible'])
... and this is the result ...

Filtering Contentful Query on Linked Objects

I'm attempting to utilize Contentful on a current project of mine and I'm trying to understand how to filter my query results based on a field in a linked object.
My top level object contains a Link defined as such:
"name": "Service_Description",
"fields": [
{
"name": "Header",
"id": "header",
"type": "Link",
"linkType": "Entry",
"required": true,
"validations": [
{
"linkContentType": [
"offerGeneral"
]
}
],
"localized": false,
"disabled": false,
"omitted": false
},
This "header" field links to another content type that has this definition:
"fields": [
{
"name": "General",
"id": "general",
"type": "Link",
"linkType": "Entry",
"required": true,
"validations": [
{
"linkContentType": [
"genericGeneral"
]
}
],
"localized": false,
"disabled": false,
"omitted": false
},
which then links to the lowest level:
"fields": [{
"name": "TagList",
"id": "tagList",
"type": "Array",
"items": {
"type": "Link",
"linkType": "Entry",
"validations": [
{
"linkContentType": [
"tag"
]
}
]
},
"validations": []
}
where tagList is an array of tags this piece of content may have.
I want to be able to run a query from the top level object that says get me X number of these "Service_Description" content entries where it contains a tag from a supplied list of tags.
In PostMan, I've been running with this:
https://cdn.contentful.com/spaces/{SPACE_ID}/entries?access_token={ACCESS_TOKEN}&content_type=serviceDescription&include=3
I'm trying to add a filter something like so:
fields.header.fields.general.fields.tagList.sys.id%5Bin%5D={TAG_SYS_ID}
This is clearly incorrect, but I've been struggling with how to walk this relationship to achieve my goal. Perusing the documentation this seems to have something to do with includes, but I'm unsure of how to rectify the problem.
Any direction on how to achieve my goal or if this is possible?
This is now possible; I believe it was added to the API in response to requests for this functionality. You can see the thread here.
The gist of it is that you query the entries that have linked entries and include the content type of those linked entries in the query, like so:
contentfulClient.getEntries({
'content_type': 'location',
'fields.market.fields.marketName': 'New York',
'fields.market.sys.contentType.sys.id': 'marketRegion'
})
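For a raw HTTP request like the Postman call in the question, the same two parameters would go into the query string. A minimal sketch that reuses the `location`/`market` names from the snippet above; `buildEntriesUrl` is a hypothetical helper (not part of the Contentful SDK), and the space ID and token are placeholders:

```javascript
// Sketch: build a Content Delivery API URL that filters entries by a field
// of a linked entry. buildEntriesUrl is a hypothetical helper.
function buildEntriesUrl(spaceId, accessToken, filters) {
  const params = new URLSearchParams({ access_token: accessToken, ...filters });
  return `https://cdn.contentful.com/spaces/${spaceId}/entries?${params}`;
}

const entriesUrl = buildEntriesUrl("SPACE_ID", "ACCESS_TOKEN", {
  content_type: "location",
  "fields.market.fields.marketName": "New York",
  // The content type of the linked entry has to be named explicitly.
  "fields.market.sys.contentType.sys.id": "marketRegion",
});
console.log(entriesUrl);
```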
Unfortunately what you are requesting is not currently possible in Contentful.
We were facing a very similar issue with nested/referenced content types and support said it wasn't possible.
We ended up writing a very complicated system that allowed us to do what you want: essentially, doing a full-text search for the referenced content and then querying all of the parent entries. We then matched the relationships by iterating over the parents.
Sorry it couldn't be easier. Hopefully the devs will work on something that removes this complication. We have brought it to their attention.

How to search through data with arbitrary amount of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.
The created form is used to:
register a new member for the event;
search through registered members.
What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch a good fit for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
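The linked gist holds the actual flattenData implementation. As a rough sketch of the idea only, assuming flat (non-nested) input and a simplified type mapping chosen for illustration, a function with the hypothetical name `flattenDataSketch` might look like this:

```javascript
// Sketch of flattening a flat object into key/type/value entries.
// This is NOT the gist's implementation; the type mapping is an assumption.
function detectType(value) {
  if (typeof value === "string") return "string";
  if (typeof value === "boolean") return "boolean";
  if (Number.isInteger(value)) return "long";
  if (typeof value === "number") return "double";
  throw new Error("Unsupported value type in this sketch: " + typeof value);
}

function flattenDataSketch(data) {
  return Object.keys(data).map(function (key) {
    const type = detectType(data[key]);
    const entry = { key: key, type: type, key_type: key + "." + type };
    // Each type gets its own value field so one key can hold different types.
    entry["value_" + type] = data[key];
    return entry;
  });
}

const flatBob = flattenDataSketch({
  eventId: "2T73ZT1R463DJNWE36IA8FEN",
  name: "Bob",
  age: 22,
  sex: 0,
});
console.log(JSON.stringify(flatBob, null, 2));
```

Keeping the value in a type-specific field (`value_string`, `value_long`) is what lets the same key appear with different types without a mapping conflict.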
Then we wrap this data in a document as shown before and index it.
Then a second event moderator creates another form that has a new field, a field with the same name and type as before, and a field with the same name but a different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
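Since every condition in the query above follows the same shape, the request body can be assembled from small helpers. A minimal sketch; `nestedClause` and `buildMemberQuery` are hypothetical names, and the result is only the request body, not a call to any Elasticsearch client:

```javascript
// Sketch: build one nested clause per key/value pair. Each pair gets its own
// nested clause because key and value must match within a single flatData
// entry. Field names follow the query above.
function nestedClause(key, valueField, value) {
  return {
    nested: {
      path: "flatData",
      query: {
        bool: {
          must: [
            { term: { "flatData.key": key } },
            { match: { ["flatData." + valueField]: value } },
          ],
        },
      },
    },
  };
}

function buildMemberQuery(clauses) {
  return { query: { bool: { must: clauses } } };
}

const bobQuery = buildMemberQuery([
  nestedClause("eventId", "value_string.keyword", "2T73ZT1R463DJNWE36IA8FEN"),
  nestedClause("name", "value_string", "bob"),
]);
console.log(JSON.stringify(bobQuery, null, 2));
```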
Elasticsearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So yes, Elasticsearch is well suited to these cases.
However, you may want to fine-tune this behavior, or the default mapping applied by Elasticsearch may not correspond to what you need. In that case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution are covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.

Optimal way to model documents hierarchy in CouchDB

I'm trying to model a document hierarchy in CouchDB to use in my system, which is conceptually similar to a blog. Each blog post belongs to at least one category and each category can have many posts. Categories are hierarchical, meaning that if a post belongs to CatB in the hierarchy "CatA->CatB" ("CatB is in CatA"), it also belongs to CatA.
Users must be able to quickly find all posts in a category (and all its children).
Solution 1
Each document of the post type contains a "category" array representing its position in the hierarchy (see 2).
{
"_id": "8e7a440862347a22f4a1b2ca7f000e83",
"type": "post",
"author": "dexter",
"title": "Hello",
"category":["OO","Programming","C++"]
}
Solution 2
Each document of the post type contains the "category" string representing its path in the hierarchy (see 4).
{
"_id": "8e7a440862347a22f4a1b2ca7f000e83",
"type": "post",
"author": "dexter",
"title": "Hello",
"category": "OO/Programming/C++"
}
Solution 3
Each document of the post type contains its parent "category" id representing its path in the hierarchy (see 3). A hierarchical category structure is built through linked "category" document types.
{
"_id": "8e7a440862347a22f4a1b2ca7f000e83",
"type": "post",
"author": "dexter",
"title": "Hello",
"category_id": "3"
}
{
"_id": "1",
"type": "category",
"name": "OO"
}
{
"_id": "2",
"type": "category",
"name": "Programming",
"parent": "1"
}
{
"_id": "3",
"type": "category",
"name": "C++",
"parent": "2"
}
Question
What's the best way to store this kind of relationship in CouchDB? What's the most efficient solution in terms of disk space, scalability and retrieval speed?
Can such a relation be modelled to take into account localised category names?
Disclaimer
I know this question has been asked a few times already here on SO, but it seems there's no definitive answer to it nor an answer which deals with the pros and cons of each solution. Sorry for the length of the question :)
Read so far
CouchDB - The Definitive Guide
Storing Hierarchical Data in CouchDB
Retrieving Hierarchical/Nested Data From CouchDB
Using CouchDB group_level for hierarchical data
There's no right answer to this question, hence the lack of a definitive answer. It mostly depends on what kind of usage you want to optimize for.
You state that retrieval speed of documents that belong to a certain category (and their children) is most important. The first two solutions allow you to create a view that emits a blog post multiple times, once for each category in the chain from the leaf to the root. Selecting all documents can thus be done using a single (and therefore fast) query. The only difference between the second solution and the first is that the parsing of the category "path" into components moves from the code that inserts the document to the map function of the view. I would prefer the first solution, as its map function is simpler to implement and it is a bit more flexible (e.g. it allows a category's name to contain a slash character).
In your scenario you probably also want to create a reduced view which counts the number of blog posts for each category. This is very simple with either of these solutions. With a fitting reduce function, the number of posts in every category can be retrieved using a single request.
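A map function for the first solution could emit one row per prefix of the category array. The following is a minimal sketch under that assumption, not a tested production view; `mapPostByCategoryPath` is a hypothetical name, and `emit` is stubbed here only so the snippet runs outside CouchDB:

```javascript
// Sketch of a map function for Solution 1: emit one row per ancestor path.
// emit() is provided by CouchDB inside a view; it is stubbed here so the
// sketch can run standalone.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

function mapPostByCategoryPath(doc) {
  if (doc.type === 'post' && Array.isArray(doc.category)) {
    for (var i = 1; i <= doc.category.length; i++) {
      // Key is the path prefix, e.g. ["OO"], then ["OO", "Programming"], ...
      emit(doc.category.slice(0, i), null);
    }
  }
}

mapPostByCategoryPath({
  _id: "8e7a440862347a22f4a1b2ca7f000e83",
  type: "post",
  category: ["OO", "Programming", "C++"]
});
console.log(JSON.stringify(rows));
```

Emitting `null` as the value and querying with `include_docs=true` is one way to keep the index small; emitting the document itself would work as well.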
A downside of the first two solutions is that renaming or moving a category from one parent to another requires every document to be updated. The third solution allows that without touching the documents. But from the description of your scenario I assume that retrieval by category is very frequent and category renaming/moving is very rare.
Solution 4
I propose a fourth solution where blog post documents hold references to category documents but still reference all the ancestors of the post's category. This allows categories to be renamed without touching the blog posts and allows you to store additional metadata with a category (e.g. translations of the category name or a description):
{
"_id": "8e7a440862347a22f4a1b2ca7f000e83",
"type": "post",
"author": "dexter",
"title": "Hello",
"category_ids": ["3", "2", "1"]
}
{
"_id": "1",
"type": "category",
"name": "OO"
}
{
"_id": "2",
"type": "category",
"name": "Programming",
"parent": "1"
}
{
"_id": "3",
"type": "category",
"name": "C++",
"parent": "2"
}
You will still have to store each category's parent with the category itself, which duplicates data already held in the posts, so that the categories can be traversed (e.g. for displaying a tree of categories for navigation).
You can extend this solution or any of your solutions to allow a post to be categorized under multiple categories, or a category to have multiple parents. When a post is categorized in multiple categories, you will need to store the union of the ancestors of each category in the post's document while preserving the categories selected by the author to allow them to be displayed with the post or edited later.
Let's assume that there is an additional category named "Ajax" with ancestors "JavaScript", "Programming" and "OO". To simplify the following example, I've chosen the document IDs of the categories to equal the category's name.
{
"_id": "8e7a440862347a22f4a1b2ca7f000e83",
"type": "post",
"author": "dexter",
"title": "Hello",
"category_ids": ["C++", "Ajax"],
"category_anchestor_ids": ["C++", "Programming", "OO", "Ajax", "JavaScript"]
}
To allow a category to have multiple parents, just store multiple parent IDs with a category. You will need to eliminate duplicates while finding all the ancestors of a category.
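Collecting the ancestors with duplicate elimination can be sketched as a simple traversal. This is a minimal sketch, assuming a hypothetical `parent_ids` array on each category and an extra made-up category "Web" that gives "Ajax" a second parent; like the example above, the result includes the category itself:

```javascript
// Sketch: collect a category and all of its ancestor IDs, visiting each ID
// once. The parent_ids array field is an assumption for the multi-parent case.
function collectAncestorIds(categoriesById, categoryId) {
  var seen = Object.create(null);
  var result = [];
  var queue = [categoryId];
  while (queue.length > 0) {
    var id = queue.shift();
    if (seen[id]) continue; // already reached via another parent
    seen[id] = true;
    result.push(id);
    var category = categoriesById[id];
    var parents = (category && category.parent_ids) || [];
    for (var i = 0; i < parents.length; i++) {
      queue.push(parents[i]);
    }
  }
  return result;
}

var categoriesById = {
  "OO": { parent_ids: [] },
  "Programming": { parent_ids: ["OO"] },
  "JavaScript": { parent_ids: ["Programming"] },
  "Web": { parent_ids: ["OO"] }, // hypothetical second parent of Ajax
  "Ajax": { parent_ids: ["JavaScript", "Web"] }
};
var ajaxAncestors = collectAncestorIds(categoriesById, "Ajax");
console.log(ajaxAncestors);
```

"OO" is reachable through both "Programming" and "Web" but appears only once in the result.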
View for Solution 4
Suppose you want to get all the blog posts for a specific category. We will use a database with the following sample data:
{ "_id": "100", "type": "category", "name": "OO" }
{ "_id": "101", "type": "category", "name": "Programming", "parent_id": "100" }
{ "_id": "102", "type": "category", "name": "C++", "parent_id": "101" }
{ "_id": "103", "type": "category", "name": "JavaScript", "parent_id": "101" }
{ "_id": "104", "type": "category", "name": "AJAX", "parent_id": "103" }
{ "_id": "200", "type": "post", "title": "OO Post", "category_id": "100", "category_anchestor_ids": ["100"] }
{ "_id": "201", "type": "post", "title": "Programming Post", "category_id": "101", "category_anchestor_ids": ["101", "100"] }
{ "_id": "202", "type": "post", "title": "C++ Post", "category_id": "102", "category_anchestor_ids": ["102", "101", "100"] }
{ "_id": "203", "type": "post", "title": "AJAX Post", "category_id": "104", "category_anchestor_ids": ["104", "103", "101", "100"] }
In addition to that, we use a view called posts_by_category in a design document called _design/blog with the following map function:
function (doc) {
  if (doc.type === 'post' && doc.category_anchestor_ids) {
    // Emit the post once per ancestor so it is listed under every category in its chain.
    for (var i = 0; i < doc.category_anchestor_ids.length; i++) {
      emit([doc.category_anchestor_ids[i]], doc);
    }
  }
}
Then we can get all the posts in the Programming category (which has ID "101") or one of its subcategories using a GET request to the following URL:
http://localhost:5984/so/_design/blog/_view/posts_by_category?reduce=false&key=["101"]
This will return a view result with the keys set to the category ID and the values set to the post documents. The same view can also be used to get a summary list of all categories and the number of posts in each category and its children. We add the following reduce function to the view:
function (keys, values, rereduce) {
if (rereduce) {
return sum(values)
} else {
return values.length
}
}
And then we use the following URL:
http://localhost:5984/so/_design/blog/_view/posts_by_category?group_level=1
This will return a reduced view result with the keys again set to the category ID and the values set to the number of posts in each category. In this example, the category names would have to be fetched separately, but it is possible to create a view where each row in the reduced view result already contains the category name.

How do you partially-match IDs in CouchDB?

I have a set of ACLs in Couch and I want to create a view that matches applicable ones. So, given the data:
[
{
"_id": "/protected",
"type": "valid-user"
},
{
"_id": "/protected/group1",
"type": "require group group1"
},
{
"_id": "/protected/group1/public",
"type": "public"
},
{
"_id": "/protected/group2",
"type": "require group group2"
},
{
"_id": "/admin",
"type": "require user admin"
}
]
I'd like to create a view that'd allow me to pass in a string and have it find the "best" (that is to say the longest) match.
The best I've been able to do is to create a view that returns the ID split into an array and then spam queries trimming the last element off until I get a match. Surely there's a way to do this on the server side ...
You could create a list function to accomplish that.
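Wherever the matching runs (in a list function as suggested, or in client code), the core step is the same: derive every ancestor path of the requested string and keep the longest one that exists. A minimal client-side sketch; `candidatePaths` and `longestMatch` are hypothetical helpers, and the set of known IDs is passed in rather than fetched from CouchDB:

```javascript
// Sketch: list a path and all of its ancestors, longest first.
function candidatePaths(path) {
  var parts = path.split("/").filter(function (p) { return p.length > 0; });
  var candidates = [];
  for (var i = parts.length; i >= 1; i--) {
    candidates.push("/" + parts.slice(0, i).join("/"));
  }
  return candidates;
}

// Return the first (longest) candidate that is a known ACL ID, or null.
function longestMatch(path, knownIds) {
  var candidates = candidatePaths(path);
  for (var i = 0; i < candidates.length; i++) {
    if (knownIds.indexOf(candidates[i]) !== -1) return candidates[i];
  }
  return null;
}

var aclIds = ["/protected", "/protected/group1", "/protected/group1/public",
              "/protected/group2", "/admin"];
var bestAcl = longestMatch("/protected/group1/private/report", aclIds);
console.log(bestAcl);
```

Because the candidates are plain document IDs, they could also be sent in a single request as the `keys` of a POST to `_all_docs`, which would avoid one query per trimmed segment.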
