Azure Search, mapping, merge collections - azure

I have the following data :
From SELECT c.addresses[0] address, [ c.name ] filenames FROM c
[
{
"address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"filenames": [
"File 01.docx"
]
},
{
"address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"filenames": [
"File 02.docx"
]
},
{
"address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"filenames": [
"File 03.docx"
]
}, ....
The address field is the key, I have an index with a field defined as follows :
new Field()
{
Name = "filenames",
Type = DataType.Collection(DataType.String),
IsSearchable = true,
IsFilterable = true,
IsSortable = false,
IsFacetable = false
},
As you can see, I create an array for the filenames with [ c.name ] filenames.
When I index the data displayed above, the index contains one row in the filenames collection, that row is the last one that has been indexed. Can I make it add to the collection (merge) rather than replace?
I am also looking at solving this with the Query, but CosmosDB does not support a subselect (yet) and a UDF can only see the data that's passed into it.

Fundamentally, the way you have structured your Cosmos DB collection makes this scenario unworkable because Azure search does not support merging into a collection.
Consider changing your design to so that address is a key (that is, unique) in the collection, and all filenames are gathered in a single document per address:
{
"address": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"filenames": [ "File 01.docx", "File 02.docx", "File 03.docx", ... ]
}
Also, please add a suggestion on Azure Search UserVoice site to add support for merging collections, which would make your scenario easier to achieve.

Related

How can I obtain a document from a Cosmos DB using a field in an array as a filter?

I have a Cosmos DB with documents that look like the following:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
}
I would like to write a sql query to obtain an entire document using "identifierLabel" as a filter when searching for the document.
I attempted to write a query based on an example I found from the following blog:
SELECT c,t AS identifiers
FROM c
JOIN t in c.identifiers
WHERE t.identifierLabel = "someLabel2"
However, when the result is returned, it appends the following to the end of the document:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
How can I avoid this and get the result that I desire, i.e. the entire document with nothing appended to it?
Thanks in advance.
Using ARRAY_CONTAINS(), you should be able to do something like this to retrieve the entire document, without any need for a self-join:
SELECT *
FROM c
where ARRAY_CONTAINS(c.identifiers, {"identifierLabel":"someLabel2"}, true)
Note that ARRAY_CONTAINS() can search for either scalar values or objects. By specifying true as the third parameter, it signifies searching through objects. So, in the above query, it's searching all objects in the array where identifierLabel is set to "someLabel2" (and then it should be returning the original document, unchanged, avoiding the issue you ran into with the self-join).

CouchDB Mango query - Match any key with array item

I have the following documents:
{
"_id": "doc1"
"binds": {
"subject": {
"Test1": ["something"]
},
"object": {
"Test2": ["something"]
}
},
},
{
"_id": "doc2"
"binds": {
"subject": {
"Test1": ["something"]
},
"object": {
"Test3": ["something"]
}
},
}
I need a Mango selector that retrieves documents where any field inside binds (subject, object etc) has an object with key equals to any values from an array passed as parameter. That is, if keys of binds contains any values of some array it should returns that document.
For instance, consider the array ["Test2"] my selector should retrieve doc1 since binds["subject"]["Test1"] exists; the array ["Test1"] should retrieve doc1 and doc2 and the array ["Test2", "Test3"] should also retrieve doc1 and doc2.
F.Y.I. I am using Node.js with nano lib to access CouchDB API.
I am providing this answer because the luxury of altering document "schema" is not always an option.
With the given document structure this cannot be done with Mango in any reasonable manner. Yes, it can be done, but only when employing very brittle and inefficient practices.
Mango does not provide an efficient means of querying documents for dynamic properties; it does support searching within property values e.g. arrays1.
Using worst practices, this selector will find docs with binds properties subject and object having properties named Test2 and Test3
{
"selector": {
"$or": [
{
"binds.subject.Test2": {
"$exists": true
}
},
{
"binds.object.Test2": {
"$exists": true
}
},
{
"binds.subject.Test3": {
"$exists": true
}
},
{
"binds.object.Test3": {
"$exists": true
}
}
]
}
}
Yuk.
The problems
The queried property names vary so a Mango index cannot be leveraged (Test37 anyone?)
Because of (1) a full index scan (_all_docs) occurs every query
Requires programmatic generation of the $or clause
Requires a knowledge of the set of property names to query (Test37 anyone?)
The given document structure is a show stopper for a Mango index and query.
This is where map/reduce shines
Consider a view with the map function
function (doc) {
for(var prop in doc.binds) {
if(doc.binds.hasOwnProperty(prop)) {
// prop = subject, object, foo, bar, etc
var obj = doc.binds[prop];
for(var objProp in obj) {
if(obj.hasOwnProperty(objProp)) {
// objProp = Test1, Test2, Test37, Fubar, etc
emit(objProp,prop)
}
}
}
}
}
So the map function creates a view for any docs with a binds property with two nested properties, e.g. binds.subject.Test1, binds.foo.bar.
Given the two documents in the question, this would be the basic view index
id
key
value
doc1
Test1
subject
doc2
Test1
subject
doc1
Test2
object
doc2
Test3
object
And since view queries provide the keys parameter, this query would provide your specific solution using JSON
{
include_docs: true,
reduce: false,
keys: ["Test2","Test3"]
}
Querying that index with cUrl
$ curl -G http://{view endpoint} -d 'include_docs=false' -d
'reduce=false' -d 'keys=["Test2","Test3"]'
would return
{
"total_rows": 4,
"offset": 2,
"rows": [
{
"id": "doc1",
"key": "Test2",
"value": "object"
},
{
"id": "doc2",
"key": "Test3",
"value": "object"
}
]
}
Of course there are options to expand the form and function of such a view by leveraging collation and complex keys, and there's the handy reduce feature.
I've seen commentary that Mango is great for those new to CouchDB due to it's "ease" in creating indexes and the query options, and that map/reduce if for the more seasoned. I believe such comments are well intentioned but misguided; Mango is alluring but has its pitfalls1. Views do require considerable thought, but hey, that's we're supposed to be doing anyway.
1) $elemMatch for example require in memory scanning which can be very costly.

Indexing a JSON file with multiple documents in elastic search

I am new to elasticsearch i want to index a JSON file and perform search queries from elasticsearch
How can I index this json and perform queries to get value if i pass parameter as "field3.innerfield" : "someval"
I have tried indexing this file with helpers.bulk and search but it returns all the fields instead of a selected field.
Below is the JSON sample
[
{
"id": "someid",
"metadata": {
"docType": "value",
"otherfield": " ",
morefields
.
.
},
"field1":"value1",
"field2":"value2,
"field3": [
{
"innerfield": "someval",
"innerfield1":[
"kind of a paragraph"
]
}
],
"field4": [
{
"innerfield": "someval",
"innerfield1": "kind of a paragraph"
}
],
},
{ again the format repeats with different id but same fields
},
{
}
]
Your question lacks clarity however what I understood is that you want to fetch values from its key for a nested json. You can do that in the following way as shown below.
Parse it multiple times and make the required changes as per your need.
import json
data = data.apply(lambda x: json.loads(json.loads(x).get("metadata","{}")).get("doctype") if x else None)

Cloudant Sorting on a nullable field

I want to sort on a field lets say name which is indexed in Cloudant DB. I am getting all the documents both which has this name field and which doesn't by using the index without sort . But when i try to sort with the name field I am not getting the documents which doesn't have this name field in the doc.
Is there any way to do this by using the query indexes. I want all the documents in sorted order which doesn't have the name field too.
For Example :
Below are some documents:
{
"_id": 1234,
"classId": "abc",
"name": "Happa"
}
{
"_id": 12345,
"classId": "abc",
"name": "Prasanth"
}
{
"_id": 123456,
"classId": "abc",
}
Below is the Query what i am trying to execute:
{
"selector": {
"classId": "abc",
"name" :{
"or" : [
{"$exists": true},{"$exists": false}
]
}
},
"sort": [{ "classId": "asc" }, { "name": "asc" }],
"use_index": "idx-classId_name"
},
I am expecting all the documents to be returned in a sorted order including the document which doesn't have that name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sorting only on the classId field which appear in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix in the documents without the name field with those that have it.
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
if (doc && doc.classId) {
var name = doc.name || "[notfound]";
emit(doc.classId+"-"+name, 1);
}
}
which will key the docs on "classId-name" and for docs with no name, a specified sentinel value.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).

How to do field mapping in azure search for complex json objects for example nested array

I have following problem
I have a field mapping update to an index .Payload is complex where
I have:
{
"type": "abc",
"Party": [{
"Type": "abc",
"Id": "123",
"Name": "manasa",
"Phone": [{
"Type": "Office",
"Number": "12345"
}]
}]
}
And now I want to create a field for an index. The field name is phonenumber of type Collection(Edm.String)
where mapping is
{
"sourceFieldName" : "/Party/Phone/Number",
"targetFieldName" : "phonenumber",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}
In http post body
But still after indexing i get phone number result as null.That means the mapping went wrong.If you see the phone number in source json, it is inside a json array and it itself is an array and result needs to get stored inside a collection of a string.Is it possible how can I achieve this?
If this is not possible I atleast want field mapping till phone array ie., /Party/Phone/
If i index complete party array as a text, I get an error while running the index saying:
"Field 'partydetails' contains a term that is too large to process. The max length for UTF-8 encoded terms is 32766 bytes. The most likely cause of this error is that filtering, sorting, and/or faceting are enabled on this field, which causes the entire field value to be indexed as a single term. Please avoid the use of these options for large fields."
Can someone please help!
If party would have been a Json object than an array and phone would have been only a string array for example
{
"type": "abc",
"Party": {
"Type": "abc",
"Id": "123",
"Name": "manasa",
"Phone": [{
"12345",
"23463"
}]
}
}
Then I could have mapped
{
"sourceFieldName" : "Party/Phonenumber",
"targetFieldName" : "phonenumbers",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}
It map as collection of type odata EDM.string.
So to put this in better and straight forward way,
Either transform your json to something flatter (the example that I
gave above) or
Use the proper index incase if you know before inhand as
#Luis Cabrera said,
“sourceFieldName”: “/Party/0/Phone/0/Type
It is a limitation from azure search side.
Note that Party and Phone are arrays, so the field mapping you mention won't work.
You will need to index into the specific element. For example:
{
"sourceFieldName": "/Party/0/Phone/0/Type",
"targetFieldName": "firstPhoneNumberTypeOfFirstParty"
}
You may want to give that a shot.
Thanks!
Luis Cabrera | Program Manager | Azure Search

Resources