Unable to map a nested Cosmos DB datasource field to a root index field of an Azure Search indexer using REST APIs

I have a MongoDB collection users with the following data format:
{
    "name": "abc",
    "email": "abc@xyz.com",
    "address": {
        "city": "Gurgaon",
        "state": "Haryana"
    }
}
Now I'm creating a datasource, an index, and an indexer for this collection using the Azure REST APIs.
Datasource
import json
import requests

def create_datasource():
    # headers (api-key, Content-Type) are defined elsewhere
    request_body = {
        "name": 'users-datasource',
        "description": "",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "<db connection url>"
        },
        "container": {"name": "users"},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }
    resp = requests.post(url="<create-datasource-api-url>", data=json.dumps(request_body),
                         headers=headers)
Index for the above datasource
def create_index(config):
    request_body = {
        'name': "users-index",
        'fields': [
            {
                'name': 'name',
                'type': 'Edm.String'
            },
            {
                'name': 'email',
                'type': 'Edm.DateTimeOffset'
            },
            {
                'name': 'address',
                'type': 'Edm.String'
            },
            {
                'name': 'doc_id',
                'type': 'Edm.String',
                'key': True
            }
        ]
    }
    resp = requests.post(url="<azure-create-index-api-url>", data=json.dumps(request_body),
                         headers=config.headers)
Now the indexer for the above datasource and index
def create_interviews_indexer(config):
    request_body = {
        "name": "users-indexer",
        "dataSourceName": "users-datasource",
        "targetIndexName": "users-index",
        "schedule": {"interval": "PT5M"},
        "fieldMappings": [
            {"sourceFieldName": "address.city", "targetFieldName": "address"},
        ]
    }
    resp = requests.post("<create-indexer-api-url>", data=json.dumps(request_body),
                         headers=config.headers)
This creates the indexer without any exception, but when I check the retrieved data in azure portal for the users-indexer, the address field is null and is not getting any value from address.city field mapping that is provided while creating the indexer.
I have also tried the following mapping, but it's not working either.
"fieldMappings": [
    {"sourceFieldName": "/address/city", "targetFieldName": "address"},
]
The Azure documentation does not say anything about this kind of mapping either. If anyone can help me with this, it would be very much appreciated.

The container element in the data source definition lets you specify a query that flattens your JSON document (Ref: https://learn.microsoft.com/en-us/rest/api/searchservice/create-data-source). So instead of mapping fields in the indexer definition, you can write a query that returns the output in the desired shape.
Your code for creating the data source in that case would be:
def create_datasource():
    request_body = {
        "name": 'users-datasource',
        "description": "",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "<db connection url>",
        },
        "container": {
            "name": "users",
            "query": "SELECT a.name, a.email, a.address.city as address FROM a",
        },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }
    resp = requests.post(url="<create-datasource-api-url>", data=json.dumps(request_body),
                         headers=headers)
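To make the intent of that query concrete, here is a minimal local sketch of the projection it asks for. It does not call Cosmos DB or Azure Search; flatten_user is a hypothetical helper that only mimics, in plain Python, what the SELECT is expected to return for the sample document from the question.

```python
def flatten_user(doc):
    """Mimic 'SELECT a.name, a.email, a.address.city as address FROM a'
    for a single document (local illustration only, no service call)."""
    address = doc.get("address") or {}
    return {
        "name": doc.get("name"),
        "email": doc.get("email"),
        # The nested city value is promoted to a root-level "address" field.
        "address": address.get("city"),
    }


sample = {
    "name": "abc",
    "email": "abc@xyz.com",
    "address": {"city": "Gurgaon", "state": "Haryana"},
}

print(flatten_user(sample))
```

The point is that the index then sees a flat row whose address field is already a string, so no fieldMappings entry is needed for it.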

Support for the MongoDB API flavor is in public preview; you need to explicitly indicate Mongo in the datasource's connection string, as described in this article. Also note that with Mongo datasources, the custom queries suggested in the previous answer are not supported, AFAIK. Hopefully someone from the team can clarify the current state of this support.

It's working for me with the field mapping below. An Azure Search query returns values for address properly.
"fieldMappings": [{"sourceFieldName": "address.city", "targetFieldName": "address"}]
I did make a few changes to the code you provided:
While creating the indexer, I removed the extra comma at the end of fieldMappings.
While creating the index, I kept the email field as Edm.String, not Edm.DateTimeOffset.
Please make sure you are using the Preview API version, since MongoDB API support is in preview mode with Azure Search.
For example: https://{azure search name}.search.windows.net/indexers?api-version=2019-05-06-Preview
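As a rough sketch of how those pieces fit together, the helper below only builds the URL and request body for the indexer and makes no network call. build_indexer_request and the service name "my-search-service" are hypothetical; the URL pattern, api-version, and field mapping are the ones quoted in this thread.

```python
import json


def build_indexer_request(service_name, api_version="2019-05-06-Preview"):
    """Return (url, body) for creating the users-indexer.

    Nothing is sent here; pass the result to requests.post yourself.
    """
    url = (
        f"https://{service_name}.search.windows.net/indexers"
        f"?api-version={api_version}"
    )
    body = {
        "name": "users-indexer",
        "dataSourceName": "users-datasource",
        "targetIndexName": "users-index",
        "schedule": {"interval": "PT5M"},
        "fieldMappings": [
            {"sourceFieldName": "address.city", "targetFieldName": "address"}
        ],
    }
    return url, body


url, body = build_indexer_request("my-search-service")
print(url)
print(json.dumps(body, indent=2))
```

Keeping the api-version in one place makes it harder to mix a preview-only datasource with a non-preview indexer call by accident.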

Related

Not able to populate all data with the Meilisearch plugin using populateEntryRule in my Strapi project

I am trying to integrate Meilisearch in my Strapi project. While integrating and testing it, I found that the plugin fails to fetch all the data, especially when the data is located inside a nested array. The plugin successfully fetches the data from the first level of the nested array, but the rest of the data is not fetched. I have tried using populateEntryRule in the plugin.js file, which I found in the Meilisearch documentation, but it's not giving the intended result. When I fire an API call to that particular collection type through Strapi (not through Meilisearch), I get all the data correctly.
Here is the schema I am working with:
{
"kind": "collectionType",
"collectionName": "investors",
"info": {
"singularName": "investor",
"pluralName": "investors",
"displayName": "Investor",
"description": ""
},
"options": {
"draftAndPublish": true
},
"pluginOptions": {},
"attributes": {
"title": {
"type": "string"
},
"slug": {
"type": "uid",
"targetField": "title"
},
"metaData": {
"type": "component",
"repeatable": false,
"component": "seo.meta-fields"
},
"pageSections": {
"type": "dynamiczone",
"components": [
"sections.hero",
"sections.certification-section",
"sections.community-section",
"sections.content-section",
"sections.home-page-section-2",
"sections.lm-business-modal-section",
"sections.lm-evolving-section",
"sections.lm-leardership-section",
"sections.milestone-section",
"sections.mission-section",
"sections.our-business-section",
"sections.product-hero-section",
"sections.statistics-section",
"sections.team-section",
"sections.value-section",
"sections.vision-section",
"sections.webcast-section",
"sections.chart-section",
"sections.key-financials",
"sections.fixed-deposit",
"sections.financial-performance"
]
},
"Accordion": {
"type": "dynamiczone",
"components": [
"elements.no-dropdown",
"elements.dropdown"
]
}
}
}
My custom meilisearch plugin functionality for populating the required fields in plugin.js file.
module.exports = {
    meilisearch: {
        config: {
            investor: {
                populateEntryRule: ['dynamiczone.pageSections', 'pageSections']
            }
        }
    }
}
For better understanding, I am attaching a screenshot of the Strapi API call to an entry of the investor collection type named Financial Performance.
Screenshot of the Meilisearch result for the same entry.
Any help in resolving this would be highly appreciated.

How to create a field mapping in Azure Search with a complex targetField

I use the Azure Search indexer to index documents from a MongoDB CosmosDB which contains objects with fields named _id.
As Azure Search does not allow underscores at the beginning of a field name in the index, I want to create a field mapping.
JSON structure in Cosmos --> structure in index
{
    "id": "test",
    "name": "test",
    "productLine": {
        "_id": "123",    --> "id": "123"
        "name": "test"
    }
}
The documentation has exactly this scenario as an example but only for a top level field.
"fieldMappings" : [ { "sourceFieldName" : "_id", "targetFieldName" : "id" } ]}
I tried the following:
"fieldMappings" : [ { "sourceFieldName" : "productLine/_id", "targetFieldName" : "productLine/id" } ] }
that results in an error stating:
Value is not accepted. Valid values: "doc_id", "name", "productName".
What is the correct way to create a mapping for a target field that is a subfield?
It's not possible to directly map subfields. You can get around this by adding a Skillset with a Shaper cognitive skill to the indexer, and an output field mapping.
You will also want to attach a Cognitive Services resource to the skillset. The shaper skill doesn't get billed, but attaching a Cognitive Services resource allows you to process more than 20 documents per day.
Shaper skill
{
    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
    "context": "/document",
    "inputs": [
        {
            "name": "id",
            "source": "/document/productLine/_id"
        },
        {
            "name": "name",
            "source": "/document/productLine/name"
        }
    ],
    "outputs": [
        {
            "name": "output",
            "targetName": "renamedProductLine"
        }
    ]
}
Indexer skillset and output field mapping
"skillsetName": <skillsetName>,
"outputFieldMappings": [
    {
        "sourceFieldName": "/document/renamedProductLine",
        "targetFieldName": "productLine"
    }
]
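To see what the skill's inputs are meant to produce, here is a small local emulation in Python. It is only an illustration of the reshaping, not how Azure Search implements the Shaper skill; resolve and shape are hypothetical helpers, and the sample document is the one from the question.

```python
def resolve(document, path):
    """Follow a '/document/a/b' style path into a nested dict."""
    parts = path.strip("/").split("/")
    if parts[0] != "document":
        raise ValueError(f"path must start with /document: {path}")
    node = document
    for part in parts[1:]:
        node = node[part]
    return node


def shape(document, inputs):
    """Build a new object whose keys are the input names."""
    return {item["name"]: resolve(document, item["source"]) for item in inputs}


doc = {
    "id": "test",
    "name": "test",
    "productLine": {"_id": "123", "name": "test"},
}
inputs = [
    {"name": "id", "source": "/document/productLine/_id"},
    {"name": "name", "source": "/document/productLine/name"},
]

renamed_product_line = shape(doc, inputs)
print(renamed_product_line)
```

The resulting object plays the role of /document/renamedProductLine, which the output field mapping then writes into the productLine index field.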

How to include fields on the API server and remove them before returning results to the client in GraphQL

I have a Node.js GraphQL server. From the client, I am trying to get all the user entries using a query like this:
{
user {
name
entries {
title
body
}
}
}
In the Node.js GraphQL server, however, I want to return only the user entries that are currently valid, based on publishDate and expiryDate in the entries object.
For example:
{
"user": "john",
"entries": [
{
"title": "entry1",
"body": "body1",
"publishDate": "2019-02-12",
"expiryDate": "2019-02-13"
},
{
"title": "entry2",
"body": "body2",
"publishDate": "2019-02-13",
"expiryDate": "2019-03-01"
},
{
"title": "entry3",
"body": "body3",
"publishDate": "2020-01-01",
"expiryDate": "2020-01-31"
}
]
}
should return this
{
"user": "john",
"entries": [
{
"title": "entry2",
"body": "body2",
"publishDate": "2019-02-13",
"expiryDate": "2019-03-01"
}
]
}
The entries are fetched via a delegateToSchema call (https://www.apollographql.com/docs/graphql-tools/schema-delegation.html#delegateToSchema) and I don't have an option to pass publishDate and expiryDate as query parameters. Essentially, I need to get the results and then filter them in memory.
The issue I face is that the original query doesn't include publishDate and expiryDate, which I need for filtering. Is there a way to add these fields to the delegateToSchema call and then remove them before sending the results back to the client?
You are looking for transformResult.
Implementation details:
In the delegateToSchema call, define a transforms array.
In the Transform, define a transformResult function that filters the results.
If you are able to send arguments to the remote GraphQL server, you should use transformRequest instead.
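The in-memory step itself (filter by date, then drop the helper fields) can be sketched independently of graphql-tools. The Python below is not the Transform API; visible_entries is a hypothetical helper, the inclusive date bounds are an assumption, and the data is the example from the question.

```python
from datetime import date


def visible_entries(entries, today, requested_fields):
    """Keep entries valid on `today`, then keep only the requested fields."""
    result = []
    for entry in entries:
        publish = date.fromisoformat(entry["publishDate"])
        expiry = date.fromisoformat(entry["expiryDate"])
        # Assumption: both bounds are inclusive.
        if publish <= today <= expiry:
            result.append({field: entry[field] for field in requested_fields})
    return result


entries = [
    {"title": "entry1", "body": "body1",
     "publishDate": "2019-02-12", "expiryDate": "2019-02-13"},
    {"title": "entry2", "body": "body2",
     "publishDate": "2019-02-13", "expiryDate": "2019-03-01"},
    {"title": "entry3", "body": "body3",
     "publishDate": "2020-01-01", "expiryDate": "2020-01-31"},
]

filtered = visible_entries(entries, date(2019, 2, 20), ("title", "body"))
print(filtered)
```

A transformResult function would apply the same two steps to the delegated result before it is returned to the client.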

Can I get CosmosDB graph to return edge details for vertex objects in query results?

Consider the following simple gremlin query: g.V("some_id")
When executed against my CosmosDB graph database from the "Data Explorer" tab of the Azure web UI, I get the following results:
[{
    "id": "some_id",
    "label": "some_type",
    "type": "vertex",
    "outE": {
        "some_edge": [{
                "id": "75b3c6ff-efdf-4a88-8cf6-aa395ef28bf7",
                "inV": "another_id"
            },
            {
                "id": "f3703292-12b9-44bc-a16f-26bac75f3420",
                "inV": "yet_another_id"
            }
        ]
    },
    "properties": {
        "some_property": [{
            "id": "50bda5cb-08ab-4727-b212-5ba4e829db3e|organizationId",
            "value": "hi there"
        }]
    }
}]
When I execute the same exact query against the same exact database using the gremlin websocket endpoint, I get the following results:
[{
    "id": "some_id",
    "label": "some_type",
    "type": "vertex",
    "properties": {
        "some_property": [{
            "id": "50bda5cb-08ab-4727-b212-5ba4e829db3e|organizationId",
            "value": "hi there"
        }]
    }
}]
What happened to the edges (the "outE" JSON key)? Only the "properties" key is included, but man, I need those edges! How do I adjust the output format to include them?
This looks like an artifact of the way the Data Explorer parses and shows the data returned by the underlying engine. Since edges are not properties of a vertex, I don't think they should be included as part of the vertex returned by the query. If you want to return the vertex and its associated edges, you can use a query like this, which works in the Gremlin console and via the driver:
g.V('some_id').as('b').bothE().as('e').select('b', 'e')
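If you then want something close to the Data Explorer view on the client side, one option is to regroup the result rows yourself. The sketch below assumes each row arrives as a dict with keys 'b' (the vertex) and 'e' (one edge); that row shape, the edge fields, and the edges_by_vertex helper are assumptions for illustration, so check what your driver actually returns.

```python
from collections import defaultdict


def edges_by_vertex(rows):
    """Group the 'e' values of select('b', 'e') rows under each vertex id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["b"]["id"]].append(row["e"])
    return dict(grouped)


# Hypothetical rows, shaped after the ids shown in the question.
rows = [
    {"b": {"id": "some_id"},
     "e": {"id": "75b3c6ff-efdf-4a88-8cf6-aa395ef28bf7", "inV": "another_id"}},
    {"b": {"id": "some_id"},
     "e": {"id": "f3703292-12b9-44bc-a16f-26bac75f3420", "inV": "yet_another_id"}},
]

grouped = edges_by_vertex(rows)
print(grouped)
```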

Using Azure CosmosDB DocumentDB API with Graph API

What I would like to be able to do:
Save schemaless JSON to documents
Connect those documents arbitrarily
Get recursive tree of documents based on aforementioned connections, like for example:
{
    "name": "Document 1",
    "includes": [
        {
            "name": "Document 2.1",
            "includes": [
                {
                    "name": "Document 3",
                    "includes": []
                }
            ]
        },
        {
            "name": "Document 2.2",
            "includes": []
        }
    ]
}
Current status of my setup:
CosmosDB instance configured with Graph (Gremlin) API
Possible to create (JSON) documents through DocumentDB API
Possible to create edges between documents through Graph API
Using Node.js SDKs
Questions:
Is it possible to save JSON objects as vertices through Graph API? It allows creating vertices with g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44).property('userid', 1) but something like g.addV({ firstName: 'Thomas' }) does not seem to work.
If I add documents through DocumentDB API and edges between them through Graph API and traverse through the graph, results only include IDs of the documents, not other properties. Is it possible to populate the documents somehow?
Example traversal query:
g.V('03e0576f-2ff7-6109-5ed5-237b43191354').repeat(out('includes')).until(not(out('includes'))).simplePath().dedup().tree().by('id')
Result from this query:
[{
"03e0576f-2ff7-6109-5ed5-237b43191354": {
"key": "03e0576f-2ff7-6109-5ed5-237b43191354",
"value": {
"7fab4007-c80c-ba21-f5d3-8eb353ea3279": {
"key": "7fab4007-c80c-ba21-f5d3-8eb353ea3279",
"value": {
"eec55fbd-6900-130d-247f-fb437b093711": {
"key": "eec55fbd-6900-130d-247f-fb437b093711",
"value": {}
},
"cfd14a8c-1ac3-6cc3-e2a4-ac3f250478e1": {
"key": "cfd14a8c-1ac3-6cc3-e2a4-ac3f250478e1",
"value": {
"acac136a-3bd4-831c-df6e-e5b95e593b9a": {
"key": "acac136a-3bd4-831c-df6e-e5b95e593b9a",
"value": {}
}
}
}
}
}
}
}
}]
Yes, it is possible to insert documents both through the Graph API and through the Document API. However, Cosmos expects a specific GraphSON format for the documents in order for all of their properties to be picked up during graph traversal.
I'd recommend taking a look at both Vertex Properties and GraphSON from the Tinkerpop documentation to start to get a better idea about these topics.
When adding a document through Gremlin, the syntax is a comma-separated list of name/value pairs for all the properties you want to add. Try this:
g.addV('label', 'human', 'name', 'jesse', 'age', 27)
Now if you go to the Azure portal and execute a SQL query SELECT * FROM c you'll be able to see the format that Cosmos has translated your document into.
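Since g.addV({ firstName: 'Thomas' }) is not accepted, a flat JSON object can instead be turned into the .property(...) chain shown in the question. The sketch below builds that query string in Python; add_vertex_query and gremlin_literal are hypothetical helpers, only flat string, number, and boolean values are handled, and in real code parameter bindings are safer than string building.

```python
def gremlin_literal(value):
    """Render a Python scalar as a Gremlin literal (strings single-quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_vertex_query(label, properties):
    """Build g.addV(label).property(k, v)... from a flat dict."""
    parts = [f"g.addV({gremlin_literal(label)})"]
    for key, value in properties.items():
        parts.append(
            f".property({gremlin_literal(key)}, {gremlin_literal(value)})"
        )
    return "".join(parts)


query = add_vertex_query(
    "person",
    {"id": "thomas", "firstName": "Thomas", "age": 44, "userid": 1},
)
print(query)
```

For the sample input this reproduces the addV statement from the question; nested objects would need to be flattened or serialized first.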

Resources