Can I get CosmosDB graph to return edge details for vertex objects in query results? - azure

Consider the following simple gremlin query: g.V("some_id")
When executed against my CosmosDB graph database from the "Data Explorer" tab of the Azure web UI, I get the following results:
[{
"id": "some_id",
"label": "some_type
"type": "vertex",
"outE": {
"some_edge": [{
"id": "75b3c6ff-efdf-4a88-8cf6-aa395ef28bf7",
"inV": "another_id"
},
{
"id": "f3703292-12b9-44bc-a16f-26bac75f3420",
"inV": "yet_another_id"
}
]
},
"properties": {
"some_property": [{
"id": "50bda5cb-08ab-4727-b212-5ba4e829db3e|organizationId",
"value": "hi there"
}]
}
}]
When I execute the same exact query against the same exact database using the gremlin websocket endpoint, I get the following results:
[{
"id": "some_id",
"label": "some_type
"type": "vertex",
"properties": {
"some_property": [{
"id": "50bda5cb-08ab-4727-b212-5ba4e829db3e|organizationId",
"value": "hi there"
}]
}
}]
What happened to the edges (the "outE" JSON key)? Only the "properties" key is included, but man, I need those edges! How do I adjust the output format to include them?

This looks like it is an artifact of the way that the data explorer shows and parses the data returned by the underlying engine. Since the edges are not properties of the Vertexes I don't think that these should be included as part of the Vertex returned by the query. If you want to return the vertex and the associated edges you can do that using a query like this which works in the gremlin console and via the driver:
g.V('some-id').as('b').bothE().as('e').select ('b', 'e')

Related

How to create a field mapping in Azure Search with a complex targetField

I use the Azure Search indexer to index documents from a MongoDB CosmosDB which contains objects with fields named _id.
As Azure Search does not allow underscores at the beginning of a field name in the index, I want to create a field mapping.
JSON structure in Cosmos --> structure in index
{
"id": "test"
"name": "test",
"productLine": {
"_id": "123", --> "id": "123"
"name": "test"
}
}
The documentation has exactly this scenario as an example but only for a top level field.
"fieldMappings" : [ { "sourceFieldName" : "_id", "targetFieldName" : "id" } ]}
I tried the following:
"fieldMappings" : [ { "sourceFieldName" : "productLine/_id", "targetFieldName" : "productLine/id" } ] }
that results in an error stating:
Value is not accepted. Valid values: "doc_id", "name", "productName".
What is the correct way to create a mapping for a target field that is a subfield?
It's not possible to directly map subfields. You can get around this by adding a Skillset with a Shaper cognitive skill to the indexer, and an output field mapping.
You will also want to attach a Cognitive Services resource to the skillset. The shaper skill doesn't get billed, but attaching a Cognitive Services resource allows you to process more than 20 documents per day.
Shaper skill
{
"#odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"context": "/document",
"inputs": [
{
"name": "id",
"source": "/document/productLine/_id"
},
{
"name": "name",
"source": "/document/productLine/name"
}
],
"outputs": [
{
"name": "output",
"targetName": "renamedProductLine"
}
]
}
Indexer skillset and output field mapping
"skillsetName": <skillsetName>,
"outputFieldMappings": [
{
"sourceFieldName": "/document/renamedProductLine",
"targetFieldName": "productLine"
}
]

Azure Search match against two properties of the same object

I would like to do a query matches against two properties of the same item in a sub-collection.
Example:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "person.1#xpto.org" },
{ "type": "phone", "value": "555-12345" },
]
}
]
I would like to be able to search by emails than contain xpto.org but,
doing something like the following doesn't work:
search.ismatchscoring('email','contacts/type,','full','all') and search.ismatchscoring('/.*xpto.org/','contacts/value,','full','all')
instead, it will consider the condition in the context of the main object and objects like the following will also match:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "555-12345" },
{ "type": "phone", "value": "person.1#xpto.org" },
]
}
]
Is there any way around this without having an additional field that concatenates type and value?
Just saw the official doc. At this moment, there's no support for correlated search:
This happens because each clause applies to all values of its field in
the entire document, so there's no concept of a "current sub-document
https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types
and https://learn.microsoft.com/en-us/azure/search/search-query-understand-collection-filters
The solution I've implemented was creating different collections per contact type.
This way I'm able to search directly in, lets say, the email collection without the need for correlated search. It might not be the solution for all cases but it works well in this case.

How to query by array of objects in Contentful

I have an content type entry in Contentful that has fields like this:
"fields": {
"title": "How It Works",
"slug": "how-it-works",
"countries": [
{
"sys": {
"type": "Link",
"linkType": "Entry",
"id": "3S5dbLRGjS2k8QSWqsKK86"
}
},
{
"sys": {
"type": "Link",
"linkType": "Entry",
"id": "wHfipcJS6WUSaKae0uOw8"
}
}
],
"content": [
{
"sys": {
"type": "Link",
"linkType": "Entry",
"id": "72R0oUMi3uUGMEa80kkSSA"
}
}
]
}
I'd like to run a query that would only return entries if they contain a particular country.
I played around with this query:
https://cdn.contentful.com/spaces/aoeuaoeuao/entries?content_type=contentPage&fields.countries=3S5dbLRGjS2k8QSWqsKK86
However get this error:
The equals operator cannot be used on fields.countries.en-AU because it has type Object.
I'm playing around with postman, but will be using the .NET API.
Is it possible to search for entities, and filter on arrays that contain Objects?
Still learning the API, so I'm guessing it should be pretty straight forward.
Update:
I looked at the request the Contentful Web CMS makes, as this functionality is possible there. They use query params like this:
filters.0.key=fields.countries.sys.id&filters.0.val=3S5dbLRGjS2k8QSWqsKK86
However, this did not work in the delivery API, and might only be an internal query format.
Figured this out. I used the following URL:
https://cdn.contentful.com/spaces/aoeuaoeua/entries?content_type=contentPage&fields.countries.sys.id=wHfipcJS6WUSaKae0uOw8
Note the query parameter fields.countries.sys.id

Using Azure CosmosDB DocumentDB API with Graph API

What I would like to be able to do:
Save schemaless JSON to documents
Connect those documents arbitrarily
Get recursive tree of documents based on aforementioned connections, like for example:
{
"name": "Document 1",
"includes": [
{
"name": "Document 2.1"
"includes": [
{
"name": "Document 3",
"includes": []
}
]
},
{
"name": "Document 2.2",
"includes": []
}
]
}
Current status of my setup:
CosmosDB instance configured with Graph (Gremlin) API
Possible to create (JSON) documents through DocumentDB API
Possible to created edges to documents through Graph API
Using Node.js SDKs
Questions:
Is it possible to save JSON objects as vertices through Graph API? It allows creating vertices with g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44).property('userid', 1) but something like g.addV({ firstName: 'Thomas' }) does not seem to work.
If I add documents through DocumentDB API and edges between them through Graph API and traverse through the graph, results only include IDs of the documents, not other properties. Is it possible to populate the documents somehow?
Example traversal query:
g.V('03e0576f-2ff7-6109-5ed5-237b43191354').repeat(out('includes')).until(not(out('includes'))).simplePath().dedup().tree().by('id')
Result from this query:
[{
"03e0576f-2ff7-6109-5ed5-237b43191354": {
"key": "03e0576f-2ff7-6109-5ed5-237b43191354",
"value": {
"7fab4007-c80c-ba21-f5d3-8eb353ea3279": {
"key": "7fab4007-c80c-ba21-f5d3-8eb353ea3279",
"value": {
"eec55fbd-6900-130d-247f-fb437b093711": {
"key": "eec55fbd-6900-130d-247f-fb437b093711",
"value": {}
},
"cfd14a8c-1ac3-6cc3-e2a4-ac3f250478e1": {
"key": "cfd14a8c-1ac3-6cc3-e2a4-ac3f250478e1",
"value": {
"acac136a-3bd4-831c-df6e-e5b95e593b9a": {
"key": "acac136a-3bd4-831c-df6e-e5b95e593b9a",
"value": {}
}
}
}
}
}
}
}
}]
Yes, it is possible to insert documents both through the Graph API and through the Document API. However, Cosmos expects a specific GraphSON format for the documents in order for all of their properties to be picked up during graph traversal.
I'd recommend taking a look at both Vertex Properties and GraphSON from the Tinkerpop documentation to start to get a better idea about these topics.
When adding a document through Gremlin the syntax is a name value comma separated for all properties you want to add. Try this:
g.addV('label', 'human', 'name', 'jesse', 'age', 27)
Now if you go to the Azure portal and execute a SQL query SELECT * FROM c you'll be able to see the format that Cosmos has translated your document into.

How to search through data with arbitrary amount of fields?

I have the web-form builder for science events. The event moderator creates registration form with arbitrary amount of boolean, integer, enum and text fields.
Created form is used for:
register a new member to event;
search through registered members.
What is the best search tool for second task (to search memebers of event)? Is ElasticSearch well for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
Example
Basing on your original question, let's assume that the first event moderator created a form with following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
Then we will wrap this data in a document as I've showed before and index it.
Then, the second event moderator, creates another form having a new field, field with same name and type, and also a field with same name but with different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
ElasticSearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So, yes : ElasticSearch suits well these cases.
However, you may want to fine tune this behavior, or maybe the default mapping applied by ElasticSearch doesn't correspond to what you need : in this case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution is covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.

Resources