Not Getting the Shape Right in DocumentDb Select - azure

I'm trying to get only the person's membership info i.e. ID, name and committee memberships in a SELECT query. This is my object:
{
"id": 123,
"name": "John Smith",
"memberships": [
{
"id": 789,
"name": "U.S. Congress",
"yearElected": 2012,
"state": "California",
"committees": [
{
"id": 444,
"name": "Appropriations Committee",
"position": "Member"
},
{
"id": 555,
"name": "Armed Services Committee",
"position": "Chairman"
},
{
"id": 678,
"name": "Veterans' Affairs Committee",
"position": "Member"
}
]
}
]
}
In this example, John Smith is a member of the U.S. Congress and three committees in it.
The result that I'm trying to get should look like this. Again, this is the "DESIRED RESULT":
{
"id": 789,
"name": "U.S. Congress",
"committees": [
{
"id": 444,
"name": "Appropriations Committee",
"position": "Member"
},
{
"id": 555,
"name": "Armed Services Committee",
"position": "Chairman"
},
{
"id": 678,
"name": "Veterans' Affairs Committee",
"position": "Member"
}
]
}
Here's my SQL query:
SELECT m.id, m.name,
[
{
"id": c.id,
"name": c.name,
"position": c.position
}
] AS committees
FROM a
JOIN m IN a.memberships
JOIN c IN m.committees
WHERE a.id = "123"
The data in the results is correct, but the shape is not: I get the same membership three times. Here's what I'm getting, which is NOT the desired result:
[
{
"id": 789,
"name": "U.S. Congress",
"committees":[
{
"id": 444,
"name": "Appropriations Committee",
"position": "Member"
}
]
},
{
"id": 789,
"name": "U.S. Congress",
"committees":[
{
"id": 555,
"name": "Armed Services Committee",
"position": "Chairman"
}
]
},
{
"id": 789,
"name": "U.S. Congress",
"committees":[
{
"id": 678,
"name": "Veterans' Affairs Committee",
"position": "Member"
}
]
}
]
As you can see here, the "U.S. Congress" membership is repeated 3 times.
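The repetition follows from how JOIN behaves here: each JOIN iterates over an array and emits one row per element, so joining on m.committees yields one row per committee. The following is a minimal Python sketch that only imitates the two query shapes with plain dicts (it makes no DocumentDB calls, and the function names are made up for illustration):

```python
# Plain-Python imitation of the two query shapes; no DocumentDB calls are made.
doc = {
    "id": "123",
    "name": "John Smith",
    "memberships": [
        {
            "id": 789,
            "name": "U.S. Congress",
            "committees": [
                {"id": 444, "name": "Appropriations Committee", "position": "Member"},
                {"id": 555, "name": "Armed Services Committee", "position": "Chairman"},
                {"id": 678, "name": "Veterans' Affairs Committee", "position": "Member"},
            ],
        }
    ],
}


def join_on_committees(a):
    """Mimics JOIN m IN a.memberships JOIN c IN m.committees: one row per (m, c) pair."""
    return [
        {"id": m["id"], "name": m["name"], "committees": [c]}
        for m in a["memberships"]
        for c in m["committees"]
    ]


def project_committees(a):
    """Mimics JOIN m IN a.memberships with m.committees projected as a whole array."""
    return [
        {"id": m["id"], "name": m["name"], "committees": m["committees"]}
        for m in a["memberships"]
    ]


rows_joined = join_on_committees(doc)
rows_projected = project_committees(doc)
print(len(rows_joined), len(rows_projected))
```

Joining on the committees gives three rows for the single membership, while projecting the array gives one row that carries all three committees.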
The following SQL query gets me exactly what I want in Azure Query Explorer but when I pass it as the query in my code -- using DocumentDb SDK -- I don't get any of the details for the committees. I simply get blank results for committee ID, name and position. I do, however, get the membership data i.e. "U.S. Congress", etc. Here's that SQL query:
SELECT m.id, m.name, m.committees AS committees
FROM c
JOIN m IN c.memberships
WHERE c.id = 123
I'm including the code that makes the DocumentDb call, along with our internal comments to help clarify its purpose.
First, the ReadQuery function that we call whenever we need to read something from DocumentDb:
public async Task<IEnumerable<T>> ReadQuery<T>(string collectionId, string sql, Dictionary<string, object> parameterNameValueCollection)
{
// Prepare collection self link
var collectionLink = UriFactory.CreateDocumentCollectionUri(_dbName, collectionId);
// Prepare query
var query = getQuery(sql, parameterNameValueCollection);
// Creates the query and returns IQueryable object that will be executed by the calling function
var result = _client.CreateDocumentQuery<T>(collectionLink, query, null);
return await result.QueryAsync();
}
The following function prepares the query -- with any parameters:
protected SqlQuerySpec getQuery(string sql, Dictionary<string, object> parameterNameValueCollection)
{
// Declare query object
SqlQuerySpec query = new SqlQuerySpec();
// Set query text
query.QueryText = sql;
// Convert parameters received in a collection to DocumentDb parameters
if (parameterNameValueCollection != null && parameterNameValueCollection.Count > 0)
{
// Go through each item in the parameters collection and process it
foreach (var item in parameterNameValueCollection)
{
query.Parameters.Add(new SqlParameter($"@{item.Key}", item.Value));
}
}
return query;
}
This function makes async call to DocumentDb:
public async static Task<IEnumerable<T>> QueryAsync<T>(this IQueryable<T> query)
{
var docQuery = query.AsDocumentQuery();
// Batches give us the ability to read data in chunks in an async fashion.
// If we used the ToList<T>() LINQ method to read ALL the data, the call would be synchronous, which is why we prefer the batches approach.
var batches = new List<IEnumerable<T>>();
do
{
// Actual call is made to the backend DocumentDb database
var batch = await docQuery.ExecuteNextAsync<T>();
batches.Add(batch);
}
while (docQuery.HasMoreResults);
// Because batches are collections of collections, we use the following line to merge all into a single collection.
var docs = batches.SelectMany(b => b);
// Return data
return docs;
}

I wrote a small demo to test your query and got the expected result, so I think the query itself is correct. You mentioned that you don't get any data when you make the call from your code; would you mind sharing that code? There may be a mistake in it. In any case, here is my test for reference; I hope it helps.
Query used:
SELECT m.id AS membershipId, m.name AS membershipName, m.committees AS committees
FROM c
JOIN m IN c.memberships
WHERE c.id = "123"
The code is very simple; sp_db.InnerText refers to a span that I used to show the result on my test page:
var docs = client.CreateDocumentQuery("dbs/" + databaseId + "/colls/" + collectionId,
"SELECT m.id AS membershipId, m.name AS membershipName, m.committees AS committees " +
"FROM c " +
"JOIN m IN c.memberships " +
"WHERE c.id = \"123\"");
foreach (var doc in docs)
{
sp_db.InnerText += doc;
}
I suspect there is a typo in the query you pass to client.CreateDocumentQuery(), which would make the result empty. It would help if you posted the code so that we can check it.
Updates:
I just tried your code and I still get the expected result. One thing I found: when I write the WHERE clause as "where c.id = \"123\"", the query returns the result.
However, if you leave out the escaped quotes and just use "where c.id = 123", you get nothing. This could be the cause; please check whether you have run into this scenario.
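One way to picture the difference between the two WHERE clauses is that the comparison is type-sensitive: a string id does not equal a number. The sketch below imitates that with plain Python values. It assumes the stored id is the string "123", which is what the working query suggests; the helper name is hypothetical.

```python
# Imitation of a type-sensitive equality filter; assumes the stored id is a string.
docs = [{"id": "123", "name": "John Smith"}]


def where_id_equals(items, value):
    # Match only when both the type and the value agree.
    return [d for d in items if type(d["id"]) is type(value) and d["id"] == value]


matches_string = where_id_equals(docs, "123")
matches_number = where_id_equals(docs, 123)
print(len(matches_string), len(matches_number))
```

With a string id in the document, only the quoted form finds it.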

I just updated my original post. All the code provided in the question is correct and works. My problem was that I was using aliases in the SELECT query, and as a result some properties were not binding to my domain object.
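To illustrate the kind of binding problem described here: if a query renames properties with aliases, the result keys no longer match the property names on the target type, and a binder that matches by name leaves those properties unset. The sketch below is only a Python analogy; the Membership class and the bind helper are hypothetical and are not part of the DocumentDb SDK.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class Membership:
    # Hypothetical domain object; property names match the stored document.
    id: Optional[int] = None
    name: Optional[str] = None
    committees: List[Dict[str, Any]] = field(default_factory=list)


def bind(row: Dict[str, Any]) -> Membership:
    """Copy only the keys whose names match a field on Membership."""
    known = {f.name for f in fields(Membership)}
    return Membership(**{k: v for k, v in row.items() if k in known})


committees = [{"id": 444, "name": "Appropriations Committee", "position": "Member"}]

# Row shaped like SELECT m.id, m.name, m.committees: every key binds.
plain = bind({"id": 789, "name": "U.S. Congress", "committees": committees})

# Row shaped like SELECT m.id AS membershipId, m.name AS membershipName, ...:
# the aliased keys have no matching field, so id and name stay unset.
aliased = bind(
    {"membershipId": 789, "membershipName": "U.S. Congress", "committees": committees}
)

print(plain.id, aliased.id)
```

Either dropping the aliases or naming the target properties after them makes the two sides line up again.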

Related

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to query when using CosmosDb as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far but apparently I don't understand very well how they work.
Given a structure such as the one below, how do I query for the city with the largest population across all states, using the Data Explorer in Azure?
{
"id": 1,
"states": [
{
"name": "New York",
"cities": [
{
"name": "New York",
"population": 8500000
},
{
"name": "Hempstead",
"population": 750000
},
{
"name": "Brookhaven",
"population": 500000
}
]
},
{
"name": "California",
"cities":[
{
"name": "Los Angeles",
"population": 4000000
},
{
"name": "San Diego",
"population": 1400000
},
{
"name": "San Jose",
"population": 1000000
}
]
}
]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states) {
    // Order states by the population of their remaining city, largest first.
    function compareStates(stateA, stateB) {
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    // For each state, keep only the city (or cities) with the maximum population.
    var onlyWithOneCity = states.map(s => {
        var maxPop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxPop)
        };
    });
    return onlyWithOneCity.sort(compareStates)[0];
}
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
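For comparison, the same selection is straightforward on the client once the document has been fetched. This is a minimal Python sketch, assuming the document from the question has already been loaded into a dict; the function name is made up for illustration.

```python
def largest_city(doc):
    """Return (state name, city name, population) for the most populous city."""
    candidates = [
        (state["name"], city["name"], city["population"])
        for state in doc["states"]
        for city in state["cities"]
    ]
    # max() with a key picks the tuple with the highest population.
    return max(candidates, key=lambda t: t[2])


doc = {
    "id": 1,
    "states": [
        {"name": "New York", "cities": [
            {"name": "New York", "population": 8500000},
            {"name": "Hempstead", "population": 750000},
            {"name": "Brookhaven", "population": 500000},
        ]},
        {"name": "California", "cities": [
            {"name": "Los Angeles", "population": 4000000},
            {"name": "San Diego", "population": 1400000},
            {"name": "San Jose", "population": 1000000},
        ]},
    ],
}

result = largest_city(doc)
print(result)
```

This trades a server-side query for transferring the whole document, which is only reasonable when the documents are small.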

Unable to map nested datasource field of cosmos db to a root index field of Azure indexer using REST APIs

I have a mongo db collection users with the following data format
{
"name": "abc",
"email": "abc@xyz.com",
"address": {
"city": "Gurgaon",
"state": "Haryana"
}
}
Now I'm creating a datasource, an index, and an indexer for this collection using azure rest apis.
Datasource
def create_datasource():
    request_body = {
        "name": 'users-datasource',
        "description": "",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "<db connection url>"
        },
        "container": {"name": "users"},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }
    resp = requests.post(url="<create-datasource-api-url>", data=json.dumps(request_body),
                         headers=headers)
Index for the above datasource
def create_index(config):
    request_body = {
        'name': "users-index",
        'fields': [
            {
                'name': 'name',
                'type': 'Edm.String'
            },
            {
                'name': 'email',
                'type': 'Edm.DateTimeOffset'
            },
            {
                'name': 'address',
                'type': 'Edm.String'
            },
            {
                'name': 'doc_id',
                'type': 'Edm.String',
                'key': True
            }
        ]
    }
    resp = requests.post(url="<azure-create-index-api-url>", data=json.dumps(request_body),
                         headers=config.headers)
Now the indexer for the above datasource and index
def create_interviews_indexer(config):
    request_body = {
        "name": "users-indexer",
        "dataSourceName": "users-datasource",
        "targetIndexName": "users-index",
        "schedule": {"interval": "PT5M"},
        "fieldMappings": [
            {"sourceFieldName": "address.city", "targetFieldName": "address"},
        ]
    }
    resp = requests.post("<create-indexer-api-url>", data=json.dumps(request_body),
                         headers=config.headers)
This creates the indexer without any exception, but when I check the retrieved data in azure portal for the users-indexer, the address field is null and is not getting any value from address.city field mapping that is provided while creating the indexer.
I have also tried the following mapping, but it does not work either.
"fieldMappings": [
{"sourceFieldName": "/address/city", "targetFieldName": "address"},
]
The azure documentation also does not say anything about this kind of mapping. So if anyone can help me on this, it will be very much appreciated.
The container element in the data source definition lets you specify a query that flattens your JSON document (ref: https://learn.microsoft.com/en-us/rest/api/searchservice/create-data-source). So instead of mapping fields in the indexer definition, you can write a query that returns the output in the desired format.
In that case, your code for creating the data source would be:
def create_datasource():
    request_body = {
        "name": 'users-datasource',
        "description": "",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "<db connection url>",
        },
        "container": {
            "name": "users",
            "query": "SELECT a.name, a.email, a.address.city as address FROM a",
        },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }
    resp = requests.post(url="<create-datasource-api-url>", data=json.dumps(request_body),
                         headers=headers)
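The effect of such a query on each document can be pictured with a short Python sketch: the nested address.city value is lifted to a top-level address field. This only imitates the projection locally and does not call Azure Search or Cosmos DB; the function name is hypothetical, and unlike the real query it returns None for missing values.

```python
def flatten_user(doc):
    """Roughly imitates SELECT a.name, a.email, a.address.city AS address FROM a."""
    return {
        "name": doc.get("name"),
        "email": doc.get("email"),
        # Lift the nested city up to a top-level "address" field.
        "address": (doc.get("address") or {}).get("city"),
    }


user = {
    "name": "abc",
    "email": "abc@xyz.com",
    "address": {"city": "Gurgaon", "state": "Haryana"},
}
flat = flatten_user(user)
print(flat)
```

After flattening, the address value is a plain string, which fits the Edm.String type of the address field in the index.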
Support for MongoDb API flavor is in public preview - you need to explicitly indicate Mongo in the datasource's connection string as described in this article. Also note that with Mongo datasources, custom queries suggested by the previous response are not supported afaik. Hopefully someone from the team would clarify the current state of this support.
It's working for me with the below field mapping correctly. Azure search query is returning values for address properly.
"fieldMappings": [{"sourceFieldName": "address.city", "targetFieldName": "address"}]
I did make a few changes to the code you provided:
while creating the indexer, I removed the extra comma at the end of fieldMappings;
while creating the index, I kept the email field as Edm.String and not Edm.DateTimeOffset.
Please make sure you are using the Preview API version, since MongoDB API support in Azure Search is in preview.
For example: https://{azure search name}.search.windows.net/indexers?api-version=2019-05-06-Preview

How to match and join results between two resolvers in one graphql query?

I have two resolvers.
The first is the Company resolver, which returns the company details such as id, name and a list of document ids, as in this example:
{
"data": {
"companyOne": {
"name": "twitter",
"documents": [
"5c6c0213f0fa854bd7d4a38c",
"5c6c02948e0001a16529a1a1",
"5c6c02ee7e76c12075850119",
"5c6ef2ddd16e19889ffaffd0",
"5c72fb723ebf7b2881679ced",
"5c753d1c2e080fa4a2f86c87",
...
]
}
}
}
The other resolver returns all the details of the documents, as in this example:
{
"data": {
"documentsMany": [{
"name": "doc1",
"_id": "5c6c0213f0fa854bd7d4a38c"
}, {
"name": "doc2",
"_id": "5c6c02948e0001a16529a1a1"
},
...
]
}
}
How can I match every data.companyOne.documents[id] to data.documentsMany[..]._id at the query level? Is it possible to do this in GraphQL?
The expected result is that when I run the companyOne query (without changing the code, only at the query level), it returns documents as objects instead of an array of string ids.
Maybe something like this?
query {
companyOne {
name,
documents on documentsMany where _id is ___???
}
}
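If the schema cannot be changed, one fallback is to run both queries and join the results on the client. The sketch below shows that join in Python over the two response shapes from the question. It is a client-side workaround rather than a GraphQL feature, and the function name is hypothetical.

```python
def attach_documents(company_response, documents_response):
    """Replace the company's document ids with the matching document objects."""
    by_id = {d["_id"]: d for d in documents_response["data"]["documentsMany"]}
    company = dict(company_response["data"]["companyOne"])
    # Keep the original order; skip ids that have no matching document.
    company["documents"] = [by_id[i] for i in company["documents"] if i in by_id]
    return company


company_response = {"data": {"companyOne": {
    "name": "twitter",
    "documents": ["5c6c0213f0fa854bd7d4a38c", "5c6c02948e0001a16529a1a1"],
}}}
documents_response = {"data": {"documentsMany": [
    {"name": "doc1", "_id": "5c6c0213f0fa854bd7d4a38c"},
    {"name": "doc2", "_id": "5c6c02948e0001a16529a1a1"},
]}}

joined = attach_documents(company_response, documents_response)
print([d["name"] for d in joined["documents"]])
```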

How to search through data with arbitrary amount of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.
The created form is used to:
register a new member for an event;
search through registered members.
What is the best search tool for the second task (searching the members of an event)? Is ElasticSearch a good fit for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
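The linked flattenData function is JavaScript. The following Python sketch reproduces only the behaviour visible in the output above, for flat documents with scalar values. The type names used for booleans and floats are assumptions, and the real function may treat nested objects and arrays differently.

```python
def flatten_data(document):
    """Turn a flat dict into a list of key/type/value entries."""
    entries = []
    for key, value in document.items():
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            type_name = "boolean"  # assumed type name
        elif isinstance(value, int):
            type_name = "long"
        elif isinstance(value, float):
            type_name = "double"  # assumed type name
        elif isinstance(value, str):
            type_name = "string"
        else:
            raise TypeError(f"unsupported value for {key!r}: {type(value).__name__}")
        entries.append({
            "key": key,
            "type": type_name,
            "key_type": f"{key}.{type_name}",
            f"value_{type_name}": value,
        })
    return entries


flat = flatten_data(
    {"eventId": "2T73ZT1R463DJNWE36IA8FEN", "name": "Bob", "age": 22, "sex": 0}
)
print(flat[2])
```

Because the value lands in a field named after its type, two forms can use the same key with different types without conflicting in the mapping.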
Then we will wrap this data in a document, as shown earlier, and index it.
Then the second event moderator creates another form that has a new field, a field with the same name and type as before, and a field with the same name but a different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
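Since every condition has the same nested shape, the query body can be assembled by a small helper. Here is a Python sketch that builds the same structure as a dict; the helper names are hypothetical, and sending the query to Elasticsearch is left out.

```python
import json


def nested_clause(key, value_field, value):
    """One nested clause: match an entry by its key and by one of its value fields."""
    return {
        "nested": {
            "path": "flatData",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"flatData.key": key}},
                        {"match": {f"flatData.{value_field}": value}},
                    ]
                }
            },
        }
    }


def build_query(*clauses):
    """Combine nested clauses so that all of them must match."""
    return {"query": {"bool": {"must": list(clauses)}}}


query = build_query(
    nested_clause("eventId", "value_string.keyword", "2T73ZT1R463DJNWE36IA8FEN"),
    nested_clause("name", "value_string", "bob"),
)
print(json.dumps(query, indent=2))
```

Each condition gets its own nested clause so that the key and the value are matched within the same flatData entry.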
ElasticSearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So yes, ElasticSearch suits these cases well.
However, you may want to fine-tune this behavior, or the default mapping applied by ElasticSearch may not correspond to what you need. In that case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution is covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.

Performing a query on the lowest level of a tree-structured Dojo store

Let's say we have a nested data structure like so:
[
{
"name": "fruits",
"items": [
{ "name": "apple" ...}
{ "name": "lemon" ...}
{ "name": "peach" ...}
]
}
{
"name": "veggies",
"items": [
{ "name": "carrot" ...}
{ "name": "cabbage" ...}
]
}
{
"name": "meat",
"items": [
{ "name": "steak" ...}
{ "name": "pork" ...}
]
}
]
The above data is placed in a dojo/store/Memory. I want to perform a query for items that contain the letter "c", but only on the lower level (don't want to query the categories).
With a generic dojo/store/Memory, its query function only applies a filter on the top level, so the code
store.query(function(item) {
return item.name.indexOf("c") != -1;
});
will only perform the query on the category names (fruits, veggies, etc) instead of the actual items.
Is there a straightforward way to perform this query on the child nodes and, if there's a match, return the matching children as well as the parent? For instance, the "c" query would return the "fruits" node with its "peach" child only, "veggies" would remain intact, and "meat" would be left out of the query results entirely.
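You can of course implement your own matching logic on top of the store's query method. A function passed to store.query() only acts as a filter on the top-level items, so it cannot return a trimmed copy; the code below instead walks the top-level items and collects trimmed copies. I haven't checked that this code runs perfectly, but it should convey the idea. It assumes d_array is dojo/_base/array.
var results = [];
store.query({}).forEach(function(item) {
    // Keep only the children whose name contains "c".
    var matches = d_array.filter(item.items, function(child) {
        return child.name.indexOf("c") != -1;
    });
    // Skip categories that have no matching child.
    if (matches.length) {
        results.push({ name: item.name, items: matches });
    }
});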
Hope this helps.
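Independent of Dojo, the intended filtering can be sketched in Python: keep each category that has at least one matching item, and keep only the matching items under it. The names below are illustrative, and the elided item properties from the question are left out.

```python
def filter_tree(categories, letter):
    """Keep categories with at least one item whose name contains the letter."""
    result = []
    for category in categories:
        matches = [item for item in category["items"] if letter in item["name"]]
        if matches:
            result.append({"name": category["name"], "items": matches})
    return result


tree = [
    {"name": "fruits", "items": [{"name": "apple"}, {"name": "lemon"}, {"name": "peach"}]},
    {"name": "veggies", "items": [{"name": "carrot"}, {"name": "cabbage"}]},
    {"name": "meat", "items": [{"name": "steak"}, {"name": "pork"}]},
]

filtered = filter_tree(tree, "c")
print([(c["name"], [i["name"] for i in c["items"]]) for c in filtered])
```

For the sample data this keeps "fruits" with only "peach", keeps "veggies" intact, and drops "meat".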
