Azure Search index not updating field

I have two indexes: index1 is the old, currently used index, and the new index2 additionally contains a new string array field, myArray1.
Azure Search uses a DocumentDB collection as its data source, and myArray1 is populated properly there. However, when querying the document in the Azure Search Explorer, myArray1 is always empty. The Search Explorer is set to index2. I also tried resetting index2, but without luck.
I am using a CreateDataSource.json to define the query for the DocumentDB collection, and in this query I am selecting the property myArray1.
Any idea why the index is not picking up the values stored in myArray1?
Here is the data source query:
SELECT c.id AS Id, c.crew AS Crews, c['cast'] AS Casts FROM c WHERE c._ts >= @HighWaterMark
If I run it directly against DocumentDB in Azure, it works fine.
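For reference, a minimal sketch of what a CreateDataSource.json with the query embedded in the container element might look like (the connection string, database, and collection below are placeholders, not taken from the original file; the data source name matches the one referenced by the indexer):

{
    "name": "datasource-docdb",
    "type": "documentdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://<account>.documents.azure.com;AccountKey=<key>;Database=<database>"
    },
    "container": {
        "name": "<collection>",
        "query": "SELECT c.id AS Id, c.crew AS Crews, c['cast'] AS Casts FROM c WHERE c._ts >= @HighWaterMark"
    },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    }
}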
Here is the index definition:
Index definition = new Index()
{
    Name = "index-docdb4",
    Fields = new[]
    {
        new Field("Id", DataType.String, AnalyzerName.StandardLucene) { IsKey = true, IsFilterable = true },
        new Field("Crews", DataType.Collection(DataType.String)) { IsFilterable = true },
        new Field("Casts", DataType.Collection(DataType.String)) { IsFilterable = true }
    }
};
Here is the indexer JSON file:
{
    "name": "indexer-docdb4",
    "dataSourceName": "datasource-docdb",
    "targetIndexName": "index-docdb4",
    "schedule": {
        "interval": "PT5M",
        "startTime": "2015-01-01T00:00:00Z"
    }
}
Here is an example DocumentDB document:
{
    "id": "300627",
    "title": "Carmen",
    "originalTitle": "Carmen",
    "year": 2011,
    "genres": [
        "Music"
    ],
    "partitionKey": 7,
    "_rid": "OsZtAIcaugECAAAAAAAAAA==",
    "_self": "dbs/OsZtAA==/colls/OsZtAIcaugE=/docs/OsZtAIcaugECAAAAAAAAAA==/",
    "_etag": "\"0400d17e-0000-0000-0000-590a493a0000\"",
    "_attachments": "attachments/",
    "cast": [
        "315986",
        "321880",
        "603325",
        "484671",
        "603324",
        "734554",
        "734555",
        "706818",
        "711766",
        "734556",
        "734455"
    ],
    "crew": [
        "58185",
        "390726",
        "302640",
        "670953",
        "28046",
        "122587"
    ],
    "_ts": 1493846327
}
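As an aside, when re-running an import it is the indexer rather than the index that gets reset. A minimal sketch of resetting and re-running the indexer with the .NET SDK might look like this (the service name and admin key are placeholders, and the Microsoft.Azure.Search SDK is assumed, matching the index definition above):

using System;
using Microsoft.Azure.Search;

// Connect to the search service with an admin key (placeholder values).
var serviceClient = new SearchServiceClient("<search-service-name>", new SearchCredentials("<admin-api-key>"));

// Reset the indexer so it re-reads all documents from the data source,
// then trigger an immediate run instead of waiting for the 5-minute schedule.
serviceClient.Indexers.Reset("indexer-docdb4");
serviceClient.Indexers.Run("indexer-docdb4");

// Inspect the execution history for warnings or errors (e.g. field mapping issues).
var status = serviceClient.Indexers.GetStatus("indexer-docdb4");
Console.WriteLine(status.LastResult?.Status);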

Related

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to query when using CosmosDb as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far but apparently I don't understand very well how they work.
In a structure such as the one below, how do I query for the city with the largest population across all states using the Data Explorer in Azure:
{
    "id": 1,
    "states": [
        {
            "name": "New York",
            "cities": [
                {
                    "name": "New York",
                    "population": 8500000
                },
                {
                    "name": "Hempstead",
                    "population": 750000
                },
                {
                    "name": "Brookhaven",
                    "population": 500000
                }
            ]
        },
        {
            "name": "California",
            "cities": [
                {
                    "name": "Los Angeles",
                    "population": 4000000
                },
                {
                    "name": "San Diego",
                    "population": 1400000
                },
                {
                    "name": "San Jose",
                    "population": 1000000
                }
            ]
        }
    ]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name AS stateName, city.name AS cityName, city.population
FROM c
JOIN state IN c.states
JOIN city IN state.cities
--ORDER BY city.population DESC <-- this does not work in this case
You could write a user-defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states) {
    function compareStates(stateA, stateB) {
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    var onlyWithOneCity = states.map(s => {
        var maxpop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxpop)
        };
    });
    return onlyWithOneCity.sort(compareStates)[0];
}
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
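If the UDF needs to be created programmatically rather than through the Data Explorer, a minimal sketch with the DocumentDB .NET SDK might look like the following (the database and collection names are placeholders, and client and onlyMaxPopJs, the function above as a string, are assumed to already exist):

using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Register the function on the collection so queries can call udf.OnlyMaxPop(...).
var udf = new UserDefinedFunction
{
    Id = "OnlyMaxPop",
    Body = onlyMaxPopJs // the JavaScript function shown above, as a string
};

await client.CreateUserDefinedFunctionAsync(
    UriFactory.CreateDocumentCollectionUri("<database>", "<collection>"),
    udf);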

Can I index an array in a composite index in Azure Cosmos DB?

I have a problem indexing an array in Azure Cosmos DB
I am trying to save this indexing policy via the portal
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ],
    "compositeIndexes": [
        [
            {
                "path": "/DeviceId",
                "order": "ascending"
            },
            {
                "path": "/TimeStamp",
                "order": "ascending"
            },
            {
                "path": "/Items/[]/Name/?",
                "order": "ascending"
            },
            {
                "path": "/Items/[]/DoubleValue/?",
                "order": "ascending"
            }
        ]
    ]
}
I get the error "Failed to update container DeviceEvents:
Message: {"code":"BadRequest","message":"Message: {"Errors":["The indexing path '\/Items\/[]\/Name\/?' could not be accepted, failed near position '8'."
This seems to be the array [] syntax that is giving an error.
On a side note, I am not sure whether what I am doing makes sense at all, but I have a query that looks like this:
SELECT SUM(de0["DoubleValue"])
FROM root JOIN de0 IN root["Items"]
WHERE root["ApplicationId"] = 57 AND root["DeviceId"] = 126 AND root["TimeStamp"] >= "2021-02-21T17:55:29.7389397Z" AND de0["Name"] = "Use Case"
where ApplicationId is the partition key, and the saved item looks like this:
{
    "id": "59ab9323-26ca-436f-8d29-e1ddd826f025",
    "DeviceId": 3,
    "ApplicationId": 3,
    "RawData": "640F7A000A00E30142000000",
    "TimeStamp": "2021-02-20T18:36:52.833174Z",
    "Items": [
        {
            "Name": "Battery Status",
            "StringValue": "Full",
            "DoubleValue": null
        },
        {
            "Name": "Use Case",
            "StringValue": null,
            "DoubleValue": 12
        },
        {
            "Name": "Battery Voltage",
            "StringValue": null,
            "DoubleValue": 3.962
        },
        {
            "Name": "Rain Gauge Count",
            "StringValue": null,
            "DoubleValue": 10
        }
    ],
    "_rid": "CgdVAO7B0DNkAAAAAAAAAA==",
    "_self": "dbs/CgdVAA==/colls/CgdVAO7B0DM=/docs/CgdVAO7B0DNkAAAAAAAAAA==/",
    "_etag": "\"61008771-0000-0d00-0000-603156c50000\"",
    "_attachments": "attachments/",
    "_ts": 1613846213
}
I need to aggregate over some of these items in the array, for example get the MAX of a temperature reading (I am using "Use Case" here just for testing, even though it doesn't make much sense). I reasoned that if all the data referenced by the query were in a single composite index, the database could do the aggregation without reading the documents themselves. However, I can't seem to add a composite index containing an array at all.
Yes, a composite index can't contain an array path; each path must lead to a scalar value.
Unlike with included or excluded paths, you can't create a path with the /* wildcard. Every composite path has an implicit /? at the end of the path that you don't need to specify. Composite paths lead to a scalar value and this is the only value that is included in the composite index.
Reference: https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#composite-indexes
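For comparison, a composite index restricted to the scalar, top-level paths from the policy above should be accepted; this is simply the original policy with the two array paths removed (whether it helps the aggregation query is a separate question):

"compositeIndexes": [
    [
        {
            "path": "/DeviceId",
            "order": "ascending"
        },
        {
            "path": "/TimeStamp",
            "order": "ascending"
        }
    ]
]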

Unable to map nested datasource field of cosmos db to a root index field of Azure indexer using REST APIs

I have a MongoDB collection users with the following data format:
{
    "name": "abc",
    "email": "abc@xyz.com",
    "address": {
        "city": "Gurgaon",
        "state": "Haryana"
    }
}
Now I'm creating a data source, an index, and an indexer for this collection using the Azure REST APIs.
Datasource
def create_datasource():
    request_body = {
        "name": 'users-datasource',
        "description": "",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "<db connection url>"
        },
        "container": {"name": "users"},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }
    resp = requests.post(url="<create-datasource-api-url>", data=json.dumps(request_body),
                         headers=headers)
Index for the above datasource
def create_index(config):
    request_body = {
        'name': "users-index",
        'fields': [
            {
                'name': 'name',
                'type': 'Edm.String'
            },
            {
                'name': 'email',
                'type': 'Edm.DateTimeOffset'
            },
            {
                'name': 'address',
                'type': 'Edm.String'
            },
            {
                'name': 'doc_id',
                'type': 'Edm.String',
                'key': True
            }
        ]
    }
    resp = requests.post(url="<azure-create-index-api-url>", data=json.dumps(request_body),
                         headers=config.headers)
Now the indexer for the above data source and index:
def create_interviews_indexer(config):
    request_body = {
        "name": "users-indexer",
        "dataSourceName": "users-datasource",
        "targetIndexName": "users-index",
        "schedule": {"interval": "PT5M"},
        "fieldMappings": [
            {"sourceFieldName": "address.city", "targetFieldName": "address"},
        ]
    }
    resp = requests.post("<create-indexer-api-url>", data=json.dumps(request_body),
                         headers=config.headers)
This creates the indexer without any exception, but when I check the retrieved data in the Azure portal for users-indexer, the address field is null and is not getting any value from the address.city field mapping provided while creating the indexer.
I have also tried the following mapping, but it's also not working:
"fieldMappings": [
{"sourceFieldName": "/address/city", "targetFieldName": "address"},
]
The Azure documentation also does not say anything about this kind of mapping, so if anyone can help me with this, it would be much appreciated.
The container element in the data source definition allows you to specify a query that you can use to flatten your JSON document (ref: https://learn.microsoft.com/en-us/rest/api/searchservice/create-data-source), so instead of doing field mapping in the indexer definition, you can write a query and get the output in the desired format.
Your code for creating the data source in that case would be:
def create_datasource():
    request_body = {
        "name": 'users-datasource',
        "description": "",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "<db connection url>",
        },
        "container": {
            "name": "users",
            "query": "SELECT a.name, a.email, a.address.city as address FROM a",
        },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }
    resp = requests.post(url="<create-datasource-api-url>", data=json.dumps(request_body),
                         headers=headers)
Support for the MongoDB API flavor is in public preview; you need to explicitly indicate MongoDB in the data source's connection string, as described in this article. Also note that with MongoDB data sources, the custom queries suggested by the previous response are not supported, as far as I know. Hopefully someone from the team will clarify the current state of this support.
The field mapping below is working correctly for me; the Azure Search query returns values for address properly.
"fieldMappings": [{"sourceFieldName": "address.city", "targetFieldName": "address"}]
I did make a few changes to the data you provided: while creating the indexer, I removed the extra comma at the end of fieldMappings, and while creating the index, I kept the email field as Edm.String rather than Edm.DateTimeOffset.
Please make sure you are using the preview API version, since the MongoDB API is in preview mode with Azure Search.
For example: https://{azure search name}.search.windows.net/indexers?api-version=2019-05-06-Preview

Filter doc in DynamoDb by nested object list item using node.js

I have a document that has what DynamoDB calls a list.
"sites": [
{
"active": true,
"address": "212 Grand Ave",
"city": "Billings",
"device_id": "161674",
I would like to filter by the device_id. MongoDB allows this with: var query = {"sites.device_id": device_id};
I currently have this:
var params = {
    TableName: "customer",
    "FilterExpression": "#k_sites[0].#k_device_id = :v_device_id",
    "ExpressionAttributeNames": {
        "#k_sites": "sites",
        "#k_device_id": "device_id"
    },
    "ExpressionAttributeValues": {
        ":v_device_id": "161674"
    }
};
However, I don't want to be limited to the first item in the list. I'm not sure if this is the best way; if not, would an index be the way to search for this item? How would I set up that index?

Range Index pre-existing collection programmatically

I've created a database with a collection. The collection has thousands of pre-existing documents, which look something like the example below.
{
    "Town": "Hull",
    "Easting": 364208,
    "Northing": 176288,
    "Longitude": -2.5168477762,
    "Latitude": 51.4844052488
}
I'm aware that I need to index the database with a range type so I can use range queries and the OrderBy function with my data.
So, how can I range index the pre-existing data programmatically using the .NET SDK?
I've come up with the code below. However, it seems to fail at querying the collection; when I insert a breakpoint, 'database' contains null at the point of querying for the collection.
// Create an instance of the DocumentClient.
using (dbClient = new DocumentClient(new Uri(Properties.Settings.Default.EndpointUrl), Properties.Settings.Default.AuthorizationKey))
{
    Database database = dbClient.CreateDatabaseQuery()
        .Where(db => db.Id == Properties.Settings.Default.databaseID)
        .AsEnumerable().FirstOrDefault();

    DocumentCollection collection = dbClient.CreateDocumentCollectionQuery(database.SelfLink)
        .Where(c => c.Id == Properties.Settings.Default.collectionID)
        .ToArray().FirstOrDefault();

    // If the collection is not null then continue to range index it
    if (collection != null)
    {
        collection.IndexingPolicy.IncludedPaths.Add(
            new IncludedPath
            {
                Path = "/*",
                Indexes = new System.Collections.ObjectModel.Collection<Index>
                {
                    new RangeIndex(DataType.String) { Precision = 6 },
                    new RangeIndex(DataType.Number) { Precision = 6 }
                }
            }
        );
    }
    else
    {
        Console.WriteLine(">> Unable to retrieve requested collection.");
    }
}
Today, indexing policies are immutable, so you will need to re-create the collection to change the index policy (e.g. add a range index).
If you wanted to create a collection with a custom index policy programmatically, the code would look something like this:
var rangeDefault = new DocumentCollection { Id = "rangeCollection" };

rangeDefault.IndexingPolicy.IncludedPaths.Add(
    new IncludedPath
    {
        Path = "/*",
        Indexes = new Collection<Index>
        {
            new RangeIndex(DataType.String) { Precision = -1 },
            new RangeIndex(DataType.Number) { Precision = -1 }
        }
    });

await client.CreateDocumentCollectionAsync(database.SelfLink, rangeDefault);
Then write some code that reads the data from the existing collection and writes it over to your new collection.
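A minimal sketch of that copy step with the DocumentDB .NET SDK might look like this (the database and collection names are placeholders, client is an existing DocumentClient, and no error handling or retry logic is shown):

// Read every document from the old collection and insert it into the new one.
// "mydb"/"oldCollection" and "rangeCollection" are placeholder names.
var oldCollectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "oldCollection");
var newCollectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "rangeCollection");

var documents = client.CreateDocumentQuery<Document>(
    oldCollectionUri,
    new FeedOptions { MaxItemCount = -1 }).AsEnumerable();

foreach (var document in documents)
{
    await client.CreateDocumentAsync(newCollectionUri, document);
}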
But this is a bit cumbersome...
As an alternative solution... I would highly suggest using the DocumentDB Data Migration Tool to create a new collection with your new index policy and move data from your old collection to the new collection. You can delete the old collection once the migration completes successfully.
You can download the data migration tool here.
Step 1: Define DocumentDB as the source.
Step 2: Define DocumentDB as the target, and use a new indexing policy.
Hint: you can right-click in the indexing policy input box to choose an indexing policy, which will give you an indexing policy that looks something like this:
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Range",
                    "dataType": "String",
                    "precision": -1
                }
            ]
        },
        {
            "path": "/_ts/?",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                }
            ]
        }
    ],
    "excludedPaths": []
}
Step 3: Run the import job...
Reminder: Delete the old collection after the import finishes successfully.
