Range Index pre-existing collection programmatically

I've created a database with a collection. The collection has thousands of pre-existing documents that look something like the example below.
{
  "Town": "Hull",
  "Easting": 364208,
  "Northing": 176288,
  "Longitude": -2.5168477762,
  "Latitude": 51.4844052488
}
I'm aware that I need to index the database with a range type so I can use the range query & the OrderBy function with my data.
So, how can I range index the pre-existing data programmatically using the .NET SDK?
I've come up with the code below. However, it seems to fail when querying the collection. When I set a breakpoint, 'database' is null at the point where the collection is queried.
// Create an instance of the DocumentClient.
using (dbClient = new DocumentClient(new Uri(Properties.Settings.Default.EndpointUrl), Properties.Settings.Default.AuthorizationKey))
{
    Database database = dbClient.CreateDatabaseQuery().Where
        (db => db.Id == Properties.Settings.Default.databaseID).AsEnumerable().FirstOrDefault();
    DocumentCollection collection = dbClient.CreateDocumentCollectionQuery(database.SelfLink).Where
        (c => c.Id == Properties.Settings.Default.collectionID).ToArray().FirstOrDefault();
    // If database type is not null then continue to range index the collection
    if (collection != null)
    {
        stopsCollection.IndexingPolicy.IncludedPaths.Add(
            new IncludedPath
            {
                Path = "/*",
                Indexes = new System.Collections.ObjectModel.Collection<Index>
                {
                    new RangeIndex(DataType.String) {Precision = 6},
                    new RangeIndex(DataType.Number) {Precision = 6}
                }
            }
        );
    }
    else
    {
        Console.WriteLine(">> Unable to retrieve requested collection.");
    }
}

Today, indexing policies are immutable, so you will need to re-create a collection to change the index policy (e.g. to add a range index).
If you want to create a collection with a custom index policy programmatically, the code to do this would look something like this:
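// Requires: using System.Collections.ObjectModel; using Microsoft.Azure.Documents;
var rangeDefault = new DocumentCollection { Id = "rangeCollection" };

// Range-index every path for both strings and numbers.
// Precision = -1 requests the maximum precision.
rangeDefault.IndexingPolicy.IncludedPaths.Add(
    new IncludedPath {
        Path = "/*",
        Indexes = new Collection<Index> {
            new RangeIndex(DataType.String) { Precision = -1 },
            new RangeIndex(DataType.Number) { Precision = -1 }
        }
    });

await client.CreateDocumentCollectionAsync(database.SelfLink, rangeDefault);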
Then write some code that reads the data from the existing collection and writes it over to your new collection.
But this is a bit cumbersome...
As an alternative solution... I would highly suggest using the DocumentDB Data Migration Tool to create a new collection with your new index policy and move data from your old collection to the new collection. You can delete the old collection once the migration completes successfully.
You can download the data migration tool here.
Step 1: Define DocumentDB as source:
Step 2: Define DocumentDB as the target, and use a new indexing policy:
Hint: you can right click in the indexing policy input box to choose an indexing policy
which will give you an indexing policy that looks something like this:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    },
    {
      "path": "/_ts/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": []
}
Step 3: Run the import job...
Reminder: Delete the old collection after the import finishes successfully.

Related

How can I obtain a document from a Cosmos DB using a field in an array as a filter?

I have a Cosmos DB with documents that look like the following:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
}
I would like to write a SQL query that returns the entire document, using "identifierLabel" as the filter when searching for it.
I attempted to write a query based on an example I found from the following blog:
SELECT c,t AS identifiers
FROM c
JOIN t in c.identifiers
WHERE t.identifierLabel = "someLabel2"
However, when the result is returned, it appends the following to the end of the document:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
How can I avoid this and get the result that I desire, i.e. the entire document with nothing appended to it?
Thanks in advance.

Using ARRAY_CONTAINS(), you should be able to do something like this to retrieve the entire document, without any need for a self-join:
SELECT *
FROM c
where ARRAY_CONTAINS(c.identifiers, {"identifierLabel":"someLabel2"}, true)
Note that ARRAY_CONTAINS() can search for either scalar values or objects. Passing true as the third parameter enables partial matching, so an object in the array matches when it contains the specified properties, even if it has other properties as well. So, in the above query, it's matching any object in the array where identifierLabel is set to "someLabel2" (and then it should be returning the original document, unchanged, avoiding the issue you ran into with the self-join).
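If it helps to see the partial-match idea outside of the query engine, here is a rough JavaScript sketch of it. This is only an emulation for illustration, not how Cosmos DB implements ARRAY_CONTAINS: the helper names (isPartialMatch, arrayContainsPartial) are made up, the comparison is shallow (scalar property values only), and the second sample document is hypothetical.

```javascript
// Rough emulation of ARRAY_CONTAINS(arr, needle, true) for plain objects.
// Assumption: shallow comparison of scalar property values only.
function isPartialMatch(element, needle) {
  if (element === null || typeof element !== "object") return false;
  // Every property named in the needle must be present with the same value.
  return Object.keys(needle).every((key) => element[key] === needle[key]);
}

function arrayContainsPartial(arr, needle) {
  return Array.isArray(arr) && arr.some((el) => isPartialMatch(el, needle));
}

const docs = [
  {
    name: { productName: "someProductName" },
    identifiers: [
      { identifierCode: "1234", identifierLabel: "someLabel1" },
      { identifierCode: "432", identifierLabel: "someLabel2" },
    ],
  },
  {
    // Hypothetical second document that should not match.
    name: { productName: "otherProductName" },
    identifiers: [{ identifierCode: "99", identifierLabel: "someLabel3" }],
  },
];

// Whole documents come back untouched, just like SELECT * with ARRAY_CONTAINS.
const matches = docs.filter((doc) =>
  arrayContainsPartial(doc.identifiers, { identifierLabel: "someLabel2" })
);
console.log(matches.map((doc) => doc.name.productName));
```

The point is that the filter only decides whether a document qualifies; nothing from the array is projected into the result, which is why the extra object from the self-join goes away.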

Graph DB Gremlin query for nested properties

I am storing the below data in azure cosmos graph db.
"properties": {
"A": {
"value": "prop1 new value"
},
"settings": {
"DigitalInput": {
"Input1": {
"nTransIn1": {
"tagName": {
"value": ""
}
}
},
"Input2": {
"nTransIn2": {
"tagName": {
"value": ""
}
}
}
When I query by a single property,
g.V().has('s_objectId',within('9d8cf5c6-7b5f-4d0b-af70-bf516f219d73')).
valueMap("p_A")
I get the expected output. But how do I retrieve the 'settings' property, which has nested properties? When I try
g.V().has('s_objectId',within('9d8cf5c6-7b5f-4d0b-af70-bf516f219d73')).
valueMap("p_settings")
it does not give the correct output, because the settings property is stored in the graph database like this:
"p_settings.DigitalInput.Input1.nTransIn1.tagName": [
{
"id": "6057e448-a2e8-48e4-820f-5396003bdcae",
"value": ""
}
],

Your queries and sample data seem to use different field names. It would be helpful if you could add to the question an addV step that creates the structure you are using in a way that can be tested with TinkerGraph.
In general with Gremlin, the way to access map structures is to select your way into it. Something like
valueMap("p_A").select("p_settings")
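If the properties really are stored under flattened, dot-separated keys as shown in the question, one option is to fetch the whole value map and rebuild the nesting on the client. The following is a minimal JavaScript sketch under that assumption. The function name unflattenProperties is made up, the input is assumed to be a plain object shaped like the stored form in the question, and the ids marked hypothetical are placeholders rather than real data.

```javascript
// Rebuild a nested object from flattened, dot-separated property keys.
// Assumption: each value is an array of { id, value } entries, as in the question.
function unflattenProperties(flat, prefix) {
  const root = {};
  for (const [key, entries] of Object.entries(flat)) {
    // Keep only the requested property and anything nested under it.
    if (key !== prefix && !key.startsWith(prefix + ".")) continue;
    const path = key.split(".");
    let node = root;
    for (let i = 0; i < path.length - 1; i++) {
      const part = path[i];
      if (!(part in node)) node[part] = {};
      node = node[part];
    }
    // Leaf: keep just the values, dropping the per-value ids.
    node[path[path.length - 1]] = entries.map((entry) => entry.value);
  }
  return root;
}

const flat = {
  "p_A": [{ id: "hypothetical-id-1", value: "prop1 new value" }],
  "p_settings.DigitalInput.Input1.nTransIn1.tagName": [
    { id: "6057e448-a2e8-48e4-820f-5396003bdcae", value: "" },
  ],
  "p_settings.DigitalInput.Input2.nTransIn2.tagName": [
    { id: "hypothetical-id-2", value: "" },
  ],
};

const nested = unflattenProperties(flat, "p_settings");
console.log(JSON.stringify(nested, null, 2));
```

This does not handle a key that is both a leaf and a prefix of another key, so treat it as a starting point only.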

CouchDB Mango query - Match any key with array item

I have the following documents:
{
  "_id": "doc1",
  "binds": {
    "subject": {
      "Test1": ["something"]
    },
    "object": {
      "Test2": ["something"]
    }
  }
},
{
  "_id": "doc2",
  "binds": {
    "subject": {
      "Test1": ["something"]
    },
    "object": {
      "Test3": ["something"]
    }
  }
}
I need a Mango selector that retrieves documents where any field inside binds (subject, object, etc.) has an object with a key equal to any value in an array passed as a parameter. That is, if the keys under binds contain any of the array's values, the selector should return that document.
For instance, given the array ["Test2"], my selector should retrieve doc1, since binds["object"]["Test2"] exists; the array ["Test1"] should retrieve doc1 and doc2; and the array ["Test2", "Test3"] should also retrieve doc1 and doc2.
F.Y.I. I am using Node.js with nano lib to access CouchDB API.

I am providing this answer because the luxury of altering document "schema" is not always an option.
With the given document structure this cannot be done with Mango in any reasonable manner. Yes, it can be done, but only when employing very brittle and inefficient practices.
Mango does not provide an efficient means of querying documents for dynamic properties; it does support searching within property values, e.g. arrays [1].
Using worst practices, this selector will find docs whose binds properties subject or object have a property named Test2 or Test3:
{
  "selector": {
    "$or": [
      {
        "binds.subject.Test2": {
          "$exists": true
        }
      },
      {
        "binds.object.Test2": {
          "$exists": true
        }
      },
      {
        "binds.subject.Test3": {
          "$exists": true
        }
      },
      {
        "binds.object.Test3": {
          "$exists": true
        }
      }
    ]
  }
}
Yuk.
The problems
1. The queried property names vary so a Mango index cannot be leveraged (Test37 anyone?)
2. Because of (1) a full index scan (_all_docs) occurs every query
3. Requires programmatic generation of the $or clause
4. Requires a knowledge of the set of property names to query (Test37 anyone?)
The given document structure is a show stopper for a Mango index and query.
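To make the "programmatic generation of the $or clause" point concrete, here is a minimal JavaScript sketch of what that generation might look like. The function name buildBindsSelector is hypothetical, and it assumes you already know both the names under binds and the keys to look for, which is exactly the weakness described above.

```javascript
// Build a Mango selector with one $exists clause per (bind name, key) pair.
// Assumptions: bind names and keys are known up front and contain no dots.
function buildBindsSelector(bindNames, keys) {
  const clauses = [];
  for (const key of keys) {
    for (const bind of bindNames) {
      clauses.push({ [`binds.${bind}.${key}`]: { $exists: true } });
    }
  }
  return { selector: { $or: clauses } };
}

const query = buildBindsSelector(["subject", "object"], ["Test2", "Test3"]);
console.log(JSON.stringify(query, null, 2));
```

The number of clauses grows as bind names times keys, and none of them can use a Mango index, so this only demonstrates the approach rather than recommending it.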
This is where map/reduce shines
Consider a view with the map function
function (doc) {
  for (var prop in doc.binds) {
    if (doc.binds.hasOwnProperty(prop)) {
      // prop = subject, object, foo, bar, etc
      var obj = doc.binds[prop];
      for (var objProp in obj) {
        if (obj.hasOwnProperty(objProp)) {
          // objProp = Test1, Test2, Test37, Fubar, etc
          emit(objProp, prop);
        }
      }
    }
  }
}
So the map function emits a view row for every property nested two levels under binds, e.g. binds.subject.Test1, binds.foo.bar. The key is the inner property name and the value is the name of the binds property that contains it.
Given the two documents in the question, this would be the basic view index
id   | key   | value
doc1 | Test1 | subject
doc2 | Test1 | subject
doc1 | Test2 | object
doc2 | Test3 | object
And since view queries provide the keys parameter, this query would provide your specific solution using JSON
{
include_docs: true,
reduce: false,
keys: ["Test2","Test3"]
}
Querying that index with cURL
$ curl -G http://{view endpoint} -d 'include_docs=false' -d 'reduce=false' -d 'keys=["Test2","Test3"]'
would return
{
"total_rows": 4,
"offset": 2,
"rows": [
{
"id": "doc1",
"key": "Test2",
"value": "object"
},
{
"id": "doc2",
"key": "Test3",
"value": "object"
}
]
}
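To see how the map function and the keys lookup fit together without a running CouchDB, here is a rough local simulation in JavaScript. It is only a sketch: buildView and queryByKeys are made-up helpers, the plain string sort merely approximates CouchDB's view collation for simple ASCII keys, and the emit callback is passed in explicitly instead of being a global as it is inside CouchDB.

```javascript
// Same logic as the view's map function, with emit passed in as a callback.
function map(doc, emit) {
  for (const prop of Object.keys(doc.binds || {})) {
    // prop = subject, object, ...
    for (const objProp of Object.keys(doc.binds[prop])) {
      // objProp = Test1, Test2, ...
      emit(objProp, prop);
    }
  }
}

// Run the map over all docs and sort the rows by key, then by doc id.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  );
  return rows;
}

// Emulate the keys parameter: return the rows for each requested key, in order.
function queryByKeys(rows, keys) {
  return keys.flatMap((key) => rows.filter((row) => row.key === key));
}

const sampleDocs = [
  { _id: "doc1", binds: { subject: { Test1: ["something"] }, object: { Test2: ["something"] } } },
  { _id: "doc2", binds: { subject: { Test1: ["something"] }, object: { Test3: ["something"] } } },
];

const view = buildView(sampleDocs);
const result = queryByKeys(view, ["Test2", "Test3"]);
console.log(result);
```

With the two documents from the question, the simulated view has four rows and the keys lookup returns the doc1 and doc2 rows shown in the response above.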
Of course there are options to expand the form and function of such a view by leveraging collation and complex keys, and there's the handy reduce feature.
I've seen commentary that Mango is great for those new to CouchDB due to its "ease" in creating indexes and the query options, and that map/reduce is for the more seasoned. I believe such comments are well intentioned but misguided; Mango is alluring but has its pitfalls [1]. Views do require considerable thought, but hey, that's what we're supposed to be doing anyway.
[1] $elemMatch, for example, requires in-memory scanning, which can be very costly.

Filter doc in DynamoDb by nested object list item using node.js

I have a document that has what DynamoDB calls a list.
"sites": [
{
"active": true,
"address": "212 Grand Ave",
"city": "Billings",
"device_id": "161674",
I would like to filter by the device_id. MongoDB allows this with: var query = {"sites.device_id": device_id};
I currently have this:
var params = {
  TableName: "customer",
  "FilterExpression": "#k_sites[0].#k_device_id = :v_device_id",
  "ExpressionAttributeNames": {
    "#k_sites": "sites",
    "#k_device_id": "device_id"
  },
  "ExpressionAttributeValues": {
    ":v_device_id": "161674"
  }
};
However, I don't want to be limited to the first item in the list. I'm not sure if this is the best way; if not, would an index be the way to search for this item? How would I set up that index?

Azure search index not updating field

I have two indexes, index1 is the old and currently used index and the new index2 contains additionally a new string array field myArray1.
Azure Search is using documentdb collection as a source and myArray1 is filled out properly there. However when querying the document in the Azure Search Explorer myArray1 is always empty. The search explorer is set to index2. I also tried resetting index2 but without luck.
I am using a CreateDataSource.json to define the query for the documentdb collection. In this query I am selecting the prop myArray1.
Any idea why the index is not picking up the values stored in myArray1?
Here is the data source query:
SELECT c.id AS Id, c.crew AS Crews, c['cast'] AS Casts FROM c WHERE c._ts >= #HighWaterMark
If I run it against documentdb in Azure search it works fine.
Here is the index definition:
Index definition = new Index()
{
    Name = "index-docdb4",
    Fields = new[]
    {
        new Field("Id", DataType.String, AnalyzerName.StandardLucene) { IsKey = true, IsFilterable = true },
        new Field("Crews", DataType.Collection(DataType.String)) { IsFilterable = true },
        new Field("Casts", DataType.Collection(DataType.String)) { IsFilterable = true }
    }
};
Here is the indexer json file
{
"name": "indexer-docdb4",
"dataSourceName": "datasource-docdb",
"targetIndexName": "index-docdb4",
"schedule": {
"interval": "PT5M",
"startTime": "2015-01-01T00:00:00Z"
}
}
Here is a documentdb example file
{
"id": "300627",
"title": "Carmen",
"originalTitle": "Carmen",
"year": 2011,
"genres": [
"Music"
],
"partitionKey": 7,
"_rid": "OsZtAIcaugECAAAAAAAAAA==",
"_self": "dbs/OsZtAA==/colls/OsZtAIcaugE=/docs/OsZtAIcaugECAAAAAAAAAA==/",
"_etag": "\"0400d17e-0000-0000-0000-590a493a0000\"",
"_attachments": "attachments/",
"cast": [
"315986",
"321880",
"603325",
"484671",
"603324",
"734554",
"734555",
"706818",
"711766",
"734556",
"734455"
],
"crew": [
"58185",
"390726",
"302640",
"670953",
"28046",
"122587"
],
"_ts": 1493846327
},
