Strange query results in Azure Cosmos DB

I have documents like the following in my Azure Cosmos DB:
{
"id": "token",
"User": {
"UserToken": "token",
"Email": "test#email.com"
},
"_ts": 1541493290
}
When I run the following query:
SELECT * FROM root
WHERE ((root["User"]["UserToken"] = "token")
OR CONTAINS(root["User"]["Email"], "token"))
ORDER BY root["_ts"] DESC
Nothing is returned. But when I change it slightly, for example by changing Email to email:
SELECT * FROM root
WHERE ((root["User"]["UserToken"] = "token")
OR CONTAINS(root["User"]["email"], "token"))
ORDER BY root["_ts"] DESC
The result is found. The query also returns a result when I remove the ORDER BY clause, so that it looks like this:
SELECT * FROM root
WHERE ((root["User"]["UserToken"] = "token")
OR CONTAINS(root["User"]["Email"], "token"))
Moreover, when I edit the document (for example, open it, add an empty line, and save), some magic happens in the background and the document is found. For fairly "new" documents (less than 1-3 months old), the search works without my "magic" trick.
The indexing policy is:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Hash",
"dataType": "String",
"precision": 3
}
]
}
],
"excludedPaths": []
}
What did I do wrong?
UPDATE: The answer is not a full explanation, but it helps a lot. The full explanation is on my blog (https://stapp.space/ridiculous-bug-in-azure-cosmos-db/).

CONTAINS(root["User"]["Email"], "token") won't work if your strings are indexed as Hash. They need to be indexed as Range with a precision of -1. A Hash index only works for equality checks.
That's why the lowercase version works: the query cannot find the email property, so it ignores that condition and falls back to the equality check. The original version finds the Email property, sees that it is not indexed as Range, and fails to return anything.
Changing the indexing policy to this will work:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}
],
"excludedPaths": []
}
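The change from the old policy to this one can be sketched as a small transformation on the policy document. This is only an illustration in plain Python: the helper name use_range_for_strings is made up for this example, and actually applying the resulting policy to the container (via the portal or an SDK) is not shown.

```python
import copy

def use_range_for_strings(policy):
    """Return a copy of an indexing policy with Hash string indexes switched to Range.

    Hypothetical helper for illustration; it only edits the policy dict.
    """
    updated = copy.deepcopy(policy)  # leave the caller's dict untouched
    for included in updated.get("includedPaths", []):
        for index in included.get("indexes", []):
            if index.get("dataType") == "String" and index.get("kind") == "Hash":
                index["kind"] = "Range"
                index["precision"] = -1  # same precision as in the policy above
    return updated

# The policy from the question (Hash index on strings).
old_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Hash", "dataType": "String", "precision": 3},
            ],
        }
    ],
    "excludedPaths": [],
}

new_policy = use_range_for_strings(old_policy)
print(new_policy["includedPaths"][0]["indexes"])
```

The number index is left as it was; only the string index changes kind and precision.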
On a side note, the _ts field is not the best way to order by creation time. It is a Unix timestamp in seconds, so documents created in the same second won't be reliably ordered.

Related

Can I index an array in a composite index in Azure Cosmos DB?

I have a problem indexing an array in Azure Cosmos DB.
I am trying to save this indexing policy via the portal:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/DeviceId",
"order": "ascending"
},
{
"path": "/TimeStamp",
"order": "ascending"
},
{
"path": "/Items/[]/Name/?",
"order": "ascending"
},
{
"path": "/Items/[]/DoubleValue/?",
"order": "ascending"
}
]
]
}
I get the error "Failed to update container DeviceEvents:
Message: {"code":"BadRequest","message":"Message: {"Errors":["The indexing path '\/Items\/[]\/Name\/?' could not be accepted, failed near position '8'."
It seems to be the array [] syntax that causes the error.
On a side note, I am not sure that what I am doing makes sense at all, but I have a query that looks like this:
SELECT SUM(de0["DoubleValue"])
FROM root JOIN de0 IN root["Items"]
WHERE root["ApplicationId"] = 57 AND root["DeviceId"] = 126 AND root["TimeStamp"] >= "2021-02-21T17:55:29.7389397Z" AND de0["Name"] = "Use Case"
ApplicationId is the partition key, and a saved item looks like this:
{
"id": "59ab9323-26ca-436f-8d29-e1ddd826f025",
"DeviceId": 3,
"ApplicationId": 3,
"RawData": "640F7A000A00E30142000000",
"TimeStamp": "2021-02-20T18:36:52.833174Z",
"Items": [
{
"Name": "Battery Status",
"StringValue": "Full",
"DoubleValue": null
},
{
"Name": "Use Case",
"StringValue": null,
"DoubleValue": 12
},
{
"Name": "Battery Voltage",
"StringValue": null,
"DoubleValue": 3.962
},
{
"Name": "Rain Gauge Count",
"StringValue": null,
"DoubleValue": 10
}
],
"_rid": "CgdVAO7B0DNkAAAAAAAAAA==",
"_self": "dbs/CgdVAA==/colls/CgdVAO7B0DM=/docs/CgdVAO7B0DNkAAAAAAAAAA==/",
"_etag": "\"61008771-0000-0d00-0000-603156c50000\"",
"_attachments": "attachments/",
"_ts": 1613846213
}
I need to aggregate over some of the items in the array, for example to get the MAX of a temperature reading (I am using Use Case for testing, although it doesn't make sense). I reasoned that if all the data in the query is in a single composite index, the database would be able to do the aggregation without reading the documents themselves. However, I can't seem to add a composite index containing an array at all.
That's right: a composite index can't contain an array path. Every path in it must lead to a scalar value. From the documentation:
Unlike with included or excluded paths, you can't create a path with
the /* wildcard. Every composite path has an implicit /? at the end of
the path that you don't need to specify. Composite paths lead to a
scalar value and this is the only value that is included in the
composite index.
Reference: https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#composite-indexes
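As a rough local pre-check, the rule quoted above can be approximated by scanning the composite paths for array ([]) or wildcard (*) segments before submitting the policy. This is a sketch in plain Python; the function name invalid_composite_paths is hypothetical, and the service remains the authority on what it accepts.

```python
def invalid_composite_paths(policy):
    """List composite index paths that contain an array ([]) or wildcard (*) segment.

    Hypothetical helper: a local approximation of the rule, not the service's validator.
    """
    problems = []
    for composite in policy.get("compositeIndexes", []):
        for entry in composite:
            segments = entry["path"].split("/")
            if "[]" in segments or "*" in segments:
                problems.append(entry["path"])
    return problems

# The composite index from the question.
policy = {
    "compositeIndexes": [
        [
            {"path": "/DeviceId", "order": "ascending"},
            {"path": "/TimeStamp", "order": "ascending"},
            {"path": "/Items/[]/Name/?", "order": "ascending"},
            {"path": "/Items/[]/DoubleValue/?", "order": "ascending"},
        ]
    ]
}

print(invalid_composite_paths(policy))
# ['/Items/[]/Name/?', '/Items/[]/DoubleValue/?']
```

Only the two /Items/[]/... paths are flagged; /DeviceId and /TimeStamp lead to scalar values.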

CosmosDB not using indices under certain circumstances

I noticed an odd behaviour of CosmosDB regarding the use of indices.
A few words about my setup:
It is a partitioned CosmosDB with 25 partitions.
There are two fields, named a and f, each holding an array of strings. They have the following indexing policy:
{
"path": "/a/[]/?",
"indexes": [
{
"kind": "Hash",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
},
{
"path": "/f/[]/?",
"indexes": [
{
"kind": "Hash",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
}
It can happen that a string that appears in field a of one document also appears in field f of another document.
The odd behaviour occurs when I execute the following query:
SELECT *
FROM Documents d
WHERE ARRAY_CONTAINS(d.a, 'some-string')
If 'some-string' doesn't occur in any other document's f field, all partitions have an IndexHitRatio of 1 (as seen in the QueryMetrics included in the response). This is the behaviour I expect.
But if 'some-string' does occur in another document's f field, the partitions containing such a document report an IndexHitRatio of 0, which has a great impact on the RUs consumed.
Is there a mistake in my setup that leads to this behaviour?
Can anyone else reproduce this behaviour, which would suggest it is a bug?
To get rid of this behaviour, I used a different precision value for each field: field a now has precision -1 and field f has precision 7.
My conclusion is that both fields were written to the same index when they used the same precision. But that would be rather unexpected behaviour for a database?!

Azure CosmosDb: Order-by item requires a range index

I'm performing a simple query via the Azure Portal "Query Explorer".
Here is my query:
SELECT * FROM c
WHERE c.DataType = 'Fruit'
AND c.ExperimentIdentifier = 'prod'
AND c.Param = 'banana'
AND Contains(c.SampleDateTime, '20171029')
ORDER BY c.SampleDateTime DESC
However, I get the exception:
Order-by item requires a range index to be defined on the corresponding index path.
There is no help link for the error, and I cannot make heads or tails of the error message.
What does it mean, why is my query failing and how can I fix it?
P.S. the _ts property is no good to me as I do not want to order by the time the records were inserted.
ORDER BY is served directly from the index, so it requires the order-by item to be Range indexed (as opposed to Hash indexed).
While you could Range-index only the order-by item (for both numbers and strings), my advice is to index all paths as Range with a precision of -1.
Basically, you'd need to update the indexing policy of your collection to something like this:
{
"automatic": true,
"indexingMode": "consistent",
"includedPaths": [
{
"path": "/*",
"indexes": [
{ "kind": "Range", "dataType": "Number", "precision": -1 },
{ "kind": "Range", "dataType": "String", "precision": -1 }
]
}
]
}
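To see whether an existing policy would trip over this, one can check locally which included paths lack a Range index for strings. The sketch below is plain Python over a policy dict; paths_without_string_range is a made-up helper name, and hash_policy is a hypothetical example of a Hash-for-strings policy, since the question does not show the collection's actual policy.

```python
def paths_without_string_range(policy):
    """Return included paths that have no Range index for the String data type.

    Hypothetical helper for illustration; it does not evaluate path precedence.
    """
    missing = []
    for included in policy.get("includedPaths", []):
        has_range = any(
            index.get("kind") == "Range" and index.get("dataType") == "String"
            for index in included.get("indexes", [])
        )
        if not has_range:
            missing.append(included["path"])
    return missing

# Hypothetical policy with strings indexed as Hash (assumed, not taken from the question).
hash_policy = {
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Hash", "dataType": "String", "precision": 3},
            ],
        }
    ]
}

# The policy recommended above.
range_policy = {
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Range", "dataType": "String", "precision": -1},
            ],
        }
    ]
}

print(paths_without_string_range(hash_policy))   # ['/*']
print(paths_without_string_range(range_policy))  # []
```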

Date Between Query in Cosmos DB

I am building a simple event store in Cosmos DB with documents that are structured something like this:
{
"id": "e4c2bbd0-2885-4fb5-bcca-90436f79f155",
"entityType": "contact",
"history": [
{
"startDate": 1504656000,
"endDate": 1504656000,
"Name": "John"
},
{
"startDate": 1504828800,
"endDate": 1504828800,
"Name": "Jon"
}
]
}
This might not be the most efficient way to store it, but this is what I am starting with. I want to be able to query all contact documents out of the db for a certain period of time. The startDate and endDate represent the time during which the record was valid. The history currently contains the entire history of the record, which could probably be improved.
I have tried creating a query like this:
SELECT c.entityType, c.id,history.Name, history.startDate FROM c
JOIN history in c.history
where
c.entityType = "contact" AND
(history.StartDate <= 1504656001
AND history.EndDate >= 1504656001)
This query should return the state of the contact for 9/7/2017, but instead it is returning every entry in the history. I have played with several options, but I am not sure what I am missing.
I have also tried setting the index (maybe that is the issue?), so I have included the indexing policy here:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
}
],
"excludedPaths": []
}
What am I missing? Is the index correct? Is my query correct for a date between query?
You have two issues. One is addressed by Matias in a comment.
The second is your condition, history.StartDate <= 1504656001 AND history.EndDate >= 1504656001.
Try adjusting the range, e.g. history.StartDate >= 1504656001 AND history.EndDate <= 1504656111.
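To get a feel for what the two range conditions select, they can be tried against the sample document in plain Python. This is only an emulation of the predicates, not of how Cosmos DB evaluates the query; it uses the property names exactly as they are stored in the sample document (startDate, endDate), and the helper names valid_at and within_window are made up for this sketch.

```python
# Sample document from the question.
contact = {
    "id": "e4c2bbd0-2885-4fb5-bcca-90436f79f155",
    "entityType": "contact",
    "history": [
        {"startDate": 1504656000, "endDate": 1504656000, "Name": "John"},
        {"startDate": 1504828800, "endDate": 1504828800, "Name": "Jon"},
    ],
}

def valid_at(document, timestamp):
    """History entries whose [startDate, endDate] interval contains the timestamp."""
    return [
        entry
        for entry in document["history"]
        if entry["startDate"] <= timestamp <= entry["endDate"]
    ]

def within_window(document, low, high):
    """History entries that start at or after low and end at or before high."""
    return [
        entry
        for entry in document["history"]
        if entry["startDate"] >= low and entry["endDate"] <= high
    ]

# Each sample entry has startDate == endDate, so a point one second later matches nothing.
print(valid_at(contact, 1504656001))                     # []
print(valid_at(contact, 1504656000))                     # the "John" entry
print(within_window(contact, 1504656000, 1504656111))    # the "John" entry
```

With this sample data, a point-in-time condition only matches when the timestamp equals an entry's startDate, which is one reason to experiment with a window instead.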

Azure Cosmos DB Graph: How to run queries in Graph API when Indexing Policy is defined as Manual?

In a Cosmos DB graph, when I define the indexing policy as Automatic, I am able to run queries. But when I change the indexing policy to Manual, define an indexing path (/label/?), and set the indexing mode to 'Consistent', the query does not fetch any data.
Let's say my first query (with the indexing policy set to Manual) is:
g.addV('Azure').property('name','Cerulean Software')
Result is :
[
{
"id": "0c14a00a-edf6-46b1-9e40-45cc37f750ea",
"label": "Azure",
"type": "vertex",
"properties": {
"name": [
{
"id": "f89ee2ee-74df-4256-a5d4-2b47eb526976",
"value": "Cerulean Software"
}
]
}
}
]
Now, my second query (with the indexing policy set to Manual; see Edit #1 below) is:
g.V().hasLabel('Azure')
This second query does not return any result, even though a vertex labelled 'Azure' is present in the graph.
What could be the reason for this?
Edit #1: Manual Indexing Policy Before Change
"indexingPolicy": {
"automatic": false,
"excludedPaths": [],
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"dataType": "Number",
"kind": "Range",
"precision": -1
},
{
"dataType": "String",
"kind": "Hash",
"precision": 3
}
]
},
{
"path": "/label/?",
"indexes": [
{
"dataType": "String",
"kind": "Hash",
"precision": 3
},
{
"dataType": "Number",
"kind": "Range",
"precision": -1
}
]
}
],
"indexingMode": "consistent"
},
Edit #2: Manual Indexing Policy After Change
"indexingPolicy": {
"automatic": false,
"excludedPaths": [],
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"dataType": "Number",
"kind": "Range",
"precision": -1
},
{
"dataType": "String",
"kind": "Hash",
"precision": 3
}
]
},
{
"path": "/_isEdge/?",
"indexes": [
{
"dataType": "String",
"kind": "Hash",
"precision": 3
},
{
"dataType": "Number",
"kind": "Range",
"precision": -1
}
]
}
],
"indexingMode": "consistent"
},
With Cosmos, graph statements are not executed as traversals on the Azure side. The graph client actually translates Gremlin statements into Document SQL calls and then aggregates the results back to you on the client side. In the case of your statement g.V().hasLabel('Azure'), the call is actually translated to {"query":"SELECT N_2 FROM Node N_2 WHERE (IS_DEFINED(N_2._isEdge) = false AND (N_2.label = 'Azure'))"}
This can be verified with a proxy such as Fiddler, which lets you inspect the outbound calls from your machine.
The top-level _isEdge property seems to be used across almost all Gremlin-translated queries, so I suspect that if you add that property to your indexing policy you should start to see the expected results.
EDIT:
I originally missed the part of your indexing policy that sets automatic: false. According to the Cosmos docs (under the heading Opting in and opting out of indexing): "By default, all documents are automatically indexed, but you can choose to turn it off. When indexing is turned off, documents can be accessed only through their self-links or by queries using ID."
If you choose to run with indexing turned off, then the rest of your indexing policy is effectively meaningless, and queries that don't look up a document directly by its ID will no longer work. Can you elaborate on what you're actually trying to accomplish here? There seems to be a bit of confusion. The indexing settings you've placed on label and _isEdge aren't even necessary, because they are the same as the value you've put for *, which is the default rule matching all paths.
Post what your goals are for your indexing strategy and I can try to make an appropriate recommendation, but you're definitely going to want to put automatic: true back into your policy.
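The two points above (turn automatic back on, and the extra path rules repeat the /* rule) can be sketched as a clean-up of the policy document. This is plain Python over a dict, with a made-up helper name simplify_policy; it only compares index settings and does not apply anything to the graph container.

```python
import copy

def simplify_policy(policy):
    """Re-enable automatic indexing and drop included paths that repeat the /* rule.

    Hypothetical helper for illustration only.
    """
    result = copy.deepcopy(policy)
    result["automatic"] = True

    def key(indexes):
        # Compare index lists without regard to their order.
        return sorted((i["kind"], i["dataType"], i["precision"]) for i in indexes)

    default = next((p for p in result["includedPaths"] if p["path"] == "/*"), None)
    if default is not None:
        result["includedPaths"] = [
            p
            for p in result["includedPaths"]
            if p is default or key(p["indexes"]) != key(default["indexes"])
        ]
    return result

# The policy from Edit #2 in the question.
manual_policy = {
    "automatic": False,
    "excludedPaths": [],
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"dataType": "Number", "kind": "Range", "precision": -1},
                {"dataType": "String", "kind": "Hash", "precision": 3},
            ],
        },
        {
            "path": "/_isEdge/?",
            "indexes": [
                {"dataType": "String", "kind": "Hash", "precision": 3},
                {"dataType": "Number", "kind": "Range", "precision": -1},
            ],
        },
    ],
    "indexingMode": "consistent",
}

cleaned = simplify_policy(manual_policy)
print(cleaned["automatic"], [p["path"] for p in cleaned["includedPaths"]])
# True ['/*']
```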

Resources