Azure Change Feed and Querying based on Partition

When we fetch data from DocumentDB through the change feed, we only want the changes for a single partition, so we tried adding PartitionKey to the code:
do
{
    FeedResponse<PartitionKeyRange> pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(
        collectionUri,
        new FeedOptions
        {
            RequestContinuation = pkRangesResponseContinuation,
            PartitionKey = new PartitionKey("KEY"),
        });
    partitionKeyRanges.AddRange(pkRangesResponse);
    pkRangesResponseContinuation = pkRangesResponse.ResponseContinuation;
}
while (pkRangesResponseContinuation != null);
It returns a single range, and when we then run the second query:
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions
    {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = -1,
    });
It returns results from all partitions. Is there a way to restrict the results to a single partition only?

The change feed works at the partition key range level.
What are partition key ranges?
DocumentDB currently has 10 GB physical partitions.
The partition key that you specify is the logical partition key.
DocumentDB internally maps this logical partition key to a physical partition using a hash.
So it's possible that a number of logical partitions share the same physical partition.
Each physical partition is therefore assigned a range of these hashes.
The smallest unit you can read from the change feed is a partition key range.
So you would have to look up the partition key range ID for the partition key you are interested in, then query the change feed for that range ID and filter out the documents that do not belong to your partition key.
Note: DocumentDB transparently creates new physical partitions when a partition fills up, so the partition key range ID for a given logical partition can change over time.
This link explains this in good detail:
https://learn.microsoft.com/en-us/azure/cosmos-db/partition-data#partitioning-in-azure-cosmos-db
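The "read the range, then filter by key" approach described in this answer can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the hash function, the fixed count of four ranges, and the names hashKey, rangeIdFor, groupByRange and readChangesForKey are all hypothetical, and none of them reflect how DocumentDB actually hashes keys or lays out ranges.

```javascript
// Toy model only. The real hash function and range layout are internal to the
// service; hashKey, RANGE_COUNT and rangeIdFor are hypothetical stand-ins.
const RANGE_COUNT = 4;

function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h;
}

function rangeIdFor(key) {
  return hashKey(key) % RANGE_COUNT;
}

// Group changed documents by the range their partition key hashes to.
function groupByRange(docs) {
  const byRange = new Map();
  for (const doc of docs) {
    const rangeId = rangeIdFor(doc.partitionKey);
    if (!byRange.has(rangeId)) byRange.set(rangeId, []);
    byRange.get(rangeId).push(doc);
  }
  return byRange;
}

// Read every change in the key's range, then keep only that key's documents.
function readChangesForKey(changesByRange, key) {
  const rangeChanges = changesByRange.get(rangeIdFor(key)) || [];
  return rangeChanges.filter((doc) => doc.partitionKey === key);
}

const changes = ["KEY", "A", "B", "C", "D", "KEY"].map((pk, i) => ({
  id: String(i),
  partitionKey: pk,
}));
const changesByRange = groupByRange(changes);
const mine = readChangesForKey(changesByRange, "KEY");
console.log(mine.map((d) => d.id)); // [ '0', '5' ]
```

With five distinct keys spread over four ranges, at least one range necessarily holds more than one logical partition, which is why the final filter step is needed.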

Related

Why do Azure Cosmos queries have higher RUs when specifying the partition key?

I have a question similar to this one. I have been testing different ways to use the partition key, and have noticed that the more a partition key is referenced in a query, the higher the RU charge. It is quite consistent, and it doesn't even matter how the partition key is used. So I narrowed it down to basic queries for a test.
To start, this database has about 850K documents, each more than 1 KB in size. The partition key is the numeric id modulo 100, its path is set to /partitionKey, and the container uses the default indexing policy:
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ]
}
Here is my basic query test:
SELECT c.id, c.partitionKey
FROM c
WHERE c.partitionKey = 99 AND c.id = '99999'
-- Yields One Document; Actual Request Charge: 2.95 RUs
SELECT c.id, c.partitionKey
FROM c
WHERE c.id = '99999'
-- Yields One Document; Actual Request Charge: 2.85 RUs
The Azure Cosmos documentation says that without the partition key, the query will "fan out" to all logical partitions. I would therefore expect the first query to target a single partition and the second to target all of them, meaning the first should have the lower RU charge. I am using the RU results as evidence of whether or not Cosmos is fanning out and scanning each partition, and comparing that to what the documentation says should happen.
I know these results are just 0.1 RUs in difference. But my point is the more complex the query, the bigger the difference. For example, here is another query ever so slightly more complex:
SELECT c.id, c.partitionKey
FROM c
WHERE (c.partitionKey = 98 OR c.partitionKey = 99) AND c.id = '99999'
-- Yields One Document; Actual Request Charge: 3.05 RUs
Notice that the RU charge continues to grow and pull away from the query that does not specify a partition key at all. I would instead expect the query above to target only two partitions, compared with the query without a partition key check, which supposedly fans out to all partitions.
I am starting to suspect the partition key check is happening after the other filters (or inside each partition scan). For example, going back to the first query but changing the id to something which does not exist:
SELECT c.id, c.partitionKey
FROM c
WHERE c.partitionKey = 99 AND c.id = '99999x'
-- Yields Zero Documents; Actual Request Charge: 2.79 RUs
SELECT c.id, c.partitionKey
FROM c
WHERE c.id = '99999x'
-- Yields Zero Documents; Actual Request Charge: 2.79 RUs
Notice the RUs are exactly the same, and both (including the one with the partition filter) cost fewer RUs than when a document exists. This looks like a symptom of the partition filter being executed on the results rather than restricting the fan-out. But this is not what the documentation says.
Why does Cosmos have higher RUs when a partition key is specified?
As the comment says, if you are testing via the portal (or via code, but with the query you provided), it will be more expensive, because you are not querying a specific partition but rather querying everything and then applying another filter, which costs more.
What you should do instead is pass the partition key the proper way in code. My results were quite impressive: 3 RUs with the PK and 20,000 RUs without it, so I'm quite confident it works (I had a really large dataset).
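The distinction this answer draws, between a key that is merely one more filter and a key that routes the request, can be illustrated with a toy in-memory model. This is a sketch of the answer's reasoning only, not SDK code: the partitions map, the two query functions, and the partitionsScanned counter are hypothetical, and the counter is a stand-in for cost rather than how RUs are actually computed.

```javascript
// Toy model: 100 in-memory "partitions" keyed by id % 100 (the question's scheme).
// partitionsScanned is an illustrative cost proxy, NOT an RU calculation.
const partitions = new Map();
for (let id = 0; id < 1000; id++) {
  const pk = id % 100;
  if (!partitions.has(pk)) partitions.set(pk, []);
  partitions.get(pk).push({ id: String(id), partitionKey: pk });
}

// Filter only: every partition is visited and the key is just another predicate.
function queryWithFilterOnly(predicate) {
  let partitionsScanned = 0;
  const results = [];
  for (const docs of partitions.values()) {
    partitionsScanned++;
    results.push(...docs.filter(predicate));
  }
  return { results, partitionsScanned };
}

// Routed: the key is supplied separately, so only one partition is visited.
function queryWithPartitionKey(pk, predicate) {
  const docs = partitions.get(pk) || [];
  return { results: docs.filter(predicate), partitionsScanned: 1 };
}

const filtered = queryWithFilterOnly((d) => d.partitionKey === 99 && d.id === "999");
const routed = queryWithPartitionKey(99, (d) => d.id === "999");
console.log(filtered.partitionsScanned, routed.partitionsScanned); // 100 1
```

Both calls return the same single document; only the amount of data visited differs.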

Batch conditional delete from dynamodb without sort key

I am moving my database from MongoDB to DynamoDB. I have a problem with deleting from a table where labName is the partition key and serialNumber is the sort key; each record also has an ID stored as feedId. I want to delete all the records where labName matches a given value and feedId is NOT IN an array of IDs.
In MongoDB I do it with the code below.
Is there a way to do this with BatchWriteItem, adding a condition on feedId without supplying the sort key?
let dbHandle = await getMongoDbHandle(dbName);
let query = {
feedid: {$nin: feedObjectIds}
}
let output = await dbModule.removePromisify(dbHandle,
dbModule.collectionNames.feeds, query);
In DynamoDB, you can perform a conditional retrieval (GET) or deletion (DELETE) on a record only if you provide all of the attributes of its primary key. For example:
For a simple primary key, you only need to provide a value for the partition key.
For a composite primary key, you must provide values for both the partition key and the sort key.
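Given that constraint, one possible workaround is to first read the items for the given labName (for example with a Query on the partition key), apply the "NOT IN" check on the client, and then send deletes that carry the full primary key. The sketch below only builds the request payloads and makes no AWS SDK calls; buildDeleteBatches and the sample data are hypothetical, and the items are assumed to have been fetched already. It relies on two BatchWriteItem properties: a single call accepts at most 25 requests, and individual requests cannot carry condition expressions.

```javascript
// Hypothetical helper: turns already-fetched items into BatchWriteItem-style
// payloads (plain JS values, document-client style). No SDK calls are made.
const MAX_BATCH_SIZE = 25; // BatchWriteItem accepts at most 25 requests per call

function buildDeleteBatches(tableName, items, keepFeedIds) {
  const keep = new Set(keepFeedIds);
  // Client-side equivalent of "feedId NOT IN (...)".
  const deleteRequests = items
    .filter((item) => !keep.has(item.feedId))
    .map((item) => ({
      DeleteRequest: {
        // Full primary key: partition key plus sort key.
        Key: { labName: item.labName, serialNumber: item.serialNumber },
      },
    }));

  // Split the requests into chunks that fit in one call each.
  const batches = [];
  for (let i = 0; i < deleteRequests.length; i += MAX_BATCH_SIZE) {
    batches.push({
      RequestItems: { [tableName]: deleteRequests.slice(i, i + MAX_BATCH_SIZE) },
    });
  }
  return batches;
}

// Example input: 60 items for one lab; feed IDs feed-0 .. feed-9 must be kept.
const items = Array.from({ length: 60 }, (_, i) => ({
  labName: "lab-1",
  serialNumber: `sn-${i}`,
  feedId: `feed-${i}`,
}));
const keepFeedIds = Array.from({ length: 10 }, (_, i) => `feed-${i}`);
const batches = buildDeleteBatches("feeds", items, keepFeedIds);
console.log(batches.length); // 2
```

Each element of batches could then be passed to a batch write call; handling of unprocessed items and retries is left out of this sketch.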

How to get document count within a partition using sql query in Azure CosmosDB

We have an Azure Cosmos DB database where the collection is partitioned on, say, "/deviceid". We want to get the count of all documents within a specific partition. We ran this query:
FeedOptions options = new FeedOptions()
{
PartitionKey = new PartitionKey("f0e14e52ed2c499e893ac934ae934835"),
};
IDocumentQuery<dynamic> query = client.CreateDocumentQuery(collectionUri, "Select Value Count(1) From c", options).AsDocumentQuery();
FeedResponse<dynamic> data = await query.ExecuteNextAsync();
This query works, but at a high RU cost: for 153,068 documents we incurred ~10K RUs. The indexing mode is "consistent" and automatic is set to true.
Looking for suggestions on how we can get the document count without incurring so many RUs.

cosmos db document query takes long time

I am new to Cosmos DB and am having trouble querying a collection. I have a partitioned collection with 100,000 RU/s (unlimited storage capacity). The partition key is '/Bid', which is a GUID. I am querying the collection for one partition key value that has 10,000 records (the collection has more than 28,942,445 documents across different partitions). I am using the following query to get the documents, but it takes around 50 seconds to execute, which is not feasible.
object partitionkey = new object();
partitionkey = "2359c59a-f730-40df-865c-d4e161189f5b";
// Now execute the same query via direct SQL
var DistinctBColumn = this.client.CreateDocumentQuery<BColumn>(BordereauxColumnCollection.SelfLink, "SELECT * FROM BColumn_UL c WHERE c.BId = '2359c59a-f730-40df-865c-d4e161189f5b'",new FeedOptions { EnableCrossPartitionQuery=true, PartitionKey= new PartitionKey("2359c59a-f730-40df-865c-d4e161189f5b") }, partitionkey).ToList();
I also tried other querying options, which also took around 50 seconds.
But the same query takes less than a second in the Azure portal.
Kindly help me optimize the query, and correct me if I am wrong. Many thanks.

DocumentDB Update Partition Key in Trigger

I have a partition key set up in MM/YYYY format, based on the current timestamp for records. I also have a PreTrigger to update this value when a record is saved:
function validate() {
    var context = getContext();
    var request = context.getRequest();
    var document = request.getBody();
    var now = new Date();
    document.PartitionKey = ("0" + (now.getMonth() + 1)).slice(-2) + "/" + now.getFullYear();
    request.setBody(document);
}
However, I receive the following error:
One or more errors occurred.Message: {"Errors":["PartitionKey extracted from document doesn't match the one specified in the header"]}
Are we not able to modify the partition key in a trigger?
No, you cannot change the partition key from inside a trigger.
This is because stored procedures and triggers are executed transactionally within the scope of a single partition key. Since DocumentDB is a distributed database, the partition key is required to route the request to the right server/partition.
The best way to do this is from a data access layer that populates the partition key during insertion. On a side note, using a timestamp as a partition key is discouraged because it can lead to hot spots (typically, the most frequently accessed data is from the current timestamp or the last few hours).
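A minimal sketch of populating the key in the data access layer, assuming the MM/YYYY format and the PartitionKey property name from the question. The helper names formatMonthYear and withPartitionKey are hypothetical, and the actual insert call is left out.

```javascript
// Hypothetical data-access helper: sets the partition key before the insert
// request is sent, instead of inside a pre-trigger.
function formatMonthYear(date) {
  const month = ("0" + (date.getMonth() + 1)).slice(-2); // zero-padded month
  return month + "/" + date.getFullYear();
}

function withPartitionKey(doc, now = new Date()) {
  // Return a copy so the caller's object is not mutated.
  return { ...doc, PartitionKey: formatMonthYear(now) };
}

const record = withPartitionKey({ id: "1", value: 42 }, new Date(2017, 2, 15));
console.log(record.PartitionKey); // "03/2017"
```

Because the document already carries its final key when the request is built, the same value can be supplied as the request's partition key, so the key in the header and the key in the body agree.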

Resources