Cosmos DB - Deleting a document - azure

How can I delete an individual record from Cosmos DB?
I can select using SQL syntax:
SELECT *
FROM collection1
WHERE (collection1._ts > 0)
And sure enough, all documents (analogous to rows?) are returned.
However, this doesn't work when I attempt to delete:
DELETE
FROM collection1
WHERE (collection1._ts > 0)
How do I achieve that?

The DocumentDB API's SQL is specifically for querying. That is, it only provides SELECT, not UPDATE or DELETE.
Those operations are fully supported, but require REST (or SDK) calls. For example, with .NET, you'd call DeleteDocumentAsync() or ReplaceDocumentAsync(), and in Node.js, this would be a call to deleteDocument() or replaceDocument().
In your particular scenario, you could run your SELECT to identify documents for deletion, then make "delete" calls, one per document (or, for efficiency and transactionality, pass an array of documents to delete, into a stored procedure).
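As a rough sketch of the query-then-delete approach in Node.js: the function below assumes a container object shaped like the one in the @azure/cosmos package (items.query(...).fetchAll() and item(id, partitionKeyValue).delete()). The partition key property name passed in and the in-memory stand-in container are hypothetical, included only so the sketch can run without an account.

```javascript
// Sketch: run a SELECT, then issue one delete call per returned document.
// `container` is assumed to look like an @azure/cosmos Container.
async function deleteByQuery(container, query, partitionKeyProperty) {
  const { resources } = await container.items.query(query).fetchAll();
  let deleted = 0;
  for (const doc of resources) {
    // Each delete needs the document id and its partition key value.
    await container.item(doc.id, doc[partitionKeyProperty]).delete();
    deleted++;
  }
  return deleted;
}

// Hypothetical in-memory stand-in for trying the function locally.
// It ignores the query text and returns every stored document.
function makeFakeContainer(docs) {
  const store = new Map(docs.map((d) => [d.id, d]));
  return {
    store,
    items: {
      query: () => ({
        fetchAll: async () => ({ resources: [...store.values()] }),
      }),
    },
    item: (id) => ({
      delete: async () => {
        store.delete(id);
      },
    }),
  };
}
```

One delete call per document is simple but costs one round trip each; the stored procedure route mentioned above trades that for a single-partition scope.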

The easiest way is probably by using Azure Storage Explorer. After connecting you can drill down to a container of choice, select a document and then delete it. You can find additional tools for Cosmos DB on https://gotcosmos.com/tools.

Another option to consider is the time to live (TTL). You can turn this on for a collection and then set an expiration for the documents. The documents will be cleaned up automatically for you as they expire.
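To make the TTL rules concrete, here is a small sketch of how expiry is decided. It assumes the usual semantics: TTL values are in seconds, an item-level ttl property overrides the container default, -1 means "never expire", and an item ttl is ignored while TTL is off on the container. The function name isExpired is made up for illustration; Cosmos DB performs this check itself on the server.

```javascript
// Sketch of the TTL expiry decision (all times in seconds).
// containerDefaultTtl: null/undefined = TTL off on the container,
//                      -1 = TTL on, items do not expire by default,
//                      n > 0 = default lifetime in seconds.
// doc.ttl (optional) overrides the container default when TTL is on.
function isExpired(doc, containerDefaultTtl, nowSeconds) {
  if (containerDefaultTtl === null || containerDefaultTtl === undefined) {
    return false; // TTL is disabled, so item-level ttl is ignored
  }
  const ttl = doc.ttl !== undefined ? doc.ttl : containerDefaultTtl;
  if (ttl === -1) return false; // never expires
  // _ts is the last-modified timestamp in epoch seconds.
  return nowSeconds >= doc._ts + ttl;
}
```

For example, a document with _ts = 100 in a container whose default TTL is 60 would count as expired once the clock passes 160.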

Create a stored procedure with the following code:
/**
* A Cosmos DB stored procedure that bulk deletes documents for a given query.
* Note: You may need to execute this stored procedure multiple times (depending whether the stored procedure is able to delete every document within the execution timeout limit).
*
* @function
* @param {string} query - A query that provides the documents to be deleted (e.g. "SELECT c._self FROM c WHERE c.founded_year = 2008"). Note: For best performance, reduce the # of properties returned per document in the query to only what's required (e.g. prefer SELECT c._self over SELECT * )
* @returns {Object.<number, boolean>} Returns an object with the two properties:
* deleted - contains a count of documents deleted
* continuation - a boolean whether you should execute the stored procedure again (true if there are more documents to delete; false otherwise).
*/
function bulkDeleteStoredProcedure(query) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
deleted: 0,
continuation: true
};
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
tryQueryAndDelete();
// Recursively runs the query w/ support for continuation tokens.
// Calls tryDelete(documents) as soon as the query returns documents.
function tryQueryAndDelete(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
if (err) throw err;
if (retrievedDocs.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(retrievedDocs);
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
tryQueryAndDelete(responseOptions.continuation);
} else {
// Else if there are no more documents and no continuation token - we are finished deleting documents.
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
responseBody.deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
tryQueryAndDelete();
}
}
}
And execute it using your partition key (example: null) and a query to select the documents (example: SELECT c._self FROM c to delete all).
Based on Delete Documents from CosmosDB based on condition through Query Explorer

Here is an example of how to call bulkDeleteStoredProcedure using the .NET Cosmos SDK v3.
The continuation flag has to be checked in a loop because a single execution may hit the execution bounds before every document is deleted.
private async Task<int> ExecuteSpBulkDelete(string query, string partitionKey)
{
var continuationFlag = true;
var totalDeleted = 0;
while (continuationFlag)
{
StoredProcedureExecuteResponse<BulkDeleteResponse> result = await _container.Scripts.ExecuteStoredProcedureAsync<BulkDeleteResponse>(
"spBulkDelete", // your sproc name
new PartitionKey(partitionKey), // pk value
new dynamic[] { query }); // sproc arguments
var response = result.Resource;
continuationFlag = response.Continuation;
var deleted = response.Deleted;
totalDeleted += deleted;
Console.WriteLine($"Deleted {deleted} documents ({totalDeleted} total, more: {continuationFlag}, used {result.RequestCharge}RUs)");
}
return totalDeleted;
}
and response model:
public class BulkDeleteResponse
{
[JsonProperty("deleted")]
public int Deleted { get; set; }
[JsonProperty("continuation")]
public bool Continuation { get; set; }
}

Related

Cosmos DB Pagination giving multiplied page records

I have a scenario where I need to filter a collection based on the elements of an array inside each document. Can anyone suggest how to use OFFSET and LIMIT with a nested array in a document like this?
{
"id": "abcd",
"pqrs": 1,
"xyz": "UNKNOWN_594",
"arrayList": [
{
"Id": 2,
"def": true
},
{
"Id": 302,
"def": true
}
]
}
Now I need to filter and fetch the records from the collection 10 at a time. I tried the following query:
SELECT * FROM collections c
WHERE ARRAY_CONTAINS(c.arrayList , {"Id":302 },true) or ARRAY_CONTAINS(c.arrayList , {"Id":2 },true)
ORDER BY c._ts DESC
OFFSET 10 LIMIT 10
When I run this query, it returns 40 records.
With OFFSET, the RU charge keeps increasing with every subsequent page; you can use a continuation token instead:
private static async Task QueryWithPagingAsync(Uri collectionUri)
{
// The .NET client automatically iterates through all the pages of query results
// Developers can explicitly control paging by creating an IDocumentQueryable
// using the IQueryable object, then by reading the ResponseContinuationToken values
// and passing them back as RequestContinuationToken in FeedOptions.
List<Family> families = new List<Family>();
// tell server we only want 1 record
FeedOptions options = new FeedOptions { MaxItemCount = 1, EnableCrossPartitionQuery = true };
// using AsDocumentQuery you get access to whether or not the query HasMoreResults
// If it does, just call ExecuteNextAsync until there are no more results
// No need to supply a continuation token here as the server keeps track of progress
var query = client.CreateDocumentQuery<Family>(collectionUri, options).AsDocumentQuery();
while (query.HasMoreResults)
{
foreach (Family family in await query.ExecuteNextAsync())
{
families.Add(family);
}
}
// The above sample works fine whilst in a loop as above, but
// what if you load a page of 1 record and then in a different
// Session at a later stage want to continue from where you were?
// well, now you need to capture the continuation token
// and use it on subsequent queries
query = client.CreateDocumentQuery<Family>(
collectionUri,
new FeedOptions { MaxItemCount = 1, EnableCrossPartitionQuery = true }).AsDocumentQuery();
var feedResponse = await query.ExecuteNextAsync<Family>();
string continuation = feedResponse.ResponseContinuation;
foreach (var f in feedResponse.AsEnumerable().OrderBy(f => f.Id))
{
}
// Now the second time around use the continuation token you got
// and start the process from that point
query = client.CreateDocumentQuery<Family>(
collectionUri,
new FeedOptions
{
MaxItemCount = 1,
RequestContinuation = continuation,
EnableCrossPartitionQuery = true
}).AsDocumentQuery();
feedResponse = await query.ExecuteNextAsync<Family>();
foreach (var f in feedResponse.AsEnumerable().OrderBy(f => f.Id))
{
}
}
To step through the results page by page, see the code below:
private static async Task QueryPageByPage(int currentPageNumber = 1, int documentNumber = 1)
{
// Number of documents per page
const int PAGE_SIZE = 3; // configurable
// Continuation token for subsequent queries (NULL for the very first request/page)
string continuationToken = null;
do
{
Console.WriteLine($"----- PAGE {currentPageNumber} -----");
// Loads ALL documents for the current page
KeyValuePair<string, IEnumerable<Family>> currentPage = await QueryDocumentsByPage(currentPageNumber, PAGE_SIZE, continuationToken);
foreach (Family celeryTask in currentPage.Value)
{
documentNumber++;
}
// Ensure the continuation token is kept for the next page query execution
continuationToken = currentPage.Key;
currentPageNumber++;
} while (continuationToken != null);
Console.WriteLine("\n--- END: Finished Querying ALL Documents ---");
}
and QueryDocumentsByPage function as follows
private static async Task<KeyValuePair<string, IEnumerable<Family>>> QueryDocumentsByPage(int pageNumber, int pageSize, string continuationToken)
{
DocumentClient documentClient = new DocumentClient(new Uri("https://{CosmosDB/SQL Account Name}.documents.azure.com:443/"), "{CosmosDB/SQL Account Key}");
var feedOptions = new FeedOptions {
MaxItemCount = pageSize,
EnableCrossPartitionQuery = true,
// IMPORTANT: Set the continuation token (NULL for the first ever request/page)
RequestContinuation = continuationToken
};
IQueryable<Family> filter = documentClient.CreateDocumentQuery<Family>("dbs/{Database Name}/colls/{Collection Name}", feedOptions);
IDocumentQuery<Family> query = filter.AsDocumentQuery();
FeedResponse<Family> feedRespose = await query.ExecuteNextAsync<Family>();
List<Family> documents = new List<Family>();
foreach (Family t in feedRespose)
{
documents.Add(t);
}
// IMPORTANT: Ensure the continuation token is kept for the next requests
return new KeyValuePair<string, IEnumerable<Family>>(feedRespose.ResponseContinuation, documents);
}
Are you actually receiving 40 elements in the results? Or are you getting back 10 documents while the query matches 40 documents in your container?
With an ORDER BY clause, the database retrieves all the documents that match the query, orders them, and only then applies the OFFSET and LIMIT values to deliver the final results.
I've illustrated this with the snapshot below.
My Cosmos account has 14 documents which match the query; this is
what the retrieved document count shows.
The output document count is 10 because the DB had to skip the first 5 and
then deliver the next 5.
But my actual results are only 5 documents, because that is what I
asked for.
Continuation tokens are efficient for paging but have limitations. They cannot be used if you want to skip pages directly (say, jump from page 1 to page 10): you have to traverse the pages from the first one, using each token to get to the next page. Despite that limitation, they are usually the recommended approach when a single query returns a large number of documents.
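The difference can be pictured with a toy in-memory model. This is not the Cosmos DB engine; it only counts how many matching documents have to be touched to serve one page. The function names and the numeric token are invented for the sketch, and the numbers mirror the 14-document example above.

```javascript
// Toy model: documents touched per page with OFFSET/LIMIT versus a
// continuation token. The "token" here is just an index into the array.
function pageWithOffset(sortedDocs, offset, limit) {
  // OFFSET/LIMIT: the first `offset` documents are still read, then discarded.
  const touched = Math.min(sortedDocs.length, offset + limit);
  return { page: sortedDocs.slice(offset, offset + limit), touched };
}

function pageWithToken(sortedDocs, token, limit) {
  // Continuation token: resume exactly where the previous page stopped.
  const start = token === null ? 0 : token;
  const page = sortedDocs.slice(start, start + limit);
  const next = start + limit < sortedDocs.length ? start + limit : null;
  return { page, touched: page.length, token: next };
}

const docs = Array.from({ length: 14 }, (_, i) => ({ id: i }));

// Second page of 5 via OFFSET 5 LIMIT 5.
const viaOffset = pageWithOffset(docs, 5, 5);

// Second page of 5 via tokens: page 1 must be fetched first to get a token.
let token = null;
let viaToken;
for (let i = 0; i < 2; i++) {
  viaToken = pageWithToken(docs, token, 5);
  token = viaToken.token;
}

console.log(viaOffset.touched, viaToken.touched); // prints "10 5"
```

Both calls return the same five documents, but the token path only got there by walking through page 1 first, which is the "no jumping" limitation described above.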
Another recommendation is to use indexing to improve your RU/s usage when using ORDER BY. See this link.

Delete documents from Azure cosmos DB collection with multiple Partition Keys

I have to delete some documents from Azure Cosmos DB through the Azure portal.
I wrote a stored procedure in the container that deletes the data in question.
But when the stored procedure is executed, it asks for a partition key value, and the documents I have to delete have different partition keys.
The stored procedure I used is given below.
function bulkDeleteProcedure(deletedate) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var query = "SELECT * FROM c where c._ts = " + deletedate;
var responseBody = {
docs_deleted: 0,
continuation: true
};
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
fetchData();
function fetchData(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, items, responseOptions) {
if (err) throw err;
// response.setBody(items);
if (items.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(items);
// response.setBody(items.length + " : inside retrvd docs");
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
fetchData(responseOptions.continuation);
} else {
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
responseBody.docs_deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
fetchData();
}
}
}
Suppose my partition key is vehicle type, with values 01 to 10. The SQL query I need will return documents with 10 different partition key values.
Current scenario is like i have to run the stored procedure 10 times by providing each partition key value each time.
Is it possible to run this stored procedure for different partition key values in a single go?
Current scenario is like i have to run the stored procedure 10 times
by providing each partition key value each time. Is it possible to run
this stored procedure for different partition key values in a single
go?
Unfortunately no. You can't provide multiple partition keys to a stored procedure. You will need to execute the stored procedure 10 times.
One thing you could do is use any available SDK and write code to execute stored procedure in a loop. You could create an array of partition keys and loop through that and execute stored procedure for each partition key in that array.
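A minimal sketch of that loop in Node.js follows. It assumes a stored procedure handle that exposes execute(partitionKey, params) and resolves to an object with a resource property, as the object returned by container.scripts.storedProcedure(id) does in @azure/cosmos. The function name is made up, and the docs_deleted and continuation fields are taken from the response body of the procedure above.

```javascript
// Sketch: execute one stored procedure once per partition key value,
// repeating for a key while the procedure reports continuation: true.
async function runForAllPartitionKeys(sproc, partitionKeys, params) {
  let totalDeleted = 0;
  for (const pk of partitionKeys) {
    let more = true;
    while (more) {
      const { resource } = await sproc.execute(pk, params);
      totalDeleted += resource.docs_deleted || 0;
      more = resource.continuation;
    }
  }
  return totalDeleted;
}

// Partition key values "01" through "10", as in the question.
const vehicleTypes = Array.from({ length: 10 }, (_, i) =>
  String(i + 1).padStart(2, "0")
);
```

A call would then look like runForAllPartitionKeys(sproc, vehicleTypes, [deleteDate]), where deleteDate is whatever value the procedure expects. Each execution is still its own transaction scoped to one partition key.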

SELECT VALUE COUNT(1) FROM (SELECT DISTINCT c.UserId FROM root c) AS t not working

In a Cosmos DB stored procedure, I'm using an inline SQL query to try and retrieve the distinct count of a particular user id.
I'm using the SQL API for my account. I've run the below query in Query Explorer in my Cosmos DB account and I know that I should get a count of 10 (There are 10 unique user ids in my collection):
SELECT VALUE COUNT(1) FROM (SELECT DISTINCT c.UserId FROM root c) AS t
However when I run this in the Stored Procedure portal, I either get 0 records back or 18 records back (total number of documents). The code for my Stored Procedure is as follows:
function GetDistinctCount() {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT VALUE COUNT(1) FROM (SELECT DISTINCT c.UserId FROM root c) AS t',
function(err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
var body = {code: 404, body: "no docs found"}
response.setBody(JSON.stringify(body));
} else {
var response = getContext().getResponse();
var body = {code: 200, body: feed[0]}
response.setBody(JSON.stringify(body));
}
}
)
}
After looking at various feedback forums and documentation, I don't think there's an elegant solution for me to do this as simply as it would be in normal SQL.
The UserId is my partition key, which I'm passing through in my C# code and when I test it in the portal, so there are no additional parameters that I need to set when calling the stored procedure. I'm calling this stored procedure via C#, and adding any further parameters will affect my tests for that code, so I'm keen not to introduce any parameters if I can avoid it.
Your problem is caused by not setting the partition key for your stored procedure.
Please see the statements in the official documentation.
So, when you execute a stored procedure against a partitioned collection, you need to pass the partition key parameter. It's required! (This case also explains it: Documentdb stored proc cross partition query)
Back to your question: you never pass a partition key, which is equivalent to passing null or "" as the partition key, so no data is returned because none of your documents has a UserId equal to null or "".
My advice:
You could use the normal query SDK to execute your SQL and set enableCrossPartitionQuery: true, which allows you to scan the entire collection without setting a partition key. Please refer to this small sample: Can't get simple CosmosDB query to work via Node.js - but works fine via Azure's Query Explorer
So I found a solution that returns the result I need. My stored procedure now looks like this:
function GetPaymentCount() {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT DISTINCT VALUE(doc.UserId) from root doc' ,
{pageSize:-1 },
function(err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
var body = {code: 404, body: "no docs found"}
response.setBody(JSON.stringify(body));
} else {
var response = getContext().getResponse();
var body = {code: 200, body: JSON.stringify(feed.length)}
response.setBody(JSON.stringify(body));
}
}
)
}
Essentially, I changed the pageSize parameter to -1 which returned all the documents I knew would be returned in the result. I have a feeling that this will be more expensive in terms of RU/s cost, but it solves my case for now.
If anyone has more efficient alternatives, please comment and let me know.
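One possible alternative, sketched under assumptions: run the SELECT DISTINCT VALUE query from client code and count the values page by page, so no single response has to hold the whole result. The sketch assumes the pages arrive as an async iterable of objects with a resources array, which is the shape a query iterator yields in recent @azure/cosmos versions; the function name is invented, and whether this is cheaper in RUs has not been measured here.

```javascript
// Sketch: count distinct values client-side while reading the query
// result one page at a time.
// `pages` is assumed to be an async iterable yielding { resources: [...] }.
async function countDistinct(pages) {
  const seen = new Set();
  for await (const { resources } of pages) {
    for (const value of resources) {
      seen.add(value); // a Set also absorbs duplicates that span pages
    }
  }
  return seen.size;
}
```

Using a Set matters when the query crosses partitions or pages, since the same value could otherwise be counted more than once.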

Azure CosmosDB: stored procedure delete documents based on query

The goal is to input a simple string query like
SELECT *
FROM c
WHERE c.deviceId = "device1"
and all resulting fetched documents need to be deleted.
I have found very old posts about doing this with a stored procedure, but I can't get it to work properly with the "new" UI.
Thanks a lot in advance.
EDIT: I feel like @jay-gong pointed in the right direction, but I encountered a problem with his solution:
I can create the stored procedure correctly, but when I try to execute it, it asks for the partition key, which I provide; after executing, it doesn't delete any document.
The collection just has a few documents and its partition key is /message/id which is what I wrote in the partition key field.
Since Cosmos DB does not support deleting documents with SQL (Delete SQL for CosmosDB), you could query the documents and delete them one by one with the SDK's delete call. Or you could do a bulk operation in a stored procedure.
You could follow the stored procedure bulk delete sample code to implement your requirement; it works for me.
function bulkDeleteProcedure(query) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
deleted: 0,
continuation: true
};
query = 'SELECT * FROM c WHERE c.deviceId="device1"';
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
tryQueryAndDelete();
// Recursively runs the query w/ support for continuation tokens.
// Calls tryDelete(documents) as soon as the query returns documents.
function tryQueryAndDelete(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
if (err) throw err;
if (retrievedDocs.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(retrievedDocs);
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
tryQueryAndDelete(responseOptions.continuation);
} else {
// Else if there are no more documents and no continuation token - we are finished deleting documents.
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
responseBody.deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
tryQueryAndDelete();
}
}
}
Furthermore, as far as I know, a stored procedure has a 5-second execution limit. If you run into the timeout error, you could pass the continuation token as a parameter into the stored procedure and execute it several times.
Updated answer:
A partition key is required when executing a stored procedure against a partitioned collection. (Please refer to the detailed explanation: Azure Cosmos DB asking for partition key for stored procedure.)
So, first, the code above needs your partition key. For example, suppose your partition key is defined as /message/id and your data looks like this:
{
"message":{
"id":"1"
}
}
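Then you need to pass the partition key value for that document, that is, the value stored at that path (here "1"), not the path /message/id itself.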
Your query obviously crosses partitions, so I suggest you use an HTTP-triggered Azure Function instead of a stored procedure. In that function, you could use the Cosmos DB SDK to run the query and the delete operations. Don't forget to set EnableCrossPartitionQuery to true. Please refer to this case: Azure Cosmos DB asking for partition key for stored procedure.

Stored Procedure to update or insert docs that belong to multiple partition keys

I have a list of documents that belong to a partitioned collection. Instead of querying for every document from the .NET client and either do update or insert, I thought I could use a Stored Procedure to accomplish this.
What I did not initially realize is that Stored Procedures are executed in the transaction scope of a single partition key. So I am getting PartitionKey value must be supplied for this operation.
The thing is that the documents (that I am trying to upsert) may belong to different partitions. How can I accomplish this in the Stored Procedure? In my case, the SP is useless unless it can operate on multiple partitions.
This is how I constructed my SP:
function upsertEcertAssignments(ecerts) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
// Validate input
if (!ecerts) throw new Error("The ecerts is null or undefined");
if (ecerts.length == 0) throw new Error("The ecerts list size is 0");
// Recursively call the 'process' function
processEcerts(ecerts, 0);
function processEcerts(ecerts, index) {
if (index >= ecerts.length) {
response.setBody(index);
return;
}
var query = {query: "SELECT * FROM DigitalEcerts c WHERE c.code = @code AND c.collectionType = @type", parameters: [{name: "@code", value: ecerts[index].code}, {name: "@type", value: 0}]};
var isQueryAccepted = collection.queryDocuments(collectionLink, query, {partitionKey: ecerts[index].code}, function(err, foundDocuments, foundOptions) {
if (err) throw err;
if (foundDocuments.length > 0) {
var existingEcert = foundDocuments[0];
ecerts[index].id = existingEcert.id;
var isAccepted = __.replaceDocument(existingEcert._self, ecerts[index], function(err, updatedEcert, replacedOptions) {
if (err) throw err;
processEcerts(ecerts, index + 1);
});
if (!isAccepted) {
response.setBody(index);
}
} else {
var isAccepted = __.createDocument(__.getSelfLink(), ecerts[index], function(err, insertedEcert, insertedOptions) {
if (err) throw err;
processEcerts(ecerts, index + 1);
});
if (!isAccepted) {
response.setBody(index);
}
}
});
if (!isQueryAccepted)
response.setBody(index);
}
}
From .NET, if I call it like this, I get the partitionKey value problem:
var continuationIndex = await _docDbClient.ExecuteStoredProcedureAsync<int>(UriFactory.CreateStoredProcedureUri(_docDbDatabaseName, _docDbDigitalEcertsCollectionName, "UpsertDigitalMembershipEcertAssignments"), digitalEcerts);
If I call it with a partition key, it works...but it is useless:
var continuationIndex = await _docDbClient.ExecuteStoredProcedureAsync<int>(UriFactory.CreateStoredProcedureUri(_docDbDatabaseName, _docDbDigitalEcertsCollectionName, "UpsertDigitalMembershipEcertAssignments"), new RequestOptions { PartitionKey = new PartitionKey(digitalEcerts[0].Code) }, digitalEcerts.Take(1).ToList());
I appreciate any pointer.
Thanks.
By the sound of it, your unique id is a combination of code and type. I would recommend making your id property the combination of the two.
This guarantees that your id is unique but also eliminates the need to query for it.
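A minimal sketch of that idea: derive the id deterministically from code and collectionType, so an upsert can create or replace the document without a lookup. The helper name and the "code:type" format are illustrative choices, not something from the original post.

```javascript
// Sketch: build a deterministic id from code and collectionType so the
// same logical document always maps to the same id.
function withCompositeId(ecert) {
  // Returns a copy; the input object is left untouched.
  return { ...ecert, id: `${ecert.code}:${ecert.collectionType}` };
}

const doc = withCompositeId({ code: "ABC123", collectionType: 0 });
console.log(doc.id); // prints "ABC123:0"
```

The resulting document can then be handed to an upsert call (for example UpsertDocumentAsync in the .NET DocumentDB SDK), which inserts it if the id is new within its partition and replaces it otherwise.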
If the collection the stored procedure is registered against is a
single-partition collection, then the transaction is scoped to all the
documents within the collection. If the collection is partitioned,
then stored procedures are executed in the transaction scope of a
single partition key. Each stored procedure execution must then
include a partition key value corresponding to the scope the
transaction must run under.
You could refer to the description above, which is mentioned here. We can query documents across partitions by setting EnableCrossPartitionQuery to true in the FeedOptions parameter. However, RequestOptions has no such property for executing a stored procedure.
So it seems you have to provide a partition key when you execute the stored procedure. Of course, it could be replaced by the upsert function. From the perspective of the business logic the stored procedure adds little, but for bulk operations it can relieve some of the performance pressure because it runs on the server side.
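If the stored procedure is kept for bulk work, one way to live with the single-partition scope is to split the batch by partition key on the client and execute the procedure once per group. The sketch below shows only the grouping step; the helper name is invented, and using code as the partition key follows the question's .NET call.

```javascript
// Sketch: split a mixed batch into one batch per partition key value, so
// each batch can be sent to the stored procedure in its own execution.
function groupByPartitionKey(docs, getKey) {
  const groups = new Map();
  for (const doc of docs) {
    const key = getKey(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return groups;
}

const batches = groupByPartitionKey(
  [
    { code: "A", n: 1 },
    { code: "B", n: 2 },
    { code: "A", n: 3 },
  ],
  (d) => d.code
);
console.log(batches.size); // prints 2
```

Each entry of the map then becomes one stored procedure execution, with the map key as the partition key and the array as the argument. Atomicity holds per group, not across the whole batch.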
Hope it helps you.

Resources