Azure CosmosDB: stored procedure delete documents based on query


The goal is to input a simple string query like
SELECT *
FROM c
WHERE c.deviceId = "device1"
and all resulting fetched documents need to be deleted.
I have found very old posts about doing this with a stored procedure, but I can't get it to work properly with the "new" UI.
Thanks a lot in advance.
EDIT: I feel like @jay-gong pointed me in the right direction, but I encountered a problem with his solution:
I can create the stored procedure correctly, but when I try to execute it, it asks for the partition key. I provide it, but after execution no documents are deleted.
The collection has only a few documents, and its partition key is /message/id, which is what I entered in the partition key field.

Since Cosmos DB does not support deleting documents with SQL (Delete SQL for CosmosDB), you could query the documents and delete them one by one with the SDK's delete call, or you could use a bulk operation in a stored procedure.
You could follow the stored procedure bulk delete sample code to implement your requirement; it works for me.
function bulkDeleteProcedure(query) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
deleted: 0,
continuation: true
};
query = 'SELECT * FROM c WHERE c.deviceId="device1"';
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
tryQueryAndDelete();
// Recursively runs the query w/ support for continuation tokens.
// Calls tryDelete(documents) as soon as the query returns documents.
function tryQueryAndDelete(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
if (err) throw err;
if (retrievedDocs.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(retrievedDocs);
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
tryQueryAndDelete(responseOptions.continuation);
} else {
// Else if there are no more documents and no continuation token - we are finished deleting documents.
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
responseBody.deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
tryQueryAndDelete();
}
}
}
Furthermore, as far as I know, a stored procedure has a 5-second execution limit. If you run into a timeout error, you could execute the stored procedure several times, passing the continuation token in as a parameter.
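A client-side loop for re-running the procedure might look like the following minimal sketch. It assumes the procedure returns the { deleted, continuation } body shown above. executeSproc is a hypothetical stand-in for whatever SDK call runs the procedure once against a single partition key; it is not a real SDK function, and the maxRuns guard is an illustrative choice.

```javascript
// Re-runs a bulk-delete stored procedure until it reports that nothing is left.
// executeSproc is a hypothetical async function supplied by the caller. It should
// run the procedure once and resolve to its response body: { deleted, continuation }.
async function deleteUntilDone(executeSproc, maxRuns = 100) {
  let totalDeleted = 0;
  for (let run = 0; run < maxRuns; run++) {
    const body = await executeSproc();
    totalDeleted += body.deleted;
    // continuation === false means the procedure found no more documents.
    if (!body.continuation) return totalDeleted;
  }
  throw new Error("Stopped after " + maxRuns + " runs; documents may remain.");
}
```

The guard on maxRuns is only there so that a procedure which never reports continuation: false cannot loop forever.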
Update Answer:
A partition key is required when executing a stored procedure on a partitioned collection. (For a detailed explanation, please refer to: Azure Cosmos DB asking for partition key for stored procedure.)
So, first, the code above needs your partition key. For example, suppose your partition key is defined as /message/id and your data looks like this:
{
"message":{
"id":"1"
}
}
Then you need to pass the partition key value, which here is "1" (the value stored at /message/id), rather than the path itself.
Your query evidently crosses partitions, so I suggest you use an HTTP-triggered Azure Function instead of a stored procedure. In that function, you could use the Cosmos DB SDK to run the query and delete operations. Don't forget to set EnableCrossPartitionQuery to true. Please refer to this case: Azure Cosmos DB asking for partition key for stored procedure.

Related

Delete documents from Azure cosmos DB collection with multiple Partition Keys

I have to delete some documents from Azure Cosmos DB through the Azure portal.
I wrote a stored procedure in the container that deletes the data in question.
But when the stored procedure is executed, it asks for a partition key value. I have to delete documents that have different partition keys.
The stored procedure I used is given below.
function bulkDeleteProcedure(deletedate) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var query = "SELECT * FROM c where c._ts = " + deletedate;
var responseBody = {
docs_deleted: 0,
continuation: true
};
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
fetchData();
function fetchData(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, items, responseOptions) {
if (err) throw err;
// response.setBody(items);
if (items.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(items);
// response.setBody(items.length + " : inside retrvd docs");
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
fetchData(responseOptions.continuation);
} else {
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
// Increment the counter that responseBody actually declares (docs_deleted).
responseBody.docs_deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
fetchData();
}
}
}
Suppose my partition key is vehicle type, with values 01 to 10. The SQL query for my requirement returns documents with 10 different partition key values.
Current scenario is like i have to run the stored procedure 10 times by providing each partition key value each time.
Is it possible to run this stored procedure for different partition key values in a single go?

Current scenario is like i have to run the stored procedure 10 times
by providing each partition key value each time. Is it possible to run
this stored procedure for different partition key values in a single
go?
Unfortunately no. You can't provide multiple partition keys to a stored procedure. You will need to execute the stored procedure 10 times.
One thing you could do is use any available SDK and write code to execute stored procedure in a loop. You could create an array of partition keys and loop through that and execute stored procedure for each partition key in that array.
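The loop described above could be sketched roughly as follows. executeForKey is a hypothetical function supplied by the caller that wraps the SDK's stored procedure call for one partition key value; its name and return shape are assumptions for illustration only.

```javascript
// Runs a stored procedure once per partition key value, one after another.
// executeForKey is a hypothetical async function: (key) => response body.
async function runForEachPartitionKey(partitionKeys, executeForKey) {
  const results = {};
  for (const key of partitionKeys) {
    // Each execution is scoped to exactly one partition key value.
    results[key] = await executeForKey(key);
  }
  return results;
}
```

Running the executions sequentially keeps the sketch simple; whether to run them in parallel depends on how much throughput the container has to spare.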

SELECT VALUE COUNT(1) FROM (SELECT DISTINCT c.UserId FROM root c) AS t not working

In a Cosmos DB stored procedure, I'm using an inline SQL query to try to retrieve the distinct count of a particular user id.
I'm using the SQL API for my account. I've run the below query in Query Explorer in my Cosmos DB account and I know that I should get a count of 10 (There are 10 unique user ids in my collection):
SELECT VALUE COUNT(1) FROM (SELECT DISTINCT c.UserId FROM root c) AS t
However when I run this in the Stored Procedure portal, I either get 0 records back or 18 records back (total number of documents). The code for my Stored Procedure is as follows:
function GetDistinctCount() {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT VALUE COUNT(1) FROM (SELECT DISTINCT c.UserId FROM root c) AS t',
function(err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
var body = {code: 404, body: "no docs found"}
response.setBody(JSON.stringify(body));
} else {
var response = getContext().getResponse();
var body = {code: 200, body: feed[0]}
response.setBody(JSON.stringify(body));
}
}
)
}
After looking at various feedback forums and documentation, I don't think there's an elegant solution for me to do this as simply as it would be in normal SQL.
The UserId is my partition key, which I pass through in my C# code and when I test in the portal, so there are no additional parameters that I need to set when calling the stored procedure. I'm calling this stored procedure via C#, and adding any further parameters would affect my tests for that code, so I'm keen not to introduce any parameters if I can avoid it.

Your problem is caused by not setting the partition key for your stored procedure.
The official documentation states that a stored procedure on a partitioned collection runs in the scope of a single partition key.
So, when you execute a stored procedure on a partitioned collection, you need to pass the partition key parameter. It's necessary! (This case also explains it: Documentdb stored proc cross partition query)
Back to your question: you never pass a partition key, which is equivalent to passing null or "" as the partition key, so it returns no data because you have no UserId equal to null or "".
My advice:
You could use the normal query SDK to execute your SQL and set enableCrossPartitionQuery: true, which allows you to scan the entire collection without setting a partition key. Please refer to this small sample: Can't get simple CosmosDB query to work via Node.js - but works fine via Azure's Query Explorer

So I found a solution that returns the result I need. My stored procedure now looks like this:
function GetPaymentCount() {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT DISTINCT VALUE(doc.UserId) from root doc' ,
{pageSize:-1 },
function(err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
var body = {code: 404, body: "no docs found"}
response.setBody(JSON.stringify(body));
} else {
var response = getContext().getResponse();
var body = {code: 200, body: JSON.stringify(feed.length)}
response.setBody(JSON.stringify(body));
}
}
)
}
Essentially, I changed the pageSize parameter to -1 which returned all the documents I knew would be returned in the result. I have a feeling that this will be more expensive in terms of RU/s cost, but it solves my case for now.
If anyone has more efficient alternatives, please comment and let me know.
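One alternative to requesting everything in a single page might be to page through the results and collect the distinct values as they arrive. The following is a plain JavaScript illustration of that idea under stated assumptions, not stored procedure code: fetchPage is a hypothetical function that returns one page of values plus a continuation token, and whether this costs fewer RUs has not been measured.

```javascript
// Counts distinct values by walking result pages instead of requesting one huge page.
// fetchPage is hypothetical: (continuation) => { items: [...], continuation: token or undefined }.
function countDistinct(fetchPage) {
  const seen = new Set();
  let continuation;
  do {
    const page = fetchPage(continuation);
    // A Set ignores values it already holds, so repeats across pages are counted once.
    for (const value of page.items) seen.add(value);
    continuation = page.continuation;
  } while (continuation);
  return seen.size;
}
```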

Stored Procedure to update or insert docs that belong to multiple partition keys

I have a list of documents that belong to a partitioned collection. Instead of querying for every document from the .NET client and either do update or insert, I thought I could use a Stored Procedure to accomplish this.
What I did not initially realize is that Stored Procedures are executed in the transaction scope of a single partition key. So I am getting PartitionKey value must be supplied for this operation.
The thing is that the documents (that I am trying to upsert) may belong to different partitions. How can I accomplish this in the Stored Procedure? In my case, the SP is useless unless it can operate on multiple partitions.
This is how I constructed my SP:
function upsertEcertAssignments(ecerts) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
// Validate input
if (!ecerts) throw new Error("The ecerts is null or undefined");
if (ecerts.length == 0) throw new Error("The ecerts list size is 0");
// Recursively call the 'process' function
processEcerts(ecerts, 0);
function processEcerts(ecerts, index) {
if (index >= ecerts.length) {
response.setBody(index);
return;
}
var query = {query: "SELECT * FROM DigitalEcerts c WHERE c.code = @code AND c.collectionType = @type", parameters: [{name: "@code", value: ecerts[index].code}, {name: "@type", value: 0}]};
var isQueryAccepted = collection.queryDocuments(collectionLink, query, {partitionKey: ecerts[index].code}, function(err, foundDocuments, foundOptions) {
if (err) throw err;
if (foundDocuments.length > 0) {
var existingEcert = foundDocuments[0];
ecerts[index].id = existingEcert.id;
var isAccepted = __.replaceDocument(existingEcert._self, ecerts[index], function(err, updatedEcert, replacedOptions) {
if (err) throw err;
processEcerts(ecerts, index + 1);
});
if (!isAccepted) {
response.setBody(index);
}
} else {
var isAccepted = __.createDocument(__.getSelfLink(), ecerts[index], function(err, insertedEcert, insertedOptions) {
if (err) throw err;
processEcerts(ecerts, index + 1);
});
if (!isAccepted) {
response.setBody(index);
}
}
});
if (!isQueryAccepted)
response.setBody(index);
}
}
From .NET, if I call it like this, I get the partitionKey value problem:
var continuationIndex = await _docDbClient.ExecuteStoredProcedureAsync<int>(UriFactory.CreateStoredProcedureUri(_docDbDatabaseName, _docDbDigitalEcertsCollectionName, "UpsertDigitalMembershipEcertAssignments"), digitalEcerts);
If I call it with a partition key, it works...but it is useless:
var continuationIndex = await _docDbClient.ExecuteStoredProcedureAsync<int>(UriFactory.CreateStoredProcedureUri(_docDbDatabaseName, _docDbDigitalEcertsCollectionName, "UpsertDigitalMembershipEcertAssignments"), new RequestOptions { PartitionKey = new PartitionKey(digitalEcerts[0].Code) }, digitalEcerts.Take(1).ToList());
I appreciate any pointer.
Thanks.

By the sound of it, your unique id is a combination of code and type. I would recommend making your id property to be the combination of two.
This guarantees that your id is unique but also eliminates the need to query for it.
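As a rough illustration of that suggestion, a deterministic id could be derived like this. The function names and the ":" separator are assumptions made for the example; any separator works as long as it cannot appear inside code itself.

```javascript
// Builds a deterministic id from the two fields that identify an ecert.
// The ":" separator is an illustrative choice, not something the question specifies.
function makeEcertId(code, collectionType) {
  return String(code) + ":" + String(collectionType);
}

// Returns a copy of the ecert with its id set, leaving the input untouched.
function withCompositeId(ecert) {
  return Object.assign({}, ecert, { id: makeEcertId(ecert.code, ecert.collectionType) });
}
```

With the id known up front, each document can be upserted directly instead of being looked up by query first.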

If the collection the stored procedure is registered against is a
single-partition collection, then the transaction is scoped to all the
documents within the collection. If the collection is partitioned,
then stored procedures are executed in the transaction scope of a
single partition key. Each stored procedure execution must then
include a partition key value corresponding to the scope the
transaction must run under.
You could refer to the description quoted above. We can query documents across partitions by setting EnableCrossPartitionQuery to true in the FeedOptions parameter. However, RequestOptions has no such property for executing a stored procedure.
So it seems you have to provide a partition key when you execute the stored procedure. Of course, it could be replaced by the upsert function. That may be useless from the perspective of your business logic, but for bulk operations the stored procedure can relieve some of the performance pressure because it runs on the server side.
Hope it helps you.

Cosmos DB - Deleting a document

How can I delete an individual record from Cosmos DB?
I can select using SQL syntax:
SELECT *
FROM collection1
WHERE (collection1._ts > 0)
And sure enough all documents (analogous to rows?) are returned
However this doesn't work when I attempt to delete
DELETE
FROM collection1
WHERE (collection1._ts > 0)
How do I achieve that?

The DocumentDB API's SQL is specifically for querying. That is, it only provides SELECT, not UPDATE or DELETE.
Those operations are fully supported, but require REST (or SDK) calls. For example, with .net, you'd call DeleteDocumentAsync() or ReplaceDocumentAsync(), and in node.js, this would be a call to deleteDocument() or replaceDocument().
In your particular scenario, you could run your SELECT to identify documents for deletion, then make "delete" calls, one per document (or, for efficiency and transactionality, pass an array of documents to delete, into a stored procedure).
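Because a stored procedure on a partitioned collection runs against a single partition key, the documents returned by the SELECT would need to be grouped before being handed to it. A minimal sketch of that grouping step, assuming the partition key path is known (for example "/deviceId"); both function names are made up for illustration:

```javascript
// Reads the value found at a partition key path such as "/message/id".
function partitionKeyValue(doc, path) {
  return path
    .split("/")
    .filter(Boolean)
    .reduce(function (node, part) { return node == null ? undefined : node[part]; }, doc);
}

// Groups documents by partition key value so that each group can be passed
// to one stored procedure execution.
function groupByPartitionKey(docs, path) {
  const groups = new Map();
  for (const doc of docs) {
    const key = partitionKeyValue(doc, path);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return groups;
}
```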

The easiest way is probably by using Azure Storage Explorer. After connecting you can drill down to a container of choice, select a document and then delete it. You can find additional tools for Cosmos DB on https://gotcosmos.com/tools.

Another option to consider is the time to live (TTL). You can turn this on for a collection and then set an expiration for the documents. The documents will be cleaned up automatically for you as they expire.
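For example, once TTL is enabled on the container, an individual document can carry its own expiration in a ttl property, expressed in seconds since the document was last modified. The helper below is only a sketch; withTtl is a made-up name, and it assumes the container already has a default TTL configured.

```javascript
// Returns a copy of the document that expires ttlSeconds after its last write.
// Assumes TTL is already enabled on the container; without that, the
// per-document ttl property has no effect.
function withTtl(doc, ttlSeconds) {
  if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
    throw new Error("ttlSeconds must be a positive integer");
  }
  return Object.assign({}, doc, { ttl: ttlSeconds });
}
```

Writing a document such as withTtl({ id: "1", deviceId: "device1" }, 3600) would then let the service remove it roughly an hour after the write, without any delete call.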

Create a stored procedure with the following code:
/**
* A Cosmos DB stored procedure that bulk deletes documents for a given query.
* Note: You may need to execute this stored procedure multiple times (depending whether the stored procedure is able to delete every document within the execution timeout limit).
*
* @function
* @param {string} query - A query that provides the documents to be deleted (e.g. "SELECT c._self FROM c WHERE c.founded_year = 2008"). Note: For best performance, reduce the # of properties returned per document in the query to only what's required (e.g. prefer SELECT c._self over SELECT * )
* @returns {Object.<number, boolean>} Returns an object with the two properties:
* deleted - contains a count of documents deleted
* continuation - a boolean whether you should execute the stored procedure again (true if there are more documents to delete; false otherwise).
*/
function bulkDeleteStoredProcedure(query) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
deleted: 0,
continuation: true
};
// Validate input.
if (!query) throw new Error("The query is undefined or null.");
tryQueryAndDelete();
// Recursively runs the query w/ support for continuation tokens.
// Calls tryDelete(documents) as soon as the query returns documents.
function tryQueryAndDelete(continuation) {
var requestOptions = {continuation: continuation};
var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
if (err) throw err;
if (retrievedDocs.length > 0) {
// Begin deleting documents as soon as documents are returned from the query results.
// tryDelete() resumes querying after deleting; no need to page through continuation tokens.
// - this is to prioritize writes over reads given timeout constraints.
tryDelete(retrievedDocs);
} else if (responseOptions.continuation) {
// Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
tryQueryAndDelete(responseOptions.continuation);
} else {
// Else if there are no more documents and no continuation token - we are finished deleting documents.
responseBody.continuation = false;
response.setBody(responseBody);
}
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
}
// Recursively deletes documents passed in as an array argument.
// Attempts to query for more on empty array.
function tryDelete(documents) {
if (documents.length > 0) {
// Delete the first document in the array.
var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
if (err) throw err;
responseBody.deleted++;
documents.shift();
// Delete the next document in the array.
tryDelete(documents);
});
// If we hit execution bounds - return continuation: true.
if (!isAccepted) {
response.setBody(responseBody);
}
} else {
// If the document array is empty, query for more documents.
tryQueryAndDelete();
}
}
}
And execute it using your partition key (example: null) and a query to select the documents (example: SELECT c._self FROM c to delete all).
Based on Delete Documents from CosmosDB based on condition through Query Explorer

Here is an example of how to use bulkDeleteStoredProcedure using .net Cosmos SDK V3.
ContinuationFlag has to be used because of the execution bounds.
private async Task<int> ExecuteSpBulkDelete(string query, string partitionKey)
{
var continuationFlag = true;
var totalDeleted = 0;
while (continuationFlag)
{
StoredProcedureExecuteResponse<BulkDeleteResponse> result = await _container.Scripts.ExecuteStoredProcedureAsync<BulkDeleteResponse>(
"spBulkDelete", // your sproc name
new PartitionKey(partitionKey), // pk value
new[] { query }); // the query passed to this method (the original referenced an undefined 'sql')
var response = result.Resource;
continuationFlag = response.Continuation;
var deleted = response.Deleted;
totalDeleted += deleted;
Console.WriteLine($"Deleted {deleted} documents ({totalDeleted} total, more: {continuationFlag}, used {result.RequestCharge}RUs)");
}
return totalDeleted;
}
and response model:
public class BulkDeleteResponse
{
[JsonProperty("deleted")]
public int Deleted { get; set; }
[JsonProperty("continuation")]
public bool Continuation { get; set; }
}

NodeJS: Insert record in dynamodb if not exist

I need to store user's info in DynamoDB and send a mail to the same user if it doesn't already exist in DynamoDB table. I am doing this in for loop. The list contains only 2 records. The issue is only the second record gets inserted in table and the mail is sent twice to the same user. Here is the code:
module.exports.AddUser = function(req, res, usersList, departmentId) {
var _emailId = "";
var _userName = "";
var _departmentId = departmentId;
for (var i = 0; i < usersList.length; i++) {
_emailId = usersList[i].emailId;
_userName = usersList[i].userName;
var params = {
TableName: "UsersTable",
Key: {
"emailId": _emailId,
"departmentId": _departmentId
}
};
docClient.get(params, function(err, data) {
if (!err) {
if (!data.items)
AddUserAndSendEmail("UsersTable", _emailId, _userName);
//The above function is being called twice but for the same user.
//It has a check so not inserting the same record twice but
//sending two mails to the same user.
}
});
}
res.end("success");
}
function AddUserAndSendEmail(tableName, emailId, _userName) {
var params = {
TableName: tableName,
Item: {
"emailId": emailId,
"departmentId": 101//Default Department
}
};
docClient.put(params, function(err, data) {
if (!err) {
//Send Email Code Here
} else {
console.log("error");
}
});
}
What could be the reason for this strange behavior? Really frustrated, I am about to give up on this.

1) Please note that DynamoDB is eventually consistent. If you insert the item and check whether the item exists immediately, it may not always find the item in the database.
This means the second iteration of the loop may not always find the first item inserted into the table.
2) If the item already exists in the table, the Put api will update the item and give successful response.
This means the Put will be successful for the same email id and department id in the second iteration because it updates the record if it is already present.
GetItem – The GetItem operation returns a set of Attributes for an
item that matches the primary key. The GetItem operation provides an
eventually consistent read by default. If eventually consistent reads
are not acceptable for your application, use ConsistentRead.
PutItem – Creates a new item, or replaces an old item with a new item
(including all the attributes). If an item already exists in the
specified table with the same primary key, the new item completely
replaces the existing item. You can also use conditional operators to
replace an item only if its attribute values match certain conditions,
or to insert a new item only if that item doesn’t already exist.
Based on the above points, there is a possibility to get two emails if you have same email id and department id in the array.
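Separately from the consistency points above, the loop in the question may also be affected by how JavaScript callbacks capture variables: _emailId and _userName are declared once with var outside the loop, so every asynchronous callback reads whatever value they hold after the loop has finished. The following self-contained sketch simulates that with setTimeout instead of DynamoDB; fakeLookup and both collect functions are made up for illustration.

```javascript
// Simulates an async lookup whose callback runs only after the loop has finished.
function fakeLookup(callback) {
  setTimeout(callback, 0);
}

// Shared variable: every callback sees the value from the last iteration.
function collectShared(users, done) {
  var seen = [];
  var email = "";
  for (var i = 0; i < users.length; i++) {
    email = users[i].emailId;
    fakeLookup(function () {
      seen.push(email);
      if (seen.length === users.length) done(seen);
    });
  }
}

// Per-iteration binding: each callback keeps the user it was created for.
function collectScoped(users, done) {
  const seen = [];
  users.forEach(function (user) {
    fakeLookup(function () {
      seen.push(user.emailId);
      if (seen.length === users.length) done(seen);
    });
  });
}
```

With two users, collectShared reports the second address twice, which matches the "mail is sent twice to the same user" symptom, while collectScoped reports each address once.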
