CosmosDB SQL API: Delete element matching criteria via partial update - azure

I'm trying to remove a particular element matching criteria, in a nested array, when I already know the document id. For example, imagine a document like this (taken from https://devblogs.microsoft.com/cosmosdb/understanding-how-to-query-arrays-in-azure-cosmos-db/)
{
  "id": "Tim",
  "city": "Seattle",
  "gifts": [
    {
      "recipient": "Andrew",
      "gift": "blanket"
    },
    {
      "recipient": "Deborah",
      "gift": "board game"
    },
    {
      "recipient": "Chris",
      "gift": "coffee maker"
    }
  ]
}
I want to remove any element from the "gifts" array that has a recipient of "Andrew". A naive approach is to pull the document, remove the array element, and write the new document back. But that concerns me because it exposes a race condition: between pulling and updating the document, a different mutation could be made to that array, and it would be lost.
I'd like to see if I can perform a partial document update to remove the array element. However, the partial document update documentation at https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update states (for "remove"):
If the target path is an array index, it will be deleted and any elements above the specified index are shifted one position to the left.
So it seems like the only means of removing an array element via a partial document update is by its array index. But if I pull its array index and then perform the update, I have a similar race condition: what if the array is mutated between when I pull the document and when I perform the partial doc update?
I see that for partial document updates, you can do a "Conditional Update":
Conditional Update: For the aforementioned modes, it is also possible
to add a SQL-like filter predicate (for example, from c where
c.taskNum = 3) such that the operation fails if the pre-condition
specified in the predicate is not satisfied.
I'm wondering if this is something I could use to achieve my goal.
Essentially, I think I'm looking for something like https://www.mongodb.com/docs/manual/reference/operator/update/pull/ but within the SQL API.

"the array is mutated between when I pull the document and when I perform the partial doc update"
The only way to protect against concurrent updates to the same document is Optimistic Concurrency. You would need to first read the current document, obtain its ETag, and then use that on the update operation by passing it to the CosmosItemRequestOptions:
requestOptions.setIfMatchETag("etag from the read");
Full example: https://github.com/Azure-Samples/azure-cosmos-java-sql-api-samples/blob/0ead4ca33dac72c223285e1db866c9dc06f5fb47/src/main/java/com/azure/cosmos/examples/documentcrud/async/DocumentCRUDQuickstartAsync.java#L366-L418
In that case, if there was any concurrent update that happened, you'd get a failure with status code 412, which means a concurrent update happened and you'd need to retry the operation (read, get the new etag, find the new index to remove, execute the update).
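The read / modify / conditional-write / retry loop described above can be sketched roughly as follows. This is JavaScript against a hypothetical in-memory stand-in, not the Cosmos DB SDK: `FakeContainer`, its `read` and `replace` methods, and `removeGiftsFor` are names invented for illustration. It also replaces the whole document under the ETag check instead of patching by index, which keeps the sketch short.

```javascript
// Hypothetical in-memory stand-in for a container. NOT a Cosmos DB SDK class.
class FakeContainer {
  constructor(doc) {
    this.doc = doc;
    this.etag = 1;
  }

  // Returns a deep copy of the document together with its current ETag.
  async read() {
    return { resource: JSON.parse(JSON.stringify(this.doc)), etag: String(this.etag) };
  }

  // Writes only if the caller's ETag still matches; otherwise fails with 412.
  async replace(newDoc, ifMatchEtag) {
    if (ifMatchEtag !== String(this.etag)) {
      const err = new Error("Precondition failed");
      err.code = 412;
      throw err;
    }
    this.doc = JSON.parse(JSON.stringify(newDoc));
    this.etag += 1;
  }
}

// Removes every gift for `recipient`, retrying when a concurrent write is detected.
async function removeGiftsFor(container, recipient, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { resource, etag } = await container.read();
    resource.gifts = resource.gifts.filter((g) => g.recipient !== recipient);
    try {
      await container.replace(resource, etag);
      return resource;
    } catch (err) {
      // Anything other than a precondition failure is a real error.
      if (err.code !== 412) throw err;
      // 412: someone else wrote first, so loop and re-read the fresh document.
    }
  }
  throw new Error("Gave up after " + maxAttempts + " attempts");
}
```

Because the filter is re-applied to a freshly read document on every attempt, a concurrent change to the array is picked up rather than overwritten.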

Related

Can mongoose batch update based on an array of objects that matches the collection?

I am working on a project in Express/Node, and I am utilizing a MongoDB database that has a collection of Course documents that represent a course in my school system that changes in real-time. The Course documents in my database each look like this:
Course Document
{
  courseID: Number,
  restrictions: String,
  status: String,
}
My program has to check for changes in the school's course system and apply any changes it sees to my private MongoDB database. To accomplish this, I currently have a script that looks at all the courses in the school system and records them in an array of objects, with each object corresponding to a course.
var allCourses =
[
  {
    courseID: 123456,
    restrictions: "A and B",
    status: "OPEN"
  },
  {
    courseID: 678990,
    restrictions: "A",
    status: "FULL",
  }
]
The goal now is to be able to go through my database, and skip the documents that are the same as the corresponding javascript object in the array, and update those that are not.
Obviously, I could just iterate through my array with forEach, and update every single course by filtering by 'courseID' and updating both fields one document at a time, but I can foresee that this would take a large amount of time.
I was wondering if there was a batch update function, similar to the insertMany operation, that can take my array of objects and update my database documents that correspond to an object within the array?
These are helpful links
Trying to do a bulk upsert with Mongoose. What's the cleanest way to do this?
https://docs.mongodb.com/manual/reference/method/db.collection.insertMany/
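As a rough sketch of the batch approach asked about above, the array of course objects can be turned into one `updateOne` operation per course and sent in a single `bulkWrite` call. The helper name `buildCourseOps` and the model name `Course` are assumptions made for illustration. A `$set` that writes values a document already holds should be reported as matched but not modified, so unchanged courses are effectively skipped.

```javascript
// Build one updateOne operation per course object, keyed on courseID.
function buildCourseOps(allCourses) {
  return allCourses.map((course) => ({
    updateOne: {
      filter: { courseID: course.courseID },
      update: { $set: { restrictions: course.restrictions, status: course.status } },
    },
  }));
}

const allCourses = [
  { courseID: 123456, restrictions: "A and B", status: "OPEN" },
  { courseID: 678990, restrictions: "A", status: "FULL" },
];

const ops = buildCourseOps(allCourses);

// With a Mongoose model (assumed here to be called Course), the whole batch
// could then be sent in one round trip:
//   const result = await Course.bulkWrite(ops);
```

Adding `upsert: true` to each operation would also insert courses that are not in the database yet, if that is wanted.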

how can i find as well as update many documents?

I want to update the results of a find query based on certain conditions. What I was wondering is whether MongoDB will search the whole collection again for the update, or use the pointer from the previous find query. I want to optimize my queries, which is why I'm asking. Is there any way to achieve this?
Update: I also want the documents.
Example: collection.find({conditions}).forEach({some condition based on which update will be called})
What I want is for the update query called from the forEach function to use the pointer from the previous find query rather than searching through the collection again.
My point is that when we first run the find query, we search the collection and a cursor is returned, which is a pointer to the collection in memory. Now that we have that pointer, why can't we use it to update the document rather than searching the collection again and then updating it?
If you want to keep your code you can use:
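// conditions, some_conditions and updated_fields are placeholders
collection.find(conditions).forEach((doc) => {
  if (some_conditions) {
    // Look the document up by its _id and apply the changed fields
    return collection.findOneAndUpdate({_id: doc._id}, {$set: updated_fields});
  }
});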
But as mentioned in the comments, I'm not sure exactly what conditions need to be met; you can probably just use the update method directly to save time.

CouchDB filter function and continuous feed

I have a filter function that filters on a document property, e.g. "version: A", and it works fine until a document is updated so that this property is removed (or changed to "version: B").
At that point I would like to be notified that the document has been updated, similar to the notification when a document gets deleted, but I couldn't find an effective way (without listening to and processing all document changes).
I hope I'm just missing something and it's not a design limitation.
While my other answer is a valid approach, I had this same situation yesterday and decided to look at making this work using Mango selectors. I did the following:
1. Establish a changes feed filtered by the query selector (see the "_selector" filter for /db/_changes)
2. Perform the query (db/_find) and record the results
3. Establish a second changes feed that filters for just the documents returned in (2) (see the "_doc_ids" filter for /db/_changes)
The feed at (1) lets you know when new documents match your query, along with edits to documents that matched your query both before and after the change.
The feed at (3) lets you know when a change is made to a document that previously matched your query, irrespective of whether it matches your query after the change has been made.
The combination of these feeds covers all cases, though with some false positives. On a change in either feed, tear down the changes feed at (3) and redo steps (2) and (3).
Now, some notes on this approach:
This is really only suitable in cases where the number of documents returned by the query is small, because of the filtering by _id in the second feed.
Care must be taken to ensure that the second feed is established correctly if there are lots of changes coming in from the first changes feed.
There are cases where a change will appear in both feeds. It would be good to avoid reacting twice.
If changes are expected to happen frequently, then employ debouncing or rate limiting if your client does not need to process each and every change notification.
This approach worked well for me and the cases I had to deal with.
References:
http://docs.couchdb.org/en/stable/api/database/find.html
http://docs.couchdb.org/en/stable/api/database/changes.html
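The three requests in steps (1) to (3), plus the "avoid reacting twice" note, might be sketched like this. The functions only build request descriptions (method, path, body) and leave the HTTP client out. All function names are hypothetical, and the continuous feed mode is an assumption, since the answer does not say which feed type was used.

```javascript
// Step (1): changes feed filtered by a Mango selector.
function selectorFeedRequest(db, selector) {
  return {
    method: "POST",
    path: "/" + encodeURIComponent(db) + "/_changes?feed=continuous&filter=_selector",
    body: { selector },
  };
}

// Step (2): the Mango query itself; its response carries the matching docs.
function findRequest(db, selector) {
  return { method: "POST", path: "/" + encodeURIComponent(db) + "/_find", body: { selector } };
}

// Step (3): second changes feed, limited to the ids returned by step (2).
function docIdsFeedRequest(db, docs) {
  return {
    method: "POST",
    path: "/" + encodeURIComponent(db) + "/_changes?feed=continuous&filter=_doc_ids",
    body: { doc_ids: docs.map((d) => d._id) },
  };
}

// A change can arrive on both feeds; remember id + revision to react only once.
// The set grows without bound here, so a real client would need to prune it.
function makeDeduper() {
  const seen = new Set();
  return function isNew(change) {
    const key = change.id + "@" + change.changes.map((c) => c.rev).join(",");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
}
```

On a fresh change from either feed, the client would drop the feed from step (3), re-run `findRequest`, and open a new feed from `docIdsFeedRequest` with the new results.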
The behaviour that you described is correct.
CouchDB will populate the changes feed with the docs that pass the filter function. If you remove or modify the information used by the filter function, the filtered changes feed will ignore those updates.
The closest you will come to this is to use a view and filter the changes feed based on that view - see [1] for details.
You can create a simple view that includes the "version" as part of the key using a map function such as:
function (doc) {
  emit(doc.version, 1);
}
A changes feed filtered by this view will notify you of the insert or deletion of documents that have a "version" field as well as changes to the "version" field of existing documents. You can not, however, determine the previous value of the "version" field from the changes feed.
Depending on your requirements, you can make the view more targeted. For example, if you only cared about transitions from "A" to "B" then you could include only documents that have "A" or "B" as their "version":
function (doc) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}
But be aware that this will not trigger a change notification on a transition from, say, "A" to "C" (or any other value for "version", including when the document is deleted), because change notifications are only sent when the map function emits at least one value for a document. It does not notify you when the map function used to emit at least one value for a given document but no longer does!
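To make that limitation concrete, here is a small simulation that runs the targeted map function outside CouchDB. Passing `emit` in as a parameter and the helper `emitsRow` are assumptions made so the snippet can run standalone; nothing here talks to a real database.

```javascript
// The targeted map function from above, with emit passed in explicitly
// so it can run outside CouchDB.
function map(doc, emit) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}

// Hypothetical helper: does this revision of the document emit at least one row?
function emitsRow(doc) {
  let emitted = false;
  map(doc, () => {
    emitted = true;
  });
  return emitted;
}

// Three successive revisions of the same document.
const revisions = [
  { _id: "doc1", version: "A" },
  { _id: "doc1", version: "B" },
  { _id: "doc1", version: "C" },
];

// A view-filtered feed only reports revisions for which the map function emits.
const notified = revisions.map(emitsRow);
console.log(notified); // [ true, true, false ]
```

The last revision emits nothing, so the move from "B" to "C" would pass unnoticed by a feed filtered on this view.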
You can also filter the changes feed using Mango selectors, so if Mango queries work for you then perhaps this is simpler than using a view, but I'm not sure that you can be notified of deletions via Mango selectors...
EDIT:
My claim about the simple map function above is not quite right, as it will notify you of all document insertions and deletions, not just ones with a "version" field. You can do this to avoid some of those false positive notifications:
function (doc) {
  if (doc.hasOwnProperty('version') || doc.hasOwnProperty('_deleted')) {
    emit(doc.version, 1);
  }
}
That will give notifications for new documents with a "version" field, or an update that adds a "version" field to an existing document, but it will still notify of all deletions.
[1] http://docs.couchdb.org/en/stable/api/database/changes.html#changes-filter-view

How can I reduce a collection by keeping only change points?

I have a Collection exampled below. This data is pulled from an endpoint every twenty minutes on a cron job.
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474116005,"_id":"kX0DpoZ5fkMr2ezg"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474684808,"_id":"ken1WRN47PTW159H"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474275606,"_id":"ko9r8u860es7E2hI"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
I want to discard any document (row) that doesn't show a change in the empty value (and consequently ready). My goal is to find the most recent timestamp at which these values changed within this collection.
Better illustrated, I want to reduce it to where the values change as such:
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
Can I do this in a MongoDB query? Or am I better off with a JavaScript filter function?
MongoDB allows you to specify a unique constraint on an index. These constraints prevent applications from inserting documents that have duplicate values for the inserted fields.
Use the following code to make the index on "id" unique:
db.collection.createIndex( { "id": 1 }, { unique: true } )
Also refer to the MongoDB documentation for more clarification.
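For the JavaScript filter option raised in the question, a minimal sketch could look like the following. It assumes a "change point" is the earliest record (by t) of each run of identical empty/ready values. Under that reading, the result will not necessarily match the hand-picked rows shown in the question. The function name `keepChangePoints` is made up for illustration.

```javascript
// Keep only the first record of each run of identical empty/ready values,
// after ordering the records by timestamp.
function keepChangePoints(docs) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...docs].sort((a, b) => a.t - b.t);
  const result = [];
  for (const doc of sorted) {
    const last = result[result.length - 1];
    // The first record always starts a run; later ones only if a value differs.
    if (!last || last.empty !== doc.empty || last.ready !== doc.ready) {
      result.push(doc);
    }
  }
  return result;
}

// The most recent change is then the last element of the reduced list.
function latestChange(docs) {
  const points = keepChangePoints(docs);
  return points.length > 0 ? points[points.length - 1] : null;
}
```

Calling `latestChange(docs).t` on the fetched documents would give the most recent timestamp at which the values changed.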

How to efficiently bulk insert and update mongodb document values from an array?

I have a Tags collection which contains documents of the following structure:
{
  word:"movie", //tag word
  count:1 //count of times tag word has been used
}
I am given an array of new tags that need to be added/updated in the Tags collection:
["music","movie","book"]
I can update the counts of all Tags currently existing in the Tags collection by using the following query:
db.Tags.update({word:{$in:["music","movies","books"]}}, {$inc:{count:1}}, true, true);
While this is an effective strategy to update, I am unable to see which tag values were not found in the collection, and setting the upsert flag to true did not create new documents for the unfound tags.
This is where I am stuck, how should I handle the bulk insert of "new" values into the Tags collection?
Is there any other way I could better utilize the update so that it does upsert the new tag values?
(Note: I am using Node.js with mongoose, solutions using mongoose/node-mongo-native would be nice but not necessary)
Thanks ahead
The concept of using upsert and the $in operator simultaneously is incongruous. This simply will not work, as there is no way to differentiate between upsert if *any* in and upsert if *none* in.
In this case, MongoDB is doing the version you don't want it to do. But you can't make it change behaviour.
I would suggest simply issuing three consecutive writes by looping through the array of tags. I know that it's annoying and it has a bad code smell, but that's just how MongoDB works.
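The per-tag writes could be sketched as below: one upsert per tag, each with an equality filter on `word` so that a missing tag is inserted with a count of 1. The helper name `buildTagUpserts` is made up. Batching the operations through `bulkWrite` is a substitution for the separate writes suggested above, not something the answer itself proposes.

```javascript
// Build one upsert per tag. With an equality filter on `word`, an upsert that
// finds no match inserts { word: <tag>, count: 1 }; otherwise count goes up by 1.
function buildTagUpserts(tags) {
  return tags.map((word) => ({
    updateOne: {
      filter: { word },
      update: { $inc: { count: 1 } },
      upsert: true,
    },
  }));
}

const tagOps = buildTagUpserts(["music", "movie", "book"]);

// Each operation can be issued on its own inside a loop, or the whole array
// can be sent in one call, e.g. in the mongo shell:
//   db.Tags.bulkWrite(tagOps);
```

Either way, every tag is matched individually, which avoids the ambiguity of combining upsert with $in.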
