Node.js, MongoDB: concurrent requests create duplicate records

Let me keep it simple. I am running a Node.js server. When I receive data in a PATCH request, I first need to check the database: if the row exists I update it, otherwise I create the record. Here is my code, which I call in the request handler callback.
let dbCollection = db$master.collection('users');
createUser(req.body).then(user => {
  dbCollection.updateMany(
    { rmisId: req.body.rmisId },
    { $set: { ...user } },
    { upsert: true }
  ).then((result) => {
    log.debug("RESULTS", user);
    return result;
  })
  .catch(err => {
    log.debug(err);
    return err;
  });
})
This works fine with sequential requests, but it creates duplicate records when I receive 10 concurrent requests. I am running on my local machine and replicating concurrent requests using Apache JMeter. Please help if you have experienced this kind of problem.
Thank you!
UPDATE
I have also tested an approach that first reads the database with dbCollection.find({ rmisId: req.body.rmisId }) to determine whether the record exists, but it makes no difference at all.

You cannot check-and-then-update: MongoDB operations are atomic only at the level of a single document. After you check and see that the record does not exist, another request may create the document you just checked for, and you will then insert a duplicate if you don't have a unique index or if you're generating the IDs yourself.
Instead, use upsert, as you're already doing, but without the separate create. It looks like you're getting the ID from the request, so simply search on that ID and upsert the user record; that way, if some other thread inserts it before you do, you'll update what the previous thread inserted. If that is not the behavior you want, add a unique index on that user ID field instead, so the second insert fails rather than silently merging.
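For example, a minimal sketch of that approach with the native driver (assuming rmisId is the field that uniquely identifies a user, and that dbCollection and createUser are as in the question):

// One-time setup: a unique index on rmisId makes duplicates impossible even
// if two upserts race; the losing insert fails with duplicate-key error 11000.
await dbCollection.createIndex({ rmisId: 1 }, { unique: true });

// Per request: a single atomic upsert, with no separate existence check.
const user = await createUser(req.body);
const result = await dbCollection.updateOne(
  { rmisId: req.body.rmisId },
  { $set: { ...user } },
  { upsert: true }
);
log.debug("RESULTS", result.upsertedId ? "inserted" : "updated");

Note that even a plain upsert can insert twice under concurrency when the queried field has no unique index, which is exactly what you are seeing; the unique index is what actually guarantees uniqueness.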

Related

How to efficiently sync Apollo's cache using subscriptions and AWS AppSync

I'm using aws-appsync in a Node.js client to keep a cached list of data items. This cache must be available at all times, including when not connected to the internet.
When my Node app starts, it calls a query which returns the entire list of items from the AppSync data source. This is cached by Apollo's cache storage, which allows future queries (using the same GraphQL query) to be made using only the cache.
The app also makes a subscription to the mutations which are able to modify the list on other clients. When an item in the list is changed, the new data is sent to the app. This can trigger the original query for the entire list to be re-fetched, thus keeping the cache up to date.
Fetching the entire list when only one item has changed is not efficient. How can I keep the cache up to date, while minimising the amount of data that has to be fetched on each change?
The solution must provide a single point to access cached data. This can either be a GraphQL query or access to the cache store directly. However, using results from multiple queries is not an option.
The Apollo documentation hints that this should be possible:
In some cases, just using [automatic store updates] is not enough for your application ... to update correctly. For example, if you want to add something to a list of objects without refetching the entire list ... Apollo Client cannot update existing queries for you.
The alternatives it suggests are refetching (essentially what I described above) and using an update callback to manually update the cached query results in the store.
Using update gives you full control over the cache, allowing you to make changes to your data model in response to a mutation in any way you like. update is the recommended way of updating the cache after a query.
However, here it is referring to mutations made by the same client, rather than syncing between clients using subscriptions. The update callback option doesn't appear to be available for a subscription (which provides the updated item data) or a query (which could fetch the updated item data).
As long as your subscription includes the full resource that was added, it should be possible by reading from and writing to the cache directly. Let's assume we have a subscription like this one from the docs:
const COMMENTS_SUBSCRIPTION = gql`
  subscription onCommentAdded {
    commentAdded {
      id
      content
    }
  }
`;
The Subscription component includes an onSubscriptionData prop, so we should be able to do something along these lines:
<Subscription
  subscription={COMMENTS_SUBSCRIPTION}
  onSubscriptionData={({ client, subscriptionData: { data, error } }) => {
    if (!data) return
    const current = client.readQuery({ query: COMMENTS_QUERY })
    client.writeQuery({
      query: COMMENTS_QUERY,
      data: {
        comments: [...current.comments, data.commentAdded],
      },
    })
  }}
/>
Or, if you're using plain JavaScript instead of React:
const observable = client.subscribe({ query: COMMENTS_SUBSCRIPTION })
observable.subscribe({
  // the observable emits a result object; the payload is on its `data` property
  next: ({ data }) => {
    if (!data) return
    const current = client.readQuery({ query: COMMENTS_QUERY })
    client.writeQuery({
      query: COMMENTS_QUERY,
      data: {
        comments: [...current.comments, data.commentAdded],
      },
    })
  },
  complete: console.log,
  error: console.error
})
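One caveat with both versions: client.readQuery throws if COMMENTS_QUERY has not been run (and cached) yet, so make sure the initial list query has executed first, or wrap the read in a try/catch.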

How to create multiple documents in Mongoose in one request

How would I go about creating multiple documents with different schemas in one REST API request in Node/Mongoose/Express?
Say, for example, I need to create a user and a site in a single request to /createUser.
I could of course create a user and then, in the returned promise, create the next record, but what if that second record doesn't pass validation? Then I've created a user without the second record.
User.create(userData)
  .then(user => {
    Site.create(siteData)
      .then(site => {
        // Do something
      })
      .catch(err => {
        console.log(err)
        // If this fails, I'm left with a user created without a site.
      })
  })
  .catch(err => {
    console.log(err)
  })
Is there a good practice to follow when creating multiple documents like this? Should I run manual validation instead before each .create() runs? Any guidance/advice would be very much appreciated!
This is the classic transaction problem: you are writing to two different models but want the whole operation to be atomic, rolling back if either write fails. Until MongoDB 4.0, transactions were not supported and the usual workaround for this sort of issue was a two-phase commit. Since MongoDB 4.0, multi-document transactions are available to handle exactly this, as sketched below.
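A rough sketch of that with Mongoose (this assumes MongoDB 4.0+ running as a replica set, which transactions require, and the User and Site models from the question; createUserWithSite is just an illustrative name):

const mongoose = require('mongoose')

async function createUserWithSite(userData, siteData) {
  const session = await mongoose.startSession()
  try {
    // withTransaction commits on success and aborts on any thrown error,
    // so a failed Site validation also rolls back the User insert.
    await session.withTransaction(async () => {
      // Model.create takes the array form when options are passed
      await User.create([userData], { session })
      await Site.create([siteData], { session })
    })
  } finally {
    session.endSession()
  }
}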
Hope it helped.

Concurrent updates in MongoDB

I have used this code on the Node.js server side to update multiple embedded documents.
DetailerItemGroupModel.find({ "_id": itemGroupId })
  .then(function (docs) {
    docs.forEach(function (doc) {
      doc.Products.forEach(function (ch) {
        // do something before update
      });
      doc.save();
    });
  });
I have a scenario like this:
Clients A and B send GET requests for the same documents at the same time. Client B then performs the update first (calling the server, which runs the code above), and client A updates after that.
I want client A's UPDATE request to be rejected when the data it read is no longer the latest (in this case, the document already updated by B): the server should cancel client A's request and return a bad request telling it that the data it is about to update was changed by another client. Is there any way to implement that?
I have read about "__v", but I'm not sure it works when clients A and B send requests to update the same document at the same time, or whether it works with forEach(). I changed the code to this:
DetailerItemGroupModel.find({
    "_id": itemGroupId,
    "__v": {document version}
  })
  .then(function (docs) {
    docs.forEach(function (doc) {
      doc.Products.forEach(function (ch) {
        // do something before update
      });
      doc.save();
    });
  });
The idea is that you want your "Save" to only work on the version that you have (which should be the latest).
Using .save will not help you here.
You can use one of two functions:
update()
findOneAndUpdate()
The idea is that you have to do the find and the save in one atomic operation, and in the conditions object you send not only the _id but also the __v, which is the version identifier (something similar to the example below).
// Find-and-update in one atomic step: the filter matches both the _id and
// the version that was read earlier, and the version is bumped on success.
this.update(
  { _id: document._id, __v: document.__v },
  { $set: document, $inc: { __v: 1 } },
  result.cb(cb, function (updateResult) {
    ...
  })
);
Say you read version 10 and now you are ready to update (save) it. The update will only succeed if it finds a document with that particular id and version. If in the meantime the database got a newer version (version 11), your update will fail.
By the way, MongoDB now has transactions, and you might want to look into those: transaction B would make your code wait until transaction A finishes its atomic operation, which is a better model than updating, failing, and then trying again...

How to Determine if a Document was Actually Changed During Update in MongoDB

I am using the Mongoose driver with Node.js. I have quite a simple update call whose purpose is to sync an external source of meetings to my database:
collection.update({ meeting_id: doc.meeting_id }, newDoc, { upsert: true })
The object returned determines whether or not an update or an insert occurred. This works perfectly. My issue is that I must determine if an actual change occurred. When you update a document with itself, MongoDB treats this in exactly the same way as if all fields were changed.
So my question is: Is there any good way to tell if anything actually changed? I could search for each document then compare each field manually, but this seems like a poor (and slow) solution.
You can use findAndModify, which returns the updated document, as opposed to update, which returns only the number of updated records.
collection.findAndModify(
  { meeting_id: doc.meeting_id },  // query
  [],                              // sort (the driver's signature requires it)
  newDoc,                          // replacement document
  { new: true },                   // return the modified document
  function (err, result) {
    res.send({ error: err, affected: result });
  }
);

Preventing concurrent access to documents in Mongoose

My server application (using Node.js, MongoDB, and Mongoose) has a collection of documents for which it is important that two client applications cannot modify them at the same time without seeing each other's modifications.
To prevent this I added a simple document versioning system: a pre-hook on the schema which checks if the version of the document is valid (i.e., not higher than the one the client last read). At first sight it works fine:
// Validate version number
UserSchema.pre("save", function(next) {
  var user = this
  user.constructor.findById(user._id, function(err, userCurrent) { // userCurrent is the user that is currently in the db
    if (err) return next(err)
    if (userCurrent == null) return next()
    if (userCurrent.docVersion > user.docVersion) {
      return next(new Error("document was modified by someone else"))
    } else {
      user.docVersion = user.docVersion + 1
      return next()
    }
  })
})
The problem is the following:
When one User document is saved at the same time by two client applications, can these saves interleave between the pre-hook and the actual save operation? What I mean is the following; imagine time going from left to right, with v being the version number (which is persisted by save):
App1: findById(pre)[v:1] save[v->2]
App2: findById(pre)[v:1] save[v->2]
Resulting in App1 saving something that was modified in the meantime (by App2), with no way to notice that it was modified. App2's update is completely lost.
My question might boil down to: Do the Mongoose pre-hook and the save method happen in one atomic step?
If not, could you give me a suggestion on how to fix this problem so that no update ever gets lost?
Thank you!
MongoDB has findAndModify which, for a single matching document, is an atomic operation.
Mongoose has various methods that use this operation under the hood, and I think they will suit your use case:
Model.findOneAndUpdate()
Model.findByIdAndUpdate()
Model.findOneAndRemove()
Model.findByIdAndRemove()
Another solution (one that Mongoose itself uses as well for its own document versioning) is to use the Update Document if Current pattern.
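For example, a version-guarded update in that style might look like this sketch (docVersion is the field from the question; userId and the name field are placeholders for whatever you load and modify):

// Read the document and remember the version the client saw
const user = await User.findById(userId)
user.name = "new name"

// Atomically update only if the version is still the one we read;
// null means someone else modified the document in the meantime
const updated = await User.findOneAndUpdate(
  { _id: user._id, docVersion: user.docVersion },
  { $set: { name: user.name }, $inc: { docVersion: 1 } },
  { new: true }
)
if (updated == null) {
  throw new Error("document was modified by someone else")
}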
