MongoDB: findOneAndUpdate seems to be not atomic - node.js

I am using nodejs + mongodb as a backend for a large, distributed web application. I have a series of events that need to be in a specific order. There are multiple services generating these events, and my application should process and store them as they come in, so that at any given time they are in the correct order.
I cannot rely on timestamps since javascript only provides timestamps in milliseconds, which is not accurate enough for my case.
I have two collections in my database: one that stores the events and one that stores an index, which represents my event order. I have tried using findOneAndUpdate to increase my index atomically. This, however, does not seem to be working.
console.log('Adding');
console.log(event.type);
this._db.collection('evtidx').findOneAndUpdate({ id: 'index' }, { $inc: { value: 1 } }, (err, res) => {
  console.log('For ' + event.type);
  console.log('Got value: ' + res.value.value);
  event.index = res.value.value;
  this._db.collection('events').insertOne(event, (err, evtres) => {
    if (err) {
      throw err;
    }
  });
});
When I check the output of the code above I see:
Adding
Event1
Adding
Event2
Adding
Event3
Adding
Event4
For Event1
Got value: 1
For Event3
Got value: 4
For Event2
Got value: 2
For Event4
Got value: 3
Which suggests to me that my code is not working atomically.
The events come in in the correct order, but don't have the correct index attached to them after findOneAndUpdate. Could anyone help me out here?

Atomic database operations do not mean that they lock the database while the request is running. You may be getting the requests in order, but they are not necessarily executed in sequential order, neither in the backend nor in the database.
What you need to do is read the last document index from the 'events' collection. If it is one less than your current request index, then insert; else wait and retry.
Note that this can cause problems if one event fails because of a network error or something else: your request processing would then stop.
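A minimal sketch of that check-and-retry idea, assuming the question's collections and the `index` field assigned by the counter (the helper name and the 50 ms retry delay are made up):
```
// Sketch only: insert `event` once its predecessor (index - 1) is stored.
// Assumes `db` is a connected driver handle and `event.index` was already
// assigned by the findOneAndUpdate counter from the question.
function insertInOrder(db, event) {
  db.collection('events').find().sort({ index: -1 }).limit(1)
    .toArray((err, docs) => {
      if (err) { throw err; }
      const lastIndex = docs.length ? docs[0].index : 0;
      if (lastIndex === event.index - 1) {
        // predecessor is stored - safe to insert now
        db.collection('events').insertOne(event, err => {
          if (err) { throw err; }
        });
      } else {
        // predecessor not stored yet - wait a bit and retry
        setTimeout(() => insertInOrder(db, event), 50);
      }
    });
}
```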

Related

Dealing with race conditions and starvation when generating unique IDs using MongoDB + NodeJS

I am using MongoDB to generate unique IDs of this format:
{ID TYPE}{ZONE}{ALPHABET}{YY}{XXXXX}
Here ID TYPE will be a letter from {U, E, V} depending on the input, ZONE will be from the set {N, S, E, W}, YY will be the last 2 digits of the current year, and XXXXX will be a 5 digit number beginning from 0 (it will be padded with 0s to make it 5 digits long). When XXXXX reaches 99999, the ALPHABET part will be incremented to the next letter (starting from A).
I will receive ID TYPE and ZONE as input and will have to give the generated unique ID as output. Every time I have to generate a new ID, I will read the last one generated for the given ID TYPE and ZONE, increment the number part by 1 (XXXXX + 1), then save the newly generated ID in MongoDB and return the output to the user.
This code will run on a single NodeJS server, and there can be multiple clients calling this method.
Is there a possibility of a race condition like the one described below if I am only running a single server instance?
First client reads last generated ID as USA2100000
Second client reads last generated ID as USA2100000
First client generates the new ID and saves it as USA2100001
Second client generates the new ID and saves it as USA2100001
Since 2 clients have generated IDs, the DB should have ended up with USA2100002.
To overcome this, I am using MongoDB transactions. My code in Typescript using Mongoose as ODM is something like this:
session = await startSession();
session.startTransaction();
lastId = (await GeneratedId.findOne({ key: idKeyStr }, "value")).value;
lastId = createNextId(lastId);
const newIdObj: any = {
  key: `Type:${idPrefix}_Zone:${zone_letter}`,
  value: lastId,
};
await GeneratedId.findOneAndUpdate({ key: idKeyStr }, newIdObj, {
  upsert: true,
  new: true,
});
await session.commitTransaction();
session.endSession();
I want to know what exactly will happen when the situation I described above happens with this code?
Will the second client's transaction throw an exception, so that I have to abort or retry the transaction in my code, or will it handle the retry on its own?
How does MongoDB or other DBs handle transactions? Does MongoDB lock the documents involved in the transaction? Are the locks exclusive (won't even allow other clients to read)?
If the same client keeps failing to commit its transaction, this client would be starved. How to deal with this starvation?
You are using MongoDB to store the ID. It's state. Generation of the ID is a function. You would be using MongoDB to generate the ID if the mongodb process took the function's arguments and returned the generated ID. That's not what you are doing: you are using nodejs to generate the ID.
The number of threads, or rather event loops, is critical, as it defines the architecture, but either way you don't need transactions. Transactions in mongodb are called "multi-document transactions" exactly to highlight that they are intended for consistent updates of several documents at once. The very first paragraph of https://docs.mongodb.com/manual/core/transactions/ warns you that if you update a single document there is no room for transactions.
A single-threaded application does not require any synchronisation. You can reliably read the latest generated ID on start and guarantee the ID is unique within the nodejs process. If you exclude mongodb and other I/O from the generation function, you make it synchronous, so you can maintain the state of the ID within the nodejs process and guarantee its uniqueness. Once generated, you can persist it in the db asynchronously. In the worst-case scenario you may have a gap in the sequential numbers, but no duplicates.
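As a sketch of that single-process approach (the collection name and startup read are illustrative; `createNextId` is assumed to be the question's synchronous formatting helper):
```
// Sketch: the ID state lives in the nodejs process; mongodb only persists it.
let db;
let lastId;

async function init(database) {
  db = database;
  // naive startup read of the latest persisted ID (illustrative only)
  const doc = await db.collection('ids').find().sort({ _id: -1 }).limit(1).next();
  lastId = doc ? doc._id : undefined;
}

function nextId() {
  // no await between reading and writing lastId, so calls cannot
  // interleave within a single Node.js event loop
  lastId = createNextId(lastId);
  // persist asynchronously; a crash here may leave a gap, never a duplicate
  db.collection('ids').insertOne({ _id: lastId }).catch(console.error);
  return lastId;
}
```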
If there is the slightest chance that you may need to scale up to more than 1 nodejs process to handle more simultaneous requests, or add another host for redundancy in the future, you will need to synchronise generation of the ID, and you can employ MongoDB unique indexes for that. The function itself doesn't change much: you still generate the ID as in the single-threaded architecture, but add an extra step to save the ID to mongo. The document should have a unique index on the ID field, so in case of concurrent updates one of the queries will successfully add the document and the other will fail with "E11000 duplicate key error". You catch such errors on the nodejs side and run the function again, picking the next number:
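(A sketch of how that might look; the collection name is illustrative, and _id serves as the unique-indexed field since it is unique by default.)
```
// Sketch: generate, try to claim via the unique index, retry on duplicate.
async function generateUniqueId(db, makeNextId) {
  for (;;) {
    const candidate = makeNextId(); // synchronous generation, as above
    try {
      await db.collection('ids').insertOne({ _id: candidate });
      return candidate; // insert succeeded: the ID is claimed cluster-wide
    } catch (err) {
      if (err.code !== 11000) throw err; // not a duplicate-key error
      // E11000: another process claimed this ID - loop and pick the next one
    }
  }
}
```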
This is what you can try. You need to store only one document in the GeneratedId collection. This document will have the last generated ID's value. The document must have a known _id field; for example, let's say it is an integer with value 1. So, the document can be like this:
{ _id: 1, lastGeneratedId: "<some value>" }
In your application, you can use the findOneAndUpdate() method with the filter { _id: 1 }, which means you are targeting a single-document update. This update will be an atomic operation; as per the MongoDB documentation, "All write operations in MongoDB are atomic on the level of a single document." Do you need a transaction in this case? No. The update operation is atomic, and it performs better than using a transaction. See Update Documents - Atomicity.
Then, how do I generate the new generated id and retrieve it?
I will receive ID TYPE and ZONE...
Using the above input values and the existing lastGeneratedId value, you can arrive at the new value and update the document with it. The new value can be calculated / formatted within the Aggregation Pipeline of the update operation - you can use the feature Updates with Aggregation Pipeline (available with MongoDB v4.2 or higher).
Note the findOneAndUpdate() method returns the updated (or modified) document when you use the update option new: true. This returned document will have the newly generated lastGeneratedId value.
The update method can look like this (using NodeJS driver or even Mongoose):
const filter = { _id: 1 }
const update = [
  { $set: { lastGeneratedId: { /* your calculation of the new value goes here... */ } } }
]
const options = { new: true, projection: { _id: 0, lastGeneratedId: 1 } }
const newId = (await GeneratedId.findOneAndUpdate(filter, update, options))['lastGeneratedId']
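For illustration, here is a hedged sketch of what that calculation stage could look like. It assumes the document also keeps the numeric counter in a separate seq field and the fixed part of the ID in a prefix field (neither is in the original question), and it omits the alphabet-rollover case:
```
// Illustrative pipeline update (MongoDB v4.2+). Stages run in order, so the
// second $set sees the incremented seq from the first.
const update = [
  { $set: { seq: { $add: [ "$seq", 1 ] } } },
  { $set: {
      lastGeneratedId: {
        $concat: [
          "$prefix", // assumed field, e.g. "USA21" = ID TYPE + ZONE + ALPHABET + YY
          // left-pad the counter to 5 digits: take the last 5 chars of "00000" + seq
          { $substrCP: [
              { $concat: [ "00000", { $toString: "$seq" } ] },
              { $subtract: [ { $strLenCP: { $concat: [ "00000", { $toString: "$seq" } ] } }, 5 ] },
              5
          ] }
        ]
      }
    }
  }
];
```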
Note about the JavaScript function:
With MongoDB v4.4 you can use JavaScript functions within an Aggregation Pipeline; and this is applicable for the Updates with Aggregation Pipeline. For details see $function aggregation pipeline operator.

improve mongo query performance

I'm using a node based CMS system called Keystone, which uses MongoDB for a data store, giving fairly liberal control over data and access. I have a very complex model called Family, which has about 250 fields, a bunch of relationships, and a dozen or so methods. I have a form on my site which allows the user to enter the required information to create a new Family record; however, the processing time is running long (12s on localhost and over 30s on my Heroku instance). The issue I'm running into is that Heroku emits an application error for any request that runs over 30s, which means I need to optimize my query. All processing happens very quickly except one function. Below is the offending function:
const Family = keystone.list( 'Family' );

exports.getNextRegistrationNumber = ( req, res, done ) => {
  console.time('get registration number');
  const locals = res.locals;
  Family.model.find()
    .select( 'registrationNumber' )
    .exec()
    .then( families => {
      // get an array of registration numbers
      const registrationNumbers = families.map( family => family.get( 'registrationNumber' ) );
      // get the largest registration number
      locals.newRegistrationNumber = Math.max( ...registrationNumbers ) + 1;
      console.timeEnd('get registration number');
      done();
    }, err => {
      console.timeEnd('get registration number');
      console.log( 'error setting registration number' );
      console.log( err );
      done();
    });
};
The processing in my .then() happens in milliseconds; however, the Family.model.find() takes way too long to execute. Any advice on how to speed things up would be greatly appreciated. There are about 40,000 Family records the query is trying to dig through, and there is already an index on the registrationNumber field.
It makes sense that the then() executes quickly but the find() takes a while: finding the largest value in a set of records is a relatively quick database operation, while fetching the whole set can be very time-consuming, depending on a number of factors.
If you are simply reading the data and presenting it to the user via REST or some sort of visual interface, you can make use of lean(), which returns plain JavaScript objects. By default, you are returning a mongoose.Document, which in your case is unnecessary, as there does not appear to be any data manipulation after your read query; you are just getting the data.
More importantly, it appears that all you need is one record: the record with the largest registrationNumber. You should always use findOne() when you are looking for one record in any set of records to maximize performance.
See previous answer detailing using findOne in a node.js implementation, or see the MongoDB documentation for general information about this collection method.
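A sketch of how the offending function could be rewritten along those lines (same Keystone setup as the question; the descending sort is served by the existing registrationNumber index):
```
// Sketch: ask the database for only the largest registrationNumber.
exports.getNextRegistrationNumber = ( req, res, done ) => {
  Family.model.findOne()
    .sort( { registrationNumber: -1 } ) // largest first, served by the index
    .select( 'registrationNumber' )
    .lean()                             // plain object instead of a mongoose.Document
    .exec()
    .then( family => {
      res.locals.newRegistrationNumber = ( family ? family.registrationNumber : 0 ) + 1;
      done();
    }, err => {
      console.log( 'error setting registration number', err );
      done();
    });
};
```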

Mongoose - Optimal way to implement friendships: 2 pointers, pushing once to both arrays?

Question: when creating something like a simple many-to-many friendship in mongoose, I know how to create it on ONE object. For instance, the code below in the controller shows that I am finding one user and pushing another user, referenced via ObjectId, into his friends array.
This way, when I look at the JSON, I can see that the user with _id "57ed2e8c9cf3083c2ccec173" has a new friend in his friends array, and I can run a population to get that friend's user document. However, the user who was added as a friend does not have these capabilities, because his array of friends is still empty.
I know there are multiple ways to go about this, as I have read the docs, which say I could simply now push user 1 into user 2's friends array, but, in the words of the docs: "It is debatable that we really want two sets of pointers as they may get out of sync. Instead we could skip populating and directly find() the stories we are interested in."
In other words, if you have an event model with many users, and user model with many events, and you need to access the array of users from the event document, and the array of events from the user document... Would it be best to just push each instance into each other?
Is this the correct way of thinking?
Thanks
```
app.post('/friendships', function(req, res) {
  User.findOne({
    _id: "57ed2e8c9cf3083c2ccec173"
  }, function(err, user1) {
    User.findOneAndUpdate({
      _id: "57ed2ebbedcd96a4536467f7"
    }, {$push: {friends: user1}}, {upsert: true}, function(err, user2) {
      console.log("success");
    })
  })
});
```
Yes, this is the correct way of thinking, considering the limitations of Mongo for that sort of data.
When you store such information in two places, you need to make sure it stays consistent: either it is present in both places or in neither. You don't have transactions in Mongo, so the only way you can do it is to chain the requests and manually roll back the first one if the second one fails, hoping that the rollback is possible (which may not be the case: if the second update failed because you lost the connection to the database, there is a good chance that your rollback will fail as well, leaving your database in an inconsistent state).
An alternative would be to store only one half of the relationship - e.g. only store events in users, but no users in events, using your example. That way the data would be consistently stored in one place, but if you wanted to get a list of users for a certain event, you'd have to make a possibly expensive database lookup instead of having the list already present in the event document.
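For example (a sketch; the model and field names follow the event/user example above), listing a given event's users would become a query against the users collection:
```
// With only user -> events stored, the reverse lookup is a query:
User.find({ events: eventId }, function(err, users) {
  // `users` is every user whose events array contains eventId;
  // an index on `events` helps, but it is still an extra round trip
});
```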
In practice, in most cases I have seen, the data is stored in two places with an effort to keep them consistent.
Though it is usually done with storing documents IDs, so instead of:
{$push: {friends: user1}}
it's usually:
{$push: {friends: user1._id}}
(or just using the _id if you have it in the first place)
And instead of $push you can use $addToSet - see: https://docs.mongodb.com/manual/reference/operator/update/addToSet/
Here is a basic concept of adding a two-directional friendship between id1 and id2:
function addFriendship(id1, id2) {
User.findOneAndUpdate({_id: id1}, {$addToSet: {friends: id2}}, err => {
if (err) {
// failure - no friendship added
} else {
// first friendship added, trying the second:
User.findOneAndUpdate({_id: id2}, {$addToSet: {friends: id1}}, err => {
if (err) {
// second friendship not added - rollback the first:
User.findOneAndUpdate({_id: id1}, {$pull: {friends: id2}}, err => {
if (err) {
// we're screwed
} else {
// rolled back - consistent state, no friendship
}
});
} else {
// success - both friendships added
}
});
}
});
}
Not pretty and not bulletproof but that's the most you can hope for with a database with no transactions where denormalized data is the norm.
(Of course friendships don't always have to be bidirectional, but this is just an example of a pattern that is common for any many-to-many relationship.)

MongoDB - two updates in sequence overlap each other

We are building a size calculation mechanism for our system.
To calculate sizes, we start with the first atomic operation - findAndModify - to find the object and add lock properties to it (to prevent other calculations for this object from interacting with it; since we can have many parallel calculations, the others should be postponed). We then calculate the sizes of specific properties, and after that we add metadata to the object and delete the locks.
However, it seems that sometimes, when we have multiple calculations for a single object (especially when we calculate many objects in parallel), some updates aren't executed.
_size metadata during calculation looks like this:
{
  _lockedAt: SomeDate,
  _transactionId: 'abc'
}
And after calculation it should look like this:
{
  somePropertySize: 123,
  anotherPropertySize: 1245,
  (...)
  _total: 131431523 // Some number
  // Notice that both _lockedAt and _transactionId should be missing
}
And this is what our update flow looks like:
return Promise.coroutine(function * () {
  yield object.findOneAndUpdate({
    '_id': gemId,
    '_size._lockedAt': {
      $exists: false
    }
  }, {
    $set: {
      '_size._lockedAt': moment.utc().toDate(),
      '_size._transactionId': transactionId
    }
  }).then(results => results.value);

  // Calculations are performed here, new _size object is built

  yield object.findOneAndUpdate({
    _id: gemId,
    _lockedAt: {
      $exists: true // We tried both with and without this property, does not change anything
    }
  }, {
    $set: {
      _size: newSizeObject
    }
  });
})()
An exemplary real-life object JUST before the second update (truncated for brevity):
{
  title: 11,
  description: 2,
  detailedSection: 0,
  tags: 2,
  file: 5625898,
  _total: 5625913
}
For some reason, when we have multiple calculations running close together, sometimes (for new objects without a _size property at all) the objects are left with a _size object looking exactly as it did after locking, despite the fact that the logs show everything went well (calculations completed, the new size object was built, and the second DB update was called).
We use MongoDB 3.0 with two replica sets. Any ideas on what is happening?
Put the second update after the then so it will wait until the promise resolves:
object.findOneAndUpdate({
  '_id': gemId,
  '_size._lockedAt': {
    $exists: false
  }
}, {
  $set: {
    '_size._lockedAt': moment.utc().toDate(),
    '_size._transactionId': transactionId
  }
}).then(results => {
  // Calculations are performed here, new _size object is built
  // return the promise so the catch below also covers this update
  return object.findOneAndUpdate({
    _id: gemId,
    _lockedAt: {
      $exists: true // We tried both with and without this property, does not change anything
    }
  }, {
    $set: {
      _size: newSizeObject
    }
  });
}).catch(console.error);
Also make sure you have error handling for your promises using catch.
If you don't really need the lock or transaction fields, then I would remove that stuff. If you do need them, something like RethinkDB may work a little better, or PostgreSQL could give you real transactions.
All in all, I checked the code very carefully, and what was happening in reality was that a completely different part of the code was querying the object from the DB and then, after a few other operations (mine included), writing the whole object back to the DB (hence overwriting my changes).
So, an important note for every MongoDB user: please remember that MongoDB is not transactional but is still atomic, which means it guarantees that a single operation is applied atomically, but it does not guarantee that data remains unchanged between your operations.
To sum up, things I learned from this example:
NEVER update a whole object in the database with data obtained from it some time before (e.g. by querying, changing some properties and saving it again).
USE $set, $inc, $unset and the other update operators (see the sketch after this list). If you have a lot of parameters, use e.g. the mongo-dot-notation npm library to flatten your data into a $set selector.
If something unexpected is happening with your data (e.g. missing properties after saving), the first thing to investigate is other pending operations on those specific entities.
The least probable cause of your problems is MongoDB itself. It's usually code that does not follow the atomicity rules (which probably happens to a lot of people used to transactional DBs :)).
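As a sketch of the second point (the field values are taken from the example above; the `collection` handle is assumed):
```
// DON'T: write back a whole object fetched earlier - it clobbers
// any concurrent changes. DO: send only the fields you changed:
collection.updateOne(
  { _id: gemId },
  {
    $set: {
      '_size.somePropertySize': 123,
      '_size._total': 131431523
    },
    // release the lock atomically in the same operation
    $unset: {
      '_size._lockedAt': '',
      '_size._transactionId': ''
    }
  }
);
```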

Mongodb, incrementing value inside an array. (save() ? update() ?)

var Poll = mongoose.model('Poll', {
  title: String,
  votes: {
    type: Array,
    'default': []
  }
});
I have the above schema for my simple poll, and I am uncertain of the best method to change the value of the elements in my votes array.
app.put('/api/polls/:poll_id', function(req, res) {
  Poll.findById(req.params.poll_id, function(err, poll) {
    // I see the official website of mongodb use something like
    // db.collection.update()
    // but that doesn't apply here right? I have direct access to the "poll" object here.
    // Can I do something like:
    poll.votes[1] = poll.votes[1] + 1;
    poll.save();
    // Helps much appreciated.
  });
});
You can do it as you have above, but of course this involves "retrieving" the document from the server, then making the modification and saving it back.
If you have a lot of concurrent operations doing this, then your results are not going to be consistent, as there is a high potential for "overwriting" the work of another operation that is trying to modify the same content. So your increments can go out of "sync" here.
A better approach is to use the standard .update() type of operations. These make a single request to the server and modify the document there, even returning the modified document, as is the case with .findByIdAndUpdate():
Poll.findByIdAndUpdate(req.params.poll_id,
  { "$inc": { "votes.1": 1 } },
  { "new": true }, // return the updated document (needed in newer Mongoose versions)
  function(err, doc) {
  }
);
So the $inc update operator does the work of modifying the array at the specified position using "dot notation". The operation is atomic, so no other operation can modify at the same time and if there was something issued just before then the result would be correctly incremented by that operation and then also by this one, returning the correct data in the result document.
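If the array position comes from the request rather than being hard-coded, the dot-notation key can be built dynamically (a sketch; the `position` request field is assumed):
```
// Build the "votes.<index>" key at runtime, e.g. from req.body.position
var position = parseInt(req.body.position, 10); // assumed request field
var update = { "$inc": {} };
update["$inc"]["votes." + position] = 1;

Poll.findByIdAndUpdate(req.params.poll_id, update, { "new": true },
  function(err, doc) {
    // doc.votes[position] now reflects the atomic increment
  }
);
```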
