Mongoose : Unable to upsert data in for loop - node.js

I want to upsert documents in the following way:
for (var i = 0; i < req.body.app_events.length; i++ ) {
console.log(req.body.app_events[i].event_key);
//delete upsertData._id;
Appusers.update({app_key: req.body.app_key, e_key:req.body.app_events[i].event_key}, {
$set : {
app_key:req.body.app_key,
e_key: req.body.app_events[i].event_key,
e_name: req.body.app_events[i].event_name
}}, { upsert: true}, function(err, data){
if(err) return console.log(err);
console.log(data);
});
}
It is creating a single document with _id only. I want to insert the document if it does not exist, otherwise update it, matching on e_key and app_key.

You really should not be calling asynchronous functions inside a synchronous loop. What you need is something that respects a callback on completion of each loop cycle and signals when the whole update is complete. This also makes incrementing counters externally safe.
Use something like async.whilst for this:
var i = 0;
async.whilst(
function() { return i < req.body.app_events.length; },
function(callback) {
console.log(req.body.app_events[i].event_key);
//delete upsertData._id;
Appusers.findOneAndUpdate(
{ app_key: req.body.app_key, e_key:req.body.app_events[i].event_key},
{
$set : {
app_key:req.body.app_key,
e_key: req.body.app_events[i].event_key,
e_name: req.body.app_events[i].event_name
}
},
{ upsert: true},
function(err,data) {
if (err) return callback(err);
console.log(data);
i++;
callback();
}
);
},
function(err) {
if (err)
console.log(err);
else {
// done
}
}
);
Now the loop is wrapped with a "callback", which is itself called from within the callback to the update method. Also, if you expect a "document" back then you should be using .findOneAndUpdate(), as .update() just modifies the content and returns the number of documents affected.
When the loop is complete or when an error is passed to the callback, then handling is moved to the last function block, where you complete your call or call other callbacks as required.
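As a quick illustration of the .update() vs .findOneAndUpdate() difference mentioned above (a sketch, assuming query and change are the filter and $set document from the example; the new: true option asks Mongoose to return the post-update document rather than the pre-update one):
// .update() reports write counts, not the document:
Appusers.update(query, change, { upsert: true }, function(err, raw) {
console.log(raw); // e.g. { n: 1, nModified: 0, ... }
});
// .findOneAndUpdate() with { new: true } returns the updated/upserted document:
Appusers.findOneAndUpdate(query, change, { upsert: true, new: true }, function(err, doc) {
console.log(doc); // the document itself
});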
Better than the above: dig into the native driver methods for Bulk operations. You need to be careful that you already have an open connection to the database. If unsure about this, then try to always wrap the application logic in:
mongoose.connection.once('open', function() {
// app logic here
});
This makes sure the connection has been made. The mongoose methods themselves "hide" this away, but the native driver methods have no knowledge of it.
With that in place, this is the fastest possible way to update the data:
var i = 0;
var bulk = Appusers.collection.initializeOrderedBulkOp();
async.whilst(
function() { return i < req.body.app_events.length; },
function(callback) {
console.log(req.body.app_events[i].event_key);
bulk.find(
{ app_key: req.body.app_key, e_key: req.body.app_events[i].event_key }
).upsert().updateOne({
$set : {
app_key:req.body.app_key,
e_key: req.body.app_events[i].event_key,
e_name: req.body.app_events[i].event_name
}
});
i++;
if ( i % 1000 == 0) {
bulk.execute(function(err,response) {
if (err) return callback(err);
console.log(response);
bulk = Appusers.collection.initializeOrderedBulkOp();
callback();
})
} else {
callback();
}
},
function(err) {
if (err)
console.log(err);
else {
if ( i % 1000 != 0 ) {
bulk.execute(function(err,response) {
if (err) console.log(err);
console.log(response);
// done
});
} else {
// done
}
}
}
);
The Bulk methods build up "batches" of operations (in this case 1000 at a time) and send them all to the server in one request with one response (per batch). This is a lot more efficient than contacting the database once for every write.
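On recent Mongoose versions the same batching can also be expressed through Model.bulkWrite(), without reaching into the underlying collection. A minimal sketch, assuming the same req.body shape as above:
// Sketch: the same upserts batched via Model.bulkWrite()
var ops = req.body.app_events.map(function(ev) {
return {
updateOne: {
filter: { app_key: req.body.app_key, e_key: ev.event_key },
update: { $set: {
app_key: req.body.app_key,
e_key: ev.event_key,
e_name: ev.event_name
} },
upsert: true
}
};
});
Appusers.bulkWrite(ops)
.then(function(result) { console.log(result); })
.catch(function(err) { console.log(err); });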

Related

How to query mongoDB to see if matching record exists, and if it does return it to update

Seems like a super basic task, but I just cannot get this to work (not very experienced with mongo or nodeJS).
I have an array of records. I need to check the DB to see if any records with a matching name already exist and if they do grab that record so I can update it.
Right now I am trying this
function hit_the_db(db, record_name, site_id) {
return new Promise((resolve, reject) => {
var record = db.collection('' + site_id + '_campaigns').find({name: record_name}).toArray(function(err, result) {
if (err) {
console.log('...error => ' + err.message);
reject(err);
} else {
console.log('...promise resolved...');
resolve(result);
}
});
console.log('...second layer of select successful, returning data for ' + record.length + ' records...');
return record;
});
}
This query works in another part of the app so I tried to just copy it over, but I am not getting any records returned even though I know there should be with the data I am sending over.
site_id is just a string that would look like ksdlfnsdlfu893hdsvSFJSDgfsdk. The record_name is also just a string that could really be anything but it is previously filtered so no spaces or special characters, most are something along these lines this-is-the-name.
With the names coming through there should be at least one found record for each, but I am getting nothing returned. I just cannot wrap my head around using mongo for these basic tasks, if anyone can help it would be greatly appreciated.
I am just using nodeJS and connecting to mongoDB, there is no express or mongoose or anything like that.
The problem here is that you are mixing callbacks and promises for async code handling. When you call:
var record = db.collection('' + site_id + '_campaigns').find({name: record_name}).toArray(function(err, result) {
You are passing in a callback function, which will receive the resulting array of mongo records in a parameter called result, but then assigning the immediately returned value to a variable called 'record', which is not going to contain anything.
Here is a cleaned up version of your function.
function hit_the_db(db, site_id, record_name, callback) {
// Find all records matching 'record_name'
db.collection(site_id + 'test_campaigns').find({ name: record_name }).toArray(function(err, results) {
// matching records are now stored in 'results'
if (err) {
console.log('err:', err);
}
return callback(err, results);
});
}
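If you would rather work with promises, toArray() also returns one when called without a callback, so a sketch of the same lookup could be:
// Sketch: promise form of the same query (no callback passed to toArray)
function hit_the_db(db, site_id, record_name) {
return db.collection(site_id + 'test_campaigns')
.find({ name: record_name })
.toArray(); // resolves with the array of matching documents
}
// usage
hit_the_db(db, site_id, record_name)
.then(function(records) { console.log('records:', records); })
.catch(function(err) { console.log('err:', err); });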
Here is optional code for testing the callback version above.
// This is called to generate test data
function insert_test_records_callback(db, site_id, record_name, insert_count, callback) {
const testRecords = [];
for (let i = 0; i < insert_count; ++i) {
testRecords.push({name: record_name, val: i});
}
db.collection(site_id + 'test_campaigns').insertMany(testRecords, function(err, result) {
return callback(err);
});
}
// This cleans up by deleting all test records.
function delete_test_records_callback(db, site_id, record_name, callback) {
db.collection(site_id + 'test_campaigns').deleteMany({name: record_name}, function(err, result) {
return callback(err);
});
}
// Test function to insert, query, clean up test records.
function test_callback(db) {
const site_id = 'ksdlfnsdlfu893hdsvSFJSDgfsdk';
const test_record_name = 'test_record_callback';
// First call the insert function
insert_test_records_callback(db, site_id, test_record_name, 3, function(err) {
// Once execution reaches here, insertion has completed.
if (err) {
console.log(err);
return;
}
// Do the query function
hit_the_db(db, site_id, test_record_name, function(err, records) {
// The query function has now completed
console.log('hit_the_db - err:', err);
console.log('hit_the_db - records:', records);
delete_test_records_callback(db, site_id, test_record_name, function(err, records) {
console.log('cleaned up test records.');
});
});
});
}
Output:
hit_the_db - err: null
hit_the_db - records: [ { _id: 5efe09084d078f4b7952dea8,
name: 'test_record_callback',
val: 0 },
{ _id: 5efe09084d078f4b7952dea9,
name: 'test_record_callback',
val: 1 },
{ _id: 5efe09084d078f4b7952deaa,
name: 'test_record_callback',
val: 2 } ]
cleaned up test records.

Mongo DB concurrency issue with findOne and updateOne

I am having an issue with concurrent requests that update the same document. I'm not using findAndModify() because I need to access the current state of the document to make the update, which I don't see supported by findAndModify(). I also would like to avoid using db.fsyncLock(), since that locks the entire database and I only need to lock one document in one collection.
First I use findOne() to get a document, then I use updateOne() in the callback of findOne() to update the same document. When I queue up a bunch of actions and run them all at once, I believe they all access the same state when they call findOne(), instead of waiting for the updateOne() of the previous action to complete.
How should I handle this?
mongoDBPromise.then((db)=> {
db.collection("notes").findOne(
{path: noteId},
(err, result)=> {
if (err) {
console.log(err);
return;
}
if (!result.UndoableNoteList.future.length) {
console.log("Nothing to redo");
return;
}
let past = result.UndoableNoteList.past.concat(Object.assign({},result.UndoableNoteList.present));
let present = Object.assign({},result.UndoableNoteList.future[0]);
let future = result.UndoableNoteList.future.slice(1, result.UndoableNoteList.future.length);
db.collection("notes").updateOne(
{path: noteId},
{
$set: {
UndoableNoteList: {
past: past,
present: present,
future:future
}
}
},
(err, result)=> {
if (err) {
console.log(err);
return;
}
}
)
}
);
});
As updateOne() is an async call, findOne() won't wait for it to complete, so there can be situations where the same document is updated simultaneously, which won't be allowed in mongo.
I think updateOne() is not necessary in this case. Note that you have already found the right instance of the document that needs to be updated in the findOne() query. Now you can update that instance and save the document without calling updateOne(). I think the problem can be avoided this way:
mongoDBPromise.then((db)=> {
db.collection("notes").findOne(
{path: noteId},
(err, result)=> {
if (err) {
console.log(err);
return;
}
if (!result.UndoableNoteList.future.length) {
console.log("Nothing to redo");
return;
}
let past = result.UndoableNoteList.past.concat(Object.assign({},result.UndoableNoteList.present));
let present = Object.assign({},result.UndoableNoteList.future[0]);
let future = result.UndoableNoteList.future.slice(1, result.UndoableNoteList.future.length);
result.UndoableNoteList.past = past;
result.UndoableNoteList.present = present;
result.UndoableNoteList.future = future;
//save the document here and return
}
);
});
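For completeness, persisting the modified document at that point could look like the sketch below, using replaceOne(); note that this is still an asynchronous write, so on its own it does not remove the read-modify-write race described in the question:
// Sketch: saving the mutated document back (still subject to the original race)
db.collection("notes").replaceOne(
{path: noteId},
result, // the document modified above
(err, res)=> {
if (err) {
console.log(err);
}
}
);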
Hope this answer helps you!
I was not able to find a way to sequentially run the queries using purely mongodb functions. I've written some node.js logic that blocks mongodb queries from running on the same document and adds those queries to a queue. Here's what the code currently looks like.
The Websocket Undo Listener
module.exports = (noteId, wsHelper, noteWebSocket) => {
wsHelper.addMessageListener((msg, ws)=> {
if (msg.type === "UNDO") {
noteWebSocket.broadcast(msg, noteWebSocket.getOtherClientsInPath(noteId, wsHelper));
noteWebSocket.saveUndo(noteId);
}
});
};
The saveUndo function called from the listener
saveUndo(noteId) {
this.addToActionQueue(noteId, {payload: noteId, type: "UNDO"});
this.getNoteByIdAndProcessQueue(noteId);
}
The getNoteByIdAndProcessQueue function called from saveUndo
getNoteByIdAndProcessQueue(noteId) {
if (this.isProcessing[noteId])return;
this.isProcessing[noteId] = true;
mongoDBPromise.then((db)=> {
db.collection("notes").findOne(
{path: noteId},
(err, result)=> {
if (err) {
this.isProcessing[noteId] = false;
this.getNoteByIdAndProcessQueue(noteId);
return;
}
this.processQueueForNoteId(noteId, result.UndoableNoteList);
});
});
}
The processQueueForNoteId function
processQueueForNoteId(noteId, UndoableNoteList) {
this.actionQueue[noteId].forEach((action)=> {
if (action.type === "UNDO") {
UndoableNoteList = this.undoNoteAction(UndoableNoteList);
} else if (action.type === "REDO") {
UndoableNoteList = this.redoNoteAction(UndoableNoteList);
} else if (action.type === "ADD_NOTE") {
UndoableNoteList = this.addNoteAction(UndoableNoteList, action.payload);
} else if (action.type === "REMOVE_NOTE") {
UndoableNoteList = this.removeNoteAction(UndoableNoteList, action.payload);
}
});
let actionsBeingSaved = this.actionQueue[noteId].concat();
this.actionQueue[noteId] = [];
mongoDBPromise.then((db)=> {
db.collection("notes").updateOne(
{path: noteId},
{
$set: {
UndoableNoteList: UndoableNoteList
}
},
(err, result)=> {
this.isProcessing[noteId] = false;
// If the update failed then try again
if (err) {
console.log("update error")
this.actionQueue[noteId] = actionsBeingSaved.concat(this.actionQueue[noteId]);
}
// if action were queued during save then save again
if (this.actionQueue[noteId].length) {
this.getNoteByIdAndProcessQueue(noteId);
}
}
)
});
}

Async function in nodejs loop

I'm having a problem calling an async function inside a while loop.
The problem is that the 'while' statement ends before its underlying function produces a result, because the function is asynchronous.
The code is like below:
while (end < min) {
db.collection('products').count({
tags: {
$in: ['tech']
}
}, function(err, result) {
if (result) {
a = result;
}
});
max = min;
min = max - step;
myitems.push(a);
}
res.send(myitems);
At the end I could not send the result, because all of the while iterations should finish before sending the final result.
How could I modify the code to solve such a problem?
Thanks in advance
Without using third party libraries, here's a method of manually sequencing your async operations. Note, because this is async, you have to process the results inside of the next() function when you see that you are done iterating.
// assume that end, max, min and step are all defined and initialized before this
var results = [];
function next() {
if (end < min) {
// something seems missing from the code here because
// this db.collection() call is always the same
db.collection('products').count({tags: {$in: ['tech']}}, function(err, result) {
if (!err && result) {
results.push(result);
max = min;
min = max - step;
next();
} else {
// got an error or a missing result here, provide error response
console.log("db.collection() error or missing result");
}
});
} else {
// all operations are done now
// process the results array
res.send(results);
}
}
// launch the first iteration
next();
You could leverage a 3rd-party library to do this as well (a non-working performQuery example using async):
function performQuery(range, callback) {
// the caller could pre calculate
// the range of products to retrieve
db.collection('products').count({
tags: {
$in: ['tech'],
// could have some sort of range query
$gte: range.min,
$lt: range.max
}
}, function(err, result) {
// async expects an error-first callback
callback(err, result);
});
}
async.parallel([
performQuery.bind(null, {min: 0, max: 10}),
performQuery.bind(null, {min: 10, max: 20})
], function(err, results) {
res.send(results);
});
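On a modern Node runtime the same sequencing can be written with async/await, since the driver returns promises when no callback is passed (countDocuments() replaces count() on recent drivers). A sketch under those assumptions; the collectCounts name is just for illustration:
// Sketch: sequential iteration with async/await
// (assumes end, min, max and step are defined as in the question)
async function collectCounts() {
const results = [];
while (end < min) {
const result = await db.collection('products')
.countDocuments({ tags: { $in: ['tech'] } });
results.push(result);
max = min;
min = max - step;
}
return results;
}
collectCounts()
.then(results => res.send(results))
.catch(err => console.log(err));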

Node / Mongoose: res.send() erases previous Mongo update

I'm updating an object in an array with Mongoose. After it updates I'm firing res.send() in the callback.
If I don't fire the res.send() the update saves fine. But when I do res.send(), the entire object in the array is erased from Mongo.
landmarkSchema.findById(tileRes.worldID, function(err, lm) {
if (!lm){
console.log(err);
}
else if (req.user._id == lm.permissions.ownerID){
var min = tileRes.zooms[0];
var max = tileRes.zooms.slice(-1)[0];
if (lm.style.maps.localMapArray){
for (var i = 0; i < lm.style.maps.localMapArray.length; i++) { //better way to do this with mongo $set
if (lm.style.maps.localMapArray[i].map_marker_viewID == req.body.map_marker_viewID) {
lm.style.maps.localMapArray[i]['temp_upload_path'] = '';
lm.style.maps.localMapArray[i]['localMapID'] = tileRes.mapURL;
lm.style.maps.localMapArray[i]['localMapName'] = tileRes.worldID;
lm.markModified('style.maps.localMapArray');
lm.save(function(err, landmark) {
if (err){
console.log('error');
}
else {
console.log('updated map',landmark);
//res.status(200).send(landmark);
}
});
}
}
}
}
});
Is it a write issue where Mongo doesn't finish writing before res.send is fired?
I recommend starting to use async to iterate in these cases. Of course you can do this without it, but you'd need a very good reason to avoid it.
I don't see the rest of your method, but using async it should be similar to this:
async.each(lm.style.maps.localMapArray, function(localMap, callback) {
// Do whatever you like here
//....
// Call method that requires a callback and pass the loop callback
localMap.iUseACallbackAfterDoingMyStuff(callback);
}, function(err){
// now here the loop has ended
if( err ) {
// Something happened..
} else {
res.status(200).send(somethingToTheClient);
}
});
The code snippet is just for you to get the idea.
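Applied to the question's code, a sketch (using the same identifiers as the question) could mutate all matching entries first, call save() once, and only send the response from the save callback:
// Sketch: one save(), one response, sent only after the write completes
landmarkSchema.findById(tileRes.worldID, function(err, lm) {
if (!lm) return console.log(err);
if (req.user._id != lm.permissions.ownerID) return;
(lm.style.maps.localMapArray || []).forEach(function(localMap) {
if (localMap.map_marker_viewID == req.body.map_marker_viewID) {
localMap.temp_upload_path = '';
localMap.localMapID = tileRes.mapURL;
localMap.localMapName = tileRes.worldID;
}
});
lm.markModified('style.maps.localMapArray');
lm.save(function(err, landmark) {
if (err) return console.log('error');
res.status(200).send(landmark); // safe: the write has finished
});
});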

How can I use a cursor.forEach() in MongoDB using Node.js?

I have a huge collection of documents in my DB and I'm wondering how can I run through all the documents and update them, each document with a different value.
The answer depends on the driver you're using. All MongoDB drivers I know have cursor.forEach() implemented one way or another.
Here are some examples:
node-mongodb-native
collection.find(query).forEach(function(doc) {
// handle
}, function(err) {
// done or error
});
mongojs
db.collection.find(query).forEach(function(err, doc) {
// handle
});
monk
collection.find(query, { stream: true })
.each(function(doc){
// handle doc
})
.error(function(err){
// handle error
})
.success(function(){
// final callback
});
mongoose
collection.find(query).stream()
.on('data', function(doc){
// handle doc
})
.on('error', function(err){
// handle error
})
.on('end', function(){
// final callback
});
Updating documents inside of .forEach callback
The only problem with updating documents inside of .forEach callback is that you have no idea when all documents are updated.
To solve this problem you should use some asynchronous control flow solution. Here are some options:
async
promises (when.js, bluebird)
Here is an example of using async, using its queue feature:
var q = async.queue(function (doc, callback) {
// code for your update
collection.update({
_id: doc._id
}, {
$set: {hi: 'there'}
}, {
w: 1
}, callback);
}, Infinity);
var cursor = collection.find(query);
cursor.each(function(err, doc) {
if (err) throw err;
if (doc) q.push(doc); // dispatching doc to async.queue
});
q.drain = function() {
if (cursor.isClosed()) {
console.log('all items have been processed');
db.close();
}
}
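Note that in async v3 the drain handler is registered with a call rather than an assignment; a sketch:
// async v3 style: register the drain handler via a function call
q.drain(function() {
if (cursor.isClosed()) {
console.log('all items have been processed');
db.close();
}
});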
Using the mongodb driver, and modern NodeJS with async/await, a good solution is to use next():
const collection = db.collection('things')
const cursor = collection.find({
bla: 42 // find all things where bla is 42
});
let document;
while ((document = await cursor.next())) {
await collection.findOneAndUpdate({
_id: document._id
}, {
$set: {
blu: 43
}
});
}
This results in only one document at a time being required in memory, as opposed to e.g. the accepted answer, where many documents get sucked into memory, before processing of the documents starts. In cases of "huge collections" (as per the question) this may be important.
If documents are large, this can be improved further by using a projection, so that only those fields of documents that are required are fetched from the database.
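For instance, a projection limiting the fetch to just the fields the update needs might look like this (a sketch following the example above):
// Sketch: fetch only _id, since the update above needs nothing else
const cursor = collection.find(
{ bla: 42 },
{ projection: { _id: 1 } }
);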
var MongoClient = require('mongodb').MongoClient,
assert = require('assert');
MongoClient.connect('mongodb://localhost:27017/crunchbase', function(err, db) {
assert.equal(err, null);
console.log("Successfully connected to MongoDB.");
var query = {
"category_code": "biotech"
};
db.collection('companies').find(query).toArray(function(err, docs) {
assert.equal(err, null);
assert.notEqual(docs.length, 0);
docs.forEach(function(doc) {
console.log(doc.name + " is a " + doc.category_code + " company.");
});
db.close();
});
});
Notice that the .toArray call makes the application fetch the entire dataset.
var MongoClient = require('mongodb').MongoClient,
assert = require('assert');
MongoClient.connect('mongodb://localhost:27017/crunchbase', function(err, db) {
assert.equal(err, null);
console.log("Successfully connected to MongoDB.");
var query = {
"category_code": "biotech"
};
var cursor = db.collection('companies').find(query);
cursor.forEach(
function(doc) {
console.log(doc.name + " is a " + doc.category_code + " company.");
},
function(err) {
assert.equal(err, null);
return db.close();
}
);
});
Notice that the cursor returned by find() is assigned to var cursor. With this approach, instead of fetching all data into memory and consuming it at once, we're streaming the data to our application. find() can create a cursor immediately because it doesn't actually make a request to the database until we try to use some of the documents it will provide. The point of the cursor is to describe our query. The 2nd parameter to cursor.forEach shows what to do when the cursor is exhausted or an error occurs.
In the initial version of the above code, it was toArray() which forced the database call. It meant we needed ALL the documents and wanted them to be in an array.
Also, MongoDB returns data in batches. The original answer illustrated this with a diagram of the batched requests a cursor makes from the application to MongoDB.
forEach is better than toArray because we can process documents as they come in, until we reach the end. Contrast it with toArray, where we wait for ALL the documents to be retrieved and the entire array is built. This means we're not getting any advantage from the fact that the driver and the database system are working together to batch results to your application. Batching is meant to provide efficiency in terms of memory overhead and execution time. Take advantage of it in your application if you can.
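If needed, the batch size can also be tuned explicitly on the cursor (a sketch):
// Sketch: requesting batches of 100 documents from the server
var cursor = db.collection('companies').find(query).batchSize(100);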
None of the previous answers mentions batching the updates. That makes them extremely slow 🐌 - tens or hundreds of times slower than a solution using bulkWrite.
Let's say you want to double the value of a field in each document. Here's how to do that fast 💨 and with fixed memory consumption:
// Double the value of the 'foo' field in all documents
let bulkWrites = [];
const bulkDocumentsSize = 100; // how many documents to write at once
let i = 0;
db.collection.find({ ... }).forEach(doc => {
i++;
// Update the document...
doc.foo = doc.foo * 2;
// Add the update to an array of bulk operations to execute later
bulkWrites.push({
replaceOne: {
filter: { _id: doc._id },
replacement: doc,
},
});
// Update the documents and log progress every `bulkDocumentsSize` documents
if (i % bulkDocumentsSize === 0) {
db.collection.bulkWrite(bulkWrites);
bulkWrites = [];
print(`Updated ${i} documents`);
}
});
// Flush the remaining bulk writes, if any (bulkWrite throws on an empty array)
if (bulkWrites.length > 0) {
db.collection.bulkWrite(bulkWrites);
}
And here is an example of using a Mongoose cursor asynchronously with promises:
new Promise(function (resolve, reject) {
collection.find(query).cursor()
.on('data', function(doc) {
// ...
})
.on('error', reject)
.on('end', resolve);
})
.then(function () {
// ...
});
Reference:
Mongoose cursors
Streams and promises
Leonid's answer is great, but I want to reinforce the importance of using async/promises and to give a different solution with a promises example.
The simplest solution to this problem is to loop over the documents with forEach and call an update for each. Usually you don't need to close the db connection after each request, but if you do need to close it, be careful. You must only close it once you are sure that all updates have finished executing.
A common mistake here is to call db.close() after all updates are dispatched, without knowing whether they have completed. If you do that, you'll get errors.
Wrong implementation:
collection.find(query).each(function(err, doc) {
if (err) throw err;
if (doc) {
collection.update(query, update, function(err, updated) {
// handle
});
}
else {
db.close(); // if there is any pending update, it will throw an error there
}
});
However, as db.close() is also an async operation (its signature has a callback option), you may get lucky and this code may finish without errors. It may work only when you need to update just a few docs in a small collection (so, don't rely on it).
Correct solution:
As a solution with async was already proposed by Leonid, below follows a solution using Q promises.
var Q = require('q');
var client = require('mongodb').MongoClient;
var url = 'mongodb://localhost:27017/test';
client.connect(url, function(err, db) {
if (err) throw err;
var promises = [];
var query = {}; // select all docs
var collection = db.collection('demo');
var cursor = collection.find(query);
// read all docs
cursor.each(function(err, doc) {
if (err) throw err;
if (doc) {
// create a promise to update the doc
var query = doc;
var update = { $set: {hi: 'there'} };
var promise =
Q.npost(collection, 'update', [query, update])
.then(function(updated){
console.log('Updated: ' + updated);
});
promises.push(promise);
} else {
// close the connection after executing all promises
Q.all(promises)
.then(function() {
if (cursor.isClosed()) {
console.log('all items have been processed');
db.close();
}
})
.fail(console.error);
}
});
});
node-mongodb-native now supports an endCallback parameter to cursor.forEach for handling the moment AFTER the whole iteration; refer to the official documentation for details: http://mongodb.github.io/node-mongodb-native/2.2/api/Cursor.html#forEach.
Also note that .each is deprecated in the nodejs native driver now.
You can now use (in an async function, of course):
for await (let doc of collection.find(query)) {
await updateDoc(doc);
}
// all done
which nicely serializes all updates.
Let's assume that we have the below MongoDB data in place.
Database name: users
Collection name: jobs
===========================
Documents
{ "_id" : ObjectId("1"), "job" : "Security", "name" : "Jack", "age" : 35 }
{ "_id" : ObjectId("2"), "job" : "Development", "name" : "Tito" }
{ "_id" : ObjectId("3"), "job" : "Design", "name" : "Ben", "age" : 45}
{ "_id" : ObjectId("4"), "job" : "Programming", "name" : "John", "age" : 25 }
{ "_id" : ObjectId("5"), "job" : "IT", "name" : "ricko", "age" : 45 }
==========================
This code:
var MongoClient = require('mongodb').MongoClient;
var dbURL = 'mongodb://localhost/users';
MongoClient.connect(dbURL, (err, db) => {
if (err) {
throw err;
} else {
console.log('Connection successful');
var dataBase = db.db();
// loop with forEach
dataBase.collection('jobs').find().forEach(function(myDoc) {
console.log('There is a job called: ' + myDoc.job + ' in Database');
});
}
});
I looked for a solution with good performance and ended up creating a mix of what I found, which I think works well:
/**
* This method will read the documents from the cursor in batches and invoke the callback
* for each batch in parallel.
* IT IS VERY RECOMMENDED TO CREATE THE CURSOR WITH A batchSize OPTION THAT MATCHES
* THE VALUE OF batchSize. This way the performance benefits are maxed out since
* the mongo instance will send into our process memory the same number of documents
* that we handle concurrently each time, so no memory space is wasted
* and also the memory usage is limited.
*
* Example of usage:
* const cursor = await collection.aggregate([
*     {...}, ...],
*     {
*         cursor: {batchSize: BATCH_SIZE} // Limiting memory use
*     });
* DbUtil.concurrentCursorBatchProcessing(cursor, BATCH_SIZE, async (doc) => ...)
* @param cursor - A cursor to batch process on.
* We can get this from our collection.js API by either using aggregateCursor/findCursor
* @param batchSize - The batch size, should match the batchSize of the cursor option.
* @param callback - Callback that should be async, will be called in parallel for each batch.
* @return {Promise<void>}
*/
static async concurrentCursorBatchProcessing(cursor, batchSize, callback) {
let doc;
const docsBatch = [];
while ((doc = await cursor.next())) {
docsBatch.push(doc);
if (docsBatch.length >= batchSize) {
await PromiseUtils.concurrentPromiseAll(docsBatch, async (currDoc) => {
return callback(currDoc);
});
// Emptying the batch array
docsBatch.splice(0, docsBatch.length);
}
}
// Checking if there is a last batch remaining, since it was smaller than batchSize
if (docsBatch.length > 0) {
await PromiseUtils.concurrentPromiseAll(docsBatch, async (currDoc) => {
return callback(currDoc);
});
}
}
An example of usage for reading many big documents and updating them:
const cursor = await collection.aggregate([
{
...
}
], {
cursor: {batchSize: BATCH_SIZE}, // Limiting memory use
allowDiskUse: true
});
const bulkUpdates = [];
await DbUtil.concurrentCursorBatchProcessing(cursor, BATCH_SIZE, async (doc: any) => {
const update: any = {
updateOne: {
filter: {
...
},
update: {
...
}
}
};
bulkUpdates.push(update);
// Updating if we read too many docs to clear space in memory
await this.bulkWriteIfNeeded(bulkUpdates, collection);
});
// Making sure we updated everything
await this.bulkWriteIfNeeded(bulkUpdates, collection, true);
...
private async bulkWriteIfNeeded(
bulkUpdates: any[], collection: any,
forceUpdate = false, flushBatchSize = BATCH_SIZE) {
if (bulkUpdates.length >= flushBatchSize || forceUpdate) {
// concurrentPromiseChunked is a method that loops over an array in a concurrent way using lodash.chunk and Promise.map
await PromiseUtils.concurrentPromiseChunked(bulkUpdates, (upsertChunk: any) => {
return collection.bulkWrite(upsertChunk);
});
// Emptying the array
bulkUpdates.splice(0, bulkUpdates.length);
}
}
