Node.js: Mongoose initializeUnorderedBulkOp returning null

A while ago I managed to write a method to bulk upsert a lot of information into my database. Now I am trying to write a method to clean old records from the same database and collection.
raidSchema.statics.bulkUpsert = function (raids, callback) {
    var bulk = Raid.collection.initializeUnorderedBulkOp();
    for (var i = 0; i < raids.length; i++) {
        var raid = raids[i];
        var date = new Date();
        bulk.find({id: raid.id, hash: raid.hash}).upsert().update({
            $setOnInsert: {
                ...
            },
            $set: {
                ...
            }
        });
    }
    bulk.execute(callback);
};
This works perfectly. Then I wrote the following, in the hope that it would clean the old records that I don't need anymore:
raidSchema.statics.cleanOldRaids = function (callback) {
    var date = new Date();
    date.setMinutes(date.getMinutes() - 30);
    var bulk = Raid.collection.initializeUnorderedBulkOp();
    bulk.find({$or: [ { maxHealth: {$lte: 0} }, { isComplete: true }, {updatedOn: {$lte: date.getTime()}} ] }).remove();
    bulk.execute(callback);
};
And I am running these methods with this script, which tries to run them every 30 minutes:
var Raid = require('../models/raid');
var async = require('async');

var cleanInterval = 1000 * 60 * 30;
var cleanRaids = function () {
    console.log('cleanRaids: Starting cleaning');
    async.series([
        function (callback) {
            Raid.cleanOldRaids();
            callback(null, 'All servers');
        }],
        function (err, results) {
            if (err) throw err;
            console.log("cleanRaids: Done cleaning (" + results.join() + ")");
            setTimeout(cleanRaids, cleanInterval);
        });
};

cleanRaids();
But right after I start my server, it crashes saying that it cannot read property 'find' of undefined:
.../models/raid.js:104
bulk.find({$or: [ { maxHealth: {$lte: 0} }, { isComplete: true }, {updatedO
^
TypeError: Cannot read property 'find' of undefined
I am completely lost, since it works perfectly with the bulkUpsert method, which is run by very similar code.
Does anyone have any idea why this might be happening?
Thanks a lot.

The problem here is that mongoose has not connected to the database yet, and therefore has no handle to the underlying driver object you are accessing via the .collection accessor.
The mongoose methods themselves perform a little "magic" by essentially queuing all operations until the database connection is actually made, e.g.:
Model.find().exec(function(err, docs) { }); // <-- callback queues until connection is ready
However, if no connection is present, native methods will not return a collection object:
Model.collection.find({}, function(err, docs) { }); // <-- collection is undefined
The bulk methods just return a structure that has not been executed, so the error does not show up until you try to call a method on that structure.
The fix is easy: just wait for the connection before executing any code:
mongoose.connection.on("open", function(err) {
    // body of program in here
});
So although the "mongoose methods" do their own magic to "hide this away", waiting is needed when calling native methods. The only way you get away without it is when you are absolutely sure that one of the "mongoose methods" has already fired, and that a connection has therefore been made.
Better safe than sorry, so it is wise practice to put the initialization and main body of your program inside a block such as the one above.
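For instance, a minimal sketch of that practice applied to the question's setup (the connection string is a placeholder, and the Raid model and cleanRaids loop are assumed from the question):
var mongoose = require('mongoose');

mongoose.connect('mongodb://localhost/mydb'); // hypothetical connection string

mongoose.connection.on('open', function() {
    // Raid.collection is now backed by a live driver connection, so native
    // helpers like initializeUnorderedBulkOp() are safe to call from here on.
    cleanRaids(); // start the periodic cleaner from the question
});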

Related

Express/Mongoose use object function as callback doesn't work

I want to call a function from my object in an Express route. This function should run a mongoose query, then run the next function, and the next, etc. - all the needed operations.
Here is my example route:
var MailSender = require('../../libs/mailer');

router.get('/mailer/test', function (req, res, next) {
    MailSender.getPending();
});
And mailer file:
(all required modules are loaded here)
module.exports = {
    currentMails : {},

    getPending : function() {
        MailObj.find()
            .limit(10)
            .exec(this.blockPending);
    },

    blockPending : function(err, mail) {
        currentMails = {};
        mail.forEach(function(data) {
            let mailId = mongoose.Types.ObjectId(data._id);
            currentMails[mailId] = data;
        });
        MailObj.update({ _id: { $in: Object.keys(currentMails) } }, { block: 1 }, {multi: true}, function() {
            // Doesn't work
            this.myNextFunc();
        });
    },

    myNextFunc : function() {
        console.log("Yep!");
    }
}
getPending - works great and calls blockPending with the query results.
blockPending - works great; I can prepare the ids and update the records.
But... myNextFunc() doesn't work, and I can't call any object function from this scope (the console says they are undefined). I know I am doing something wrong, but... what?
I'd like to encapsulate related functions in objects like this and run them as callbacks. What am I doing wrong?
Since you are using Mongoose, why not take advantage of it and update each mail in the loop? It is less efficient, but as a first approach it may serve:
var self = this; // capture the enclosing object so the nested callback can reach it
var numUpdates = 0;
mail.forEach(function(data) {
    data.block = 1;
    data.save(function(err, mailSaved) {
        // check error
        if (++numUpdates >= mail.length) { // pre-increment so the last save triggers this
            self.myNextFunc();
        }
    });
});
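Since the root problem in the question is that this is lost inside the callbacks, here is a minimal sketch of an alternative (assuming the same MailObj model and module shape from the question): bind the callbacks so the object stays reachable.
module.exports = {
    getPending : function() {
        MailObj.find()
            .limit(10)
            .exec(this.blockPending.bind(this)); // keep `this` pointing at this object
    },

    blockPending : function(err, mail) {
        var self = this; // capture for the nested update callback
        // ...prepare the ids as in the question...
        MailObj.update({ /* ... */ }, { block: 1 }, { multi: true }, function() {
            self.myNextFunc(); // works: self was captured above
        });
    },

    myNextFunc : function() {
        console.log("Yep!");
    }
};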

NodeJS + MongoJS: Nested Callbacks Issue

I'm still a n00b with NodeJS so this question may be somewhat elementary.
I'm using MongoJS to read two logically related collections. The first find() returns a value that I pass to the second find() to get the information I need.
I've tried several strategies, the last one (snippet #1) being a class that I export.
Before that, I just had a function with a return statement, returning the desired value, i.e., "config[0]".
In this code all I did was set the "sapConfig" attribute to the word "test". But when I execute this code, the value of "sapConfig" is always "null" after I call the "get_config()" method, and - strangest of all - the line "this.sapConfig = 'test'" generates an error: "Cannot set property 'sapConfig' of undefined".
When I had the code as a simple function with a return statement (snippet #2), no errors were generated, but the value returned is always "undefined", although the console.log() statements show that the variable being returned has the desired value. What gives?
Code Snippet #1: Returns Object
"use strict";
var mongojs = require('mongojs'); // MongoDB API wrapper
module.exports = function(regKey) {
this.regKey = regKey;
this.sapConfig = null;
this.get_config = function() {
// Read SAP connection information from our MONGO db
var db = mongojs('mongodb://localhost/MIM', ['Configurations','Registrations']);
db.Registrations.find({ key: this.regKey }, function(err1, registration){
console.log('Reg.find()');
console.log(registration[0]);
db.Configurations.find({ type: registration[0].type }, function(err2, config){
console.log('Config.find()');
console.log('config=' + config[0].user);
this.sapConfig = 'test';
});
});
}
this.get_result = function() {
return this.sapConfig;
}
}
Again, the code in snippet #1, when I make a call to "get_config()", results in an error when it executes the line "this.sapConfig = 'test'".
However, after this error I can execute "obj.get_result()" and I get the value to which it was initialized, i.e., null. In other words, that same code doesn't generate an error saying that "this" is undefined, as it does in the "get_config()" method.
Code Snippet #2: Using the "return" statement
"use strict";
var mongojs = require('mongojs'); // MongoDB API wrapper
module.exports = function(regKey) {
// Read SAP connection information from our MONGO db
var db = mongojs('mongodb://localhost/MIM', ['Configurations','Registrations']);
db.Registrations.find({ key: regKey }, function(err1, registration){
console.log('Reg.find()');
console.log(registration[0]);
db.Configurations.find({ type: registration[0].type }, function(err2, config){
console.log('Config.find()');
console.log('config=' + config[0].user);
return config[0].user;
});
});
}
When I receive the return value and inspect it, it's "undefined". For example, at the Node command line I issue the following commands:
var config = require('./config') // The name of the module above
> var k = config('2eac44bc-232d-4667-bd24-18e71879f18c')
undefined <-- this is from MongoJS; it's fine
> Reg.find() <-- debug statement in my function
{ _id: 589e2bf64b0e89f233da8fbb,
key: '2eac44bc-232d-4667-bd24-18e71879f18c',
type: 'TEST' }
Config.find()
config=MST0025
> k <-- this should have the value of "config[0]"
undefined
You can see that the queries were successful but the value of "k" is "undefined". What's going on here?
I don't care which approach I use I just need one of them to work.
Thanks in advance!
this.sapConfig is not accessible there because this refers to the current function. What I like to do is keep a variable that refers to the function instance that you know sapConfig is located in.
Ex:
function Foo() {
    var self = this;
    this.test = "I am test";
    var bar = function() {
        return function() {
            console.log(this.test); // outputs undefined (because this refers to the current function scope)
            console.log(self.test); // outputs "I am test"
        }
    }
}
Here is your first code snippet with my example implemented:
"use strict";
var mongojs = require('mongojs'); // MongoDB API wrapper
module.exports = function(regKey) {
var self = this;
this.regKey = regKey;
this.sapConfig = null;
this.get_config = function() {
// Read SAP connection information from our MONGO db
var db = mongojs('mongodb://localhost/MIM', ['Configurations', 'Registrations']);
db.Registrations.find({ key: this.regKey }, function(err1, registration) {
console.log('Reg.find()');
console.log(registration[0]);
db.Configurations.find({ type: registration[0].type }, function(err2, config) {
console.log('Config.find()');
console.log('config=' + config[0].user);
self.sapConfig = 'test';
});
});
}
this.get_result = function() {
return self.sapConfig;
}
}
For your second snippet: you are trying to return a value from within a nested callback. Since those callbacks run asynchronously, the outer function has already returned by the time they fire, so you cannot do that.
Here is how I like to return values from nested callbacks:
Ex2:
// Function example
var functionWithNested = function(done) {
    // Notice the done param.
    // It is a function that receives the finished data once all our nested functions are done.
    (function() {
        // Do things
        (function() {
            // do more things
            done("resultHere"); // finished. pass back the result.
        })(); // end of 2nd nested function
    })(); // end of 1st nested function
};

// Calling the function
functionWithNested(function(result) {
    // Callback
    console.log(result); // resultHere
});
Here is your code using that example:
"use strict";
var mongojs = require('mongojs'); // MongoDB API wrapper
module.exports = function(regKey, done) {
// Read SAP connection information from our MONGO db
var db = mongojs('mongodb://localhost/MIM', ['Configurations', 'Registrations']);
db.Registrations.find({ key: regKey }, function(err1, registration) {
console.log('Reg.find()');
console.log(registration[0]);
db.Configurations.find({ type: registration[0].type }, function(err2, config) {
console.log('Config.find()');
console.log('config=' + config[0].user);
done(config[0].user);
});
});
}
// Then wherever you call the above function, use this format.
// If Config is the name of the function exported above...
new Config(regKey, function(result) {
    console.log(result); // config[0].user value
});
Lots and lots of code, but I hope you were able to follow it. Let me know if you have any more questions! Cheers.

call db.loadServerScripts on connection startup

I have some server-side helper functions in the system.js collection, which are then used in node.js. However, sometimes they are undefined.
Here is the scenario:
I load these functions once, on server start:
db.eval('db.loadServerScripts()', function(err, result) { ... });
So this is called only once on start, not for every request.
From now on I can call e.g.:
db.eval('getNextSequence(\'test\')', function(err, seq){});
But sometimes I get the error that getNextSequence is undefined. I suspect those functions exist only in the current connection's scope, so maybe when node opens a new connection, the functions are not set.
Is there any way to use those functions in node.js and have them reliably available at all times?
Example scenario:
// 1./ this function is stored in system.js
getNextSequence: function(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
    });
    return ret ? ret.seq : null;
}
// 2./ this is called on nodejs server startup (once for the server lifetime)
var mongo = require('mongoskin');
var db = mongo.db("mongodb://localhost:27017/mydb", {native_parser: true});
// ...
db.eval('db.loadServerScripts()', function(err, result) {
    // ...crash if failed
});

// 3./ this is used in node.js code, on request processing:
db.eval('getNextSequence(\'someNamespace\')', function(err, seq) {
    // int seq is converted to a string slug
    // a new entity with slugId is saved to the collection
});
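One possible workaround, sketched here as an assumption rather than a confirmed answer: run the same counter logic directly through the driver instead of db.eval, so it no longer depends on per-connection server-side scripts. With the legacy findAndModify API that mongoskin wraps, that could look roughly like:
function getNextSequence(name, callback) {
    // Legacy driver signature: findAndModify(query, sort, update, options, callback)
    db.collection('counters').findAndModify(
        { _id: name },
        [],
        { $inc: { seq: 1 } },
        { new: true, upsert: true },
        function(err, ret) {
            if (err) return callback(err);
            // depending on the driver version, the document is `ret` or `ret.value`
            callback(null, ret ? ret.seq : null);
        }
    );
}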

How can I use a cursor.forEach() in MongoDB using Node.js?

I have a huge collection of documents in my DB and I'm wondering how I can run through all the documents and update them, each document with a different value.
The answer depends on the driver you're using. All MongoDB drivers I know have cursor.forEach() implemented one way or another.
Here are some examples:
node-mongodb-native
collection.find(query).forEach(function(doc) {
    // handle
}, function(err) {
    // done or error
});
mongojs
db.collection.find(query).forEach(function(err, doc) {
    // handle
});
monk
collection.find(query, { stream: true })
    .each(function(doc) {
        // handle doc
    })
    .error(function(err) {
        // handle error
    })
    .success(function() {
        // final callback
    });
mongoose
collection.find(query).stream()
    .on('data', function(doc) {
        // handle doc
    })
    .on('error', function(err) {
        // handle error
    })
    .on('end', function() {
        // final callback
    });
Updating documents inside of .forEach callback
The only problem with updating documents inside of a .forEach callback is that you have no idea when all the documents have been updated.
To solve this problem you should use some asynchronous control flow solution. Here are some options:
async
promises (when.js, bluebird)
Here is an example of using async, using its queue feature:
var q = async.queue(function (doc, callback) {
    // code for your update
    collection.update({
        _id: doc._id
    }, {
        $set: {hi: 'there'}
    }, {
        w: 1
    }, callback);
}, Infinity);

var cursor = collection.find(query);
cursor.each(function(err, doc) {
    if (err) throw err;
    if (doc) q.push(doc); // dispatching doc to async.queue
});

q.drain = function() {
    if (cursor.isClosed()) {
        console.log('all items have been processed');
        db.close();
    }
}
Using the mongodb driver, and modern NodeJS with async/await, a good solution is to use next():
const collection = db.collection('things');
const cursor = collection.find({
    bla: 42 // find all things where bla is 42
});

let document;
while ((document = await cursor.next())) {
    await collection.findOneAndUpdate({
        _id: document._id
    }, {
        $set: {
            blu: 43
        }
    });
}
This results in only one document at a time being required in memory, as opposed to e.g. the accepted answer, where many documents get sucked into memory, before processing of the documents starts. In cases of "huge collections" (as per the question) this may be important.
If documents are large, this can be improved further by using a projection, so that only those fields of documents that are required are fetched from the database.
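For example, a small sketch of that suggestion applied to the query above (assuming a driver version that accepts a projection option):
const cursor = collection.find(
    { bla: 42 },
    { projection: { _id: 1, blu: 1 } } // load only the fields the update actually needs
);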
var MongoClient = require('mongodb').MongoClient,
    assert = require('assert');

MongoClient.connect('mongodb://localhost:27017/crunchbase', function(err, db) {
    assert.equal(err, null);
    console.log("Successfully connected to MongoDB.");
    var query = {
        "category_code": "biotech"
    };
    db.collection('companies').find(query).toArray(function(err, docs) {
        assert.equal(err, null);
        assert.notEqual(docs.length, 0);
        docs.forEach(function(doc) {
            console.log(doc.name + " is a " + doc.category_code + " company.");
        });
        db.close();
    });
});
Notice that the call to .toArray makes the application fetch the entire dataset.
var MongoClient = require('mongodb').MongoClient,
    assert = require('assert');

MongoClient.connect('mongodb://localhost:27017/crunchbase', function(err, db) {
    assert.equal(err, null);
    console.log("Successfully connected to MongoDB.");
    var query = {
        "category_code": "biotech"
    };
    var cursor = db.collection('companies').find(query);
    cursor.forEach(
        function(doc) {
            console.log(doc.name + " is a " + doc.category_code + " company.");
        },
        function(err) {
            assert.equal(err, null);
            return db.close();
        }
    );
});
Notice that the cursor returned by find() is assigned to var cursor. With this approach, instead of fetching all data into memory and consuming it at once, we're streaming the data to our application. find() can create a cursor immediately because it doesn't actually make a request to the database until we try to use some of the documents it will provide. The point of the cursor is to describe our query. The 2nd parameter to cursor.forEach shows what to do when the cursor is exhausted or an error occurs.
In the initial version of the above code, it was toArray() which forced the database call. It meant we needed ALL the documents and wanted them to be in an array.
Also, MongoDB returns data in batches. (The original answer included an image here illustrating the batched requests from the application's cursor to MongoDB.)
forEach is better than toArray because we can process documents as they come in, until we reach the end. Contrast this with toArray, where we wait for ALL the documents to be retrieved and the entire array to be built. This means we're not getting any advantage from the fact that the driver and the database system work together to batch results to your application. Batching is meant to provide efficiency in terms of memory overhead and execution time. Take advantage of it if you can in your application.
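If you want to influence that batching yourself, the driver exposes a batchSize option on the cursor; a small sketch reusing the companies query above (the value 100 is just an illustration):
var cursor = db.collection('companies').find(query).batchSize(100); // fetch 100 docs per round trip
cursor.forEach(function(doc) {
    console.log(doc.name + " is a " + doc.category_code + " company.");
}, function(err) {
    assert.equal(err, null);
    return db.close();
});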
None of the previous answers mentions batching the updates. That makes them extremely slow 🐌 - tens or hundreds of times slower than a solution using bulkWrite.
Let's say you want to double the value of a field in each document. Here's how to do that fast 💨 and with fixed memory consumption:
// Double the value of the 'foo' field in all documents
let bulkWrites = [];
const bulkDocumentsSize = 100; // how many documents to write at once
let i = 0;
db.collection.find({ ... }).forEach(doc => {
    i++;

    // Update the document...
    doc.foo = doc.foo * 2;

    // Add the update to an array of bulk operations to execute later
    bulkWrites.push({
        replaceOne: {
            filter: { _id: doc._id },
            replacement: doc,
        },
    });

    // Update the documents and log progress every `bulkDocumentsSize` documents
    if (i % bulkDocumentsSize === 0) {
        db.collection.bulkWrite(bulkWrites);
        bulkWrites = [];
        print(`Updated ${i} documents`);
    }
});

// Flush the last <100 bulk writes
db.collection.bulkWrite(bulkWrites);
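A hedged variation on the same pattern: when the change touches a single field, an updateOne with a $mul operator avoids shipping the whole replacement document over the wire:
// Instead of the replaceOne above, push an updateOne that doubles 'foo' server-side
bulkWrites.push({
    updateOne: {
        filter: { _id: doc._id },
        update: { $mul: { foo: 2 } },
    },
});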
And here is an example of using a Mongoose cursor asynchronously with promises:
new Promise(function (resolve, reject) {
    collection.find(query).cursor()
        .on('data', function(doc) {
            // ...
        })
        .on('error', reject)
        .on('end', resolve);
})
.then(function () {
    // ...
});
Reference:
Mongoose cursors
Streams and promises
Leonid's answer is great, but I want to reinforce the importance of using async/promises and to give a different solution with a promises example.
The simplest solution to this problem is to loop over each document with forEach and call an update. Usually you don't need to close the db connection after each request, but if you do need to close it, be careful. You must only close it once you are sure that all updates have finished executing.
A common mistake here is to call db.close() after all updates are dispatched without knowing if they have completed. If you do that, you'll get errors.
Wrong implementation:
collection.find(query).each(function(err, doc) {
    if (err) throw err;
    if (doc) {
        collection.update(query, update, function(err, updated) {
            // handle
        });
    } else {
        db.close(); // if there is any pending update, it will throw an error there
    }
});
However, as db.close() is also an async operation (its signature has a callback option), you may get lucky and this code may finish without errors. It will probably only work when you need to update just a few docs in a small collection (so don't rely on it).
Correct solution:
Since a solution with async was already proposed by Leonid, below follows a solution using Q promises.
var Q = require('q');
var client = require('mongodb').MongoClient;
var url = 'mongodb://localhost:27017/test';

client.connect(url, function(err, db) {
    if (err) throw err;

    var promises = [];
    var query = {}; // select all docs
    var collection = db.collection('demo');
    var cursor = collection.find(query);

    // read all docs
    cursor.each(function(err, doc) {
        if (err) throw err;

        if (doc) {
            // create a promise to update the doc
            var query = doc;
            var update = { $set: {hi: 'there'} };

            var promise =
                Q.npost(collection, 'update', [query, update])
                .then(function(updated){
                    console.log('Updated: ' + updated);
                });
            promises.push(promise);
        } else {
            // close the connection after executing all promises
            Q.all(promises)
                .then(function() {
                    if (cursor.isClosed()) {
                        console.log('all items have been processed');
                        db.close();
                    }
                })
                .fail(console.error);
        }
    });
});
The node-mongodb-native driver now supports an endCallback parameter to cursor.forEach for handling the event AFTER the whole iteration; refer to the official documentation for details: http://mongodb.github.io/node-mongodb-native/2.2/api/Cursor.html#forEach.
Also note that .each is now deprecated in the node.js native driver.
You can now use (in an async function, of course):
for await (let doc of collection.find(query)) {
    await updateDoc(doc);
}
// all done
which nicely serializes all updates.
Let's assume that we have the following MongoDB data in place.
Database name: users
Collection name: jobs
===========================
Documents
{ "_id" : ObjectId("1"), "job" : "Security", "name" : "Jack", "age" : 35 }
{ "_id" : ObjectId("2"), "job" : "Development", "name" : "Tito" }
{ "_id" : ObjectId("3"), "job" : "Design", "name" : "Ben", "age" : 45}
{ "_id" : ObjectId("4"), "job" : "Programming", "name" : "John", "age" : 25 }
{ "_id" : ObjectId("5"), "job" : "IT", "name" : "ricko", "age" : 45 }
==========================
This code:
var MongoClient = require('mongodb').MongoClient;
var dbURL = 'mongodb://localhost/users';

MongoClient.connect(dbURL, (err, db) => {
    if (err) {
        throw err;
    } else {
        console.log('Connection successful');
        var dataBase = db.db();
        // loop over the documents with forEach
        dataBase.collection('jobs').find().forEach(function(myDoc) {
            console.log('There is a job called ' + myDoc.job + ' in the database');
        });
    }
});
I looked for a solution with good performance and I ended up creating a mix of what I found, which I think works well:
/**
 * This method will read the documents from the cursor in batches and invoke the callback
 * for each batch in parallel.
 * IT IS HIGHLY RECOMMENDED TO CREATE THE CURSOR WITH A batchSize OPTION THAT MATCHES
 * THE VALUE OF batchSize. This way the performance benefits are maxed out, since
 * the mongo instance will send into our process memory the same number of documents
 * that we handle concurrently each time, so no memory space is wasted
 * and memory usage stays limited.
 *
 * Example of usage:
 * const cursor = await collection.aggregate([
 *     {...}, ...],
 *     {
 *         cursor: {batchSize: BATCH_SIZE} // Limiting memory use
 *     });
 * DbUtil.concurrentCursorBatchProcessing(cursor, BATCH_SIZE, async (doc) => ...)
 *
 * @param cursor - A cursor to batch process on.
 * We can get this from our collection.js API by either using aggregateCursor/findCursor
 * @param batchSize - The batch size; should match the batchSize of the cursor option.
 * @param callback - Callback that should be async; will be called in parallel for each batch.
 * @return {Promise<void>}
 */
static async concurrentCursorBatchProcessing(cursor, batchSize, callback) {
    let doc;
    const docsBatch = [];

    while ((doc = await cursor.next())) {
        docsBatch.push(doc);
        if (docsBatch.length >= batchSize) {
            await PromiseUtils.concurrentPromiseAll(docsBatch, async (currDoc) => {
                return callback(currDoc);
            });

            // Emptying the batch array
            docsBatch.splice(0, docsBatch.length);
        }
    }

    // Checking if there is a last batch remaining, since it was smaller than batchSize
    if (docsBatch.length > 0) {
        await PromiseUtils.concurrentPromiseAll(docsBatch, async (currDoc) => {
            return callback(currDoc);
        });
    }
}
An example of usage for reading many big documents and updating them:
const cursor = await collection.aggregate([
    {
        ...
    }
], {
    cursor: {batchSize: BATCH_SIZE}, // Limiting memory use
    allowDiskUse: true
});

const bulkUpdates = [];

await DbUtil.concurrentCursorBatchProcessing(cursor, BATCH_SIZE, async (doc: any) => {
    const update: any = {
        updateOne: {
            filter: {
                ...
            },
            update: {
                ...
            }
        }
    };

    bulkUpdates.push(update);

    // Updating if we read too many docs to clear space in memory
    await this.bulkWriteIfNeeded(bulkUpdates, collection);
});

// Making sure we updated everything
await this.bulkWriteIfNeeded(bulkUpdates, collection, true);
...

private async bulkWriteIfNeeded(
    bulkUpdates: any[], collection: any,
    forceUpdate = false, flushBatchSize) {

    if (bulkUpdates.length >= flushBatchSize || forceUpdate) {
        // concurrentPromiseChunked is a method that loops over an array in a concurrent way using lodash.chunk and Promise.map
        await PromiseUtils.concurrentPromiseChunked(bulkUpdates, (updatesChunk: any) => {
            return collection.bulkWrite(updatesChunk);
        });

        // Emptying the array
        bulkUpdates.splice(0, bulkUpdates.length);
    }
}
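The PromiseUtils.concurrentPromiseAll helper is referenced but never shown in this answer; a minimal sketch of what it presumably does (an assumption, not the author's code):
class PromiseUtils {
    // Run the async callback over every item concurrently and wait for all of them
    static async concurrentPromiseAll(items, asyncCallback) {
        return Promise.all(items.map((item) => asyncCallback(item)));
    }
}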

SailsJS + Waterline concurrent db requests with promises

I'm a bit confused about concurrency in SailsJS's Waterline.
Currently I'm doing data retrieval like this:
var results = {};

// Get user by id 5
User.find('5', function(err, user) {
    results.user = user;

    // when it resolves, get messages
    Message.find({userId: '5'}, function(err, messages) {
        results.messages = messages;

        // when message query resolves, get other stuff
        OtherStuff.find({userId: '5'}, function(err, otherStuff) {
            results.otherStuff = otherStuff;
            res.view({results});
        });
    });
});
The problem is that the DB calls are not concurrent. Each request launches only after the previous one's promise has been fulfilled. I'd like to launch all requests at the same time, then see somehow whether all promises have been fulfilled and, if so, pass the results to the view.
How can I achieve this concurrency with the db requests?
Thanks!
Use async.auto. The async module is globalized in Sails:
async.auto({
    user: function(cb) {
        // Note--use findOne here, not find! "find" doesn't accept
        // an ID argument, only an object.
        User.findOne('5').exec(cb);
    },
    messages: function(cb) {
        Message.find({userId: '5'}).exec(cb);
    },
    otherStuff: function(cb) {
        OtherStuff.find({userId: '5'}).exec(cb);
    }
},
// This will be called when all queries are complete, or immediately
// if any of them returns an error
function allDone (err, results) {
    // If any of the queries returns an error,
    // it'll populate the "err" var
    if (err) { return res.serverError(err); }
    // Otherwise "results" will be an object whose keys are
    // "user", "messages" and "otherStuff", and whose values
    // are the results of those queries
    res.view(results);
});
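Since the question specifically asks about promises, here is a hedged alternative sketch (assuming a Sails/Waterline version whose queries return thenables):
Promise.all([
    User.findOne('5'),
    Message.find({userId: '5'}),
    OtherStuff.find({userId: '5'})
]).then(function(queryResults) {
    res.view({
        user: queryResults[0],
        messages: queryResults[1],
        otherStuff: queryResults[2]
    });
}).catch(function(err) {
    res.serverError(err);
});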
