NodeJS MongoDB avoid Cursor Timeout

I would like to loop through all documents in a specific collection of my MongoDB database. However, every attempt I have made fails because the cursor times out. Here is my code:
let MongoClient = require('mongodb').MongoClient;
const url = "my connection URI"
let options = { socketTimeoutMS: 120000, connectTimeoutMS: 120000, keepAlive: 100, poolSize: 5 }

MongoClient.connect(url, options, function(err, db) {
  if (err) throw err
  let dbo = db.db("notes")
  let collection = dbo.collection("stats-network-consumption")
  let stream = collection.find({}, { timeout: false }).stream()
  stream.on("data", function(item) {
    printTask(item)
  })
  stream.on('error', function(err) {
    console.error(err)
  })
  stream.on("end", function() {
    console.log("DONE!")
    db.close()
  })
})
The code above runs for about 15 seconds, retrieves between 6000 and 8000 documents, and then throws the following error:
{ MongoError: cursor does not exist, was killed or timed out
at queryCallback (/Volumes/safezone/development/workspace-router/migration/node_modules/mongodb-core/lib/wireprotocol/2_6_support.js:136:23)
at /Volumes/safezone/development/workspace-router/migration/node_modules/mongodb-core/lib/connection/pool.js:541:18
at process._tickCallback (internal/process/next_tick.js:150:11)
name: 'MongoError',
message: 'cursor does not exist, was killed or timed out' }
I need to retrieve around 50000 documents so I will need to find a way to avoid the cursor timeout.
As seen in the code above, I've tried increasing socketTimeoutMS and connectTimeoutMS, which had no effect on the cursor timeout.
I also tried replacing stream with a forEach and adding .addCursorFlag('noCursorTimeout', true), which did not help either.
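That attempt looked roughly like this (reconstructed from the description above, not the exact code I ran):
collection.find({})
  .addCursorFlag('noCursorTimeout', true)
  .forEach(function(item) {
    printTask(item)
  }, function(err) {
    if (err) console.error(err)
    db.close()
  })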
I've tried everything I found about mongodb. I did not try mongoose or alternatives because they use schemas, and I'll later have to change the current type of an attribute (which can be tricky with mongoose schemas).

Having a cursor with no timeout is generally not recommended.
The reason is that the cursor won't ever be closed by the server, so if your app crashes and you restart it, it will open another no-timeout cursor on the server. Recycle your app often enough, and those will add up.
A no-timeout cursor on a sharded cluster would also prevent chunk migration.
If you need to retrieve a large result set, the cursor should not time out, since the results are sent in batches and the cursor is reused to fetch each next batch.
The standard cursor timeout is 10 minutes, so it is possible to lose the cursor if you need more than 10 minutes to process a single batch.
In your code example, your use of stream() might be interfering with your intent. Try using each() on the cursor instead.
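A minimal sketch of that, reusing the collection, db, and printTask from your question (batchSize() is optional here and only illustrates that documents arrive in server-side batches):
let cursor = collection.find({}).batchSize(1000)
cursor.each(function(err, item) {
  if (err) throw err
  if (item == null) {   // each() signals an exhausted cursor with a null item
    console.log("DONE!")
    db.close()
    return
  }
  printTask(item)
})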
If you need to monitor a collection for changes, you might want to take a look at Change Streams, a new feature in MongoDB 3.6.
For example, your code could be modified along these lines:
let collection = dbo.collection("stats-network-consumption")
let changeStream = collection.watch()
changeStream.on("change", function(change) {
  // process each change notification as it arrives
  console.log(change)
})
Note that to enable change stream support, the driver you're using must support MongoDB 3.6 features and the watch() method. See Driver Compatibility Page for details.

Related

NodeJS can't get memcached values when highloaded

I have an application on NodeJS that uses Cluster, WS, and a memcached client to manage two memcached servers.
Under normal load, it works like a charm.
But under high load, my application stops working and stops fetching data from the memcached servers.
That is, the logs inside the client.get callback are not written to the console when the load is high, so the client never receives its cached value (even though it is present on the memcached server, and sometimes it even works fine under high load). For a while the application looks dead, doing nothing.
getValue = function(key, callback) {
  console.log(`Calculated server for choose: ${strategy(key, client.servers.length)}`) // works with highload
  console.log(`Try to get from cache by key: ${key}.`); // works with highload
  client.get(key, function(err, data) {
    const isError = err || !data // doesn't work with highload
    console.log('Data from cache is: ', data) // callback will be never executed
    if (!isError) {
      console.log(`Found data in cache key-value: ${key} - ${data}`);
    } else {
      console.log(`Not found value from cache by key: ${key}`);
    }
    const parsedData = isError ? null : JSON.parse(data.toString())
    callback(isError, parsedData); // and this won't work also
  });
}
And after some time, the socket connection is simply closed (with code 1000, no errors; it looks like the user just left).
INFO [ProcessID-100930] Connection close [772003], type [ws], code [1000], message []
Then, after 5-10 seconds, all processes start working again as if nothing had happened, and the memcached client callback starts executing correctly.
I've been trying for a long time to catch this moment and understand why it happens, but I still don't understand. I have already switched between several memcached clients (memjs now, previously memcached and mc) but still get the same behavior under high load.
When receiving data from the memcached server, the callback simply does not run, and data from memcached is not returned (although, judging by the memcached logs, it was there at that moment).
Can someone please suggest what might be going on?

Querying DB2 every 15 seconds causing memory leak in NodeJS

I have an application which checks for new entries in DB2 every 15 seconds on the iSeries using IBM's idb-connector. I have async functions which return the result of the query to socket.io which emits an event with the data included to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system, which means most memory management tools are off limits, as memwatch, heapdump, and the like need node-gyp to install. Here's an example of the functions' basic structure.
const { dbconn, dbstmt } = require('idb-connector'); // require idb-connector

async function queryDB() {
  const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
  // create new promise
  let promise = new Promise(function(resolve, reject) {
    // create new connection
    const connection = new dbconn();
    connection.conn("*LOCAL");
    const statement = new dbstmt(connection);
    statement.exec(sSql, (rows, err) => {
      if (err) {
        throw err;
      }
      let ticks = rows;
      statement.close();
      connection.disconn();
      connection.close();
      resolve(ticks.length); // resolve promise with varying data
    })
  });
  let result = await promise; // await promise
  return result;
};

async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data) // push to front end
  setTimeout(getNewData, 2000); // check again in 2 seconds
};
Any ideas on where the leak is? Am I using async/await incorrectly? Or am I creating/destroying DB connections improperly? Any help figuring out why this code is leaky would be much appreciated!
Edit: Forgot to mention that I have limited control over the backend processes, as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as I can tell, I've followed the instructions suggested on their GitHub repo.
I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way. Reason being that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key of the row into a data queue on add (or even on change or delete if necessary). Then you can just make an async call that waits for a record on the data queue. This is more real time, and the event handler is only called when a record shows up. The handler can get the specific record from the database since you know its key. Data queues are much faster than database I/O and place little overhead on the trigger. A rough sketch of this shape follows the list below.
I see a couple of potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event fires the instant a record is added to the table, rather than up to 15 seconds later.
You don't have to code for the possibility of one or more new records; it will always be one, the one referenced in the data queue.
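If it helps, here is a very rough sketch of that shape; readDataQueue() and fetchRowByKey() are hypothetical placeholders for whatever data-queue and SQL helpers your IBM i toolkit provides, not real idb-connector calls:
// Rough sketch only: readDataQueue() and fetchRowByKey() are hypothetical helpers,
// standing in for your toolkit's data-queue and SQL APIs.
async function watchForNewRows() {
  while (true) {
    // Hypothetical: block until the trigger writes a row key to the data queue.
    const key = await readDataQueue('MYLIB/NEWROWSQ');
    // Hypothetical: fetch only the row identified by that key.
    const row = await fetchRowByKey(key);
    io.emit('newData', row); // push to the front end, as in your question
  }
}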
Yes, you have to close the connection.
Don't make const data; you don't need the promise, since statement.exec is async by default and handles it via return result;.
Keep the line setTimeout(getNewData, 2000); // check again in 2 seconds
outside getNewData, otherwise it becomes a recursive infinite loop.
Sample code
const { dbconn, dbstmt } = require('idb-connector');

const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn(); // Create a connection object.
connection.conn('*LOCAL'); // Connect to a database.
const statement = new dbstmt(connection); // Create a statement object of the connection.

statement.exec(sql, (result, error) => {
  if (error) {
    throw error;
  }
  console.log(`Result Set: ${JSON.stringify(result)}`);
  statement.close(); // Clean up the statement object.
  connection.disconn(); // Disconnect from the database.
  connection.close(); // Clean up the connection object.
  return result;
});
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data) // push to front end
  setTimeout(getNewData, 2000); // check again in 2 seconds
};
change to
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data) // push to front end
};
setTimeout(getNewData, 2000); // check again in 2 seconds
The first thing to notice is the database connection that can be left open in case of an error:
if (err) {
  throw err;
}
Also, on success, connection.disconn(); and connection.close(); return boolean values that tell whether the operation succeeded (according to the documentation).
Piling up connection objects inside a 3rd-party library is always a possible scenario.
I would check those.
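A sketch of how I would handle the error path, assuming the same queryDB() shape as in your question; it cleans up before rejecting instead of throwing inside the callback, and looks at the boolean results mentioned above:
statement.exec(sSql, (rows, err) => {
  if (err) {
    // Clean up even on failure so the connection is not left open.
    statement.close();
    if (!connection.disconn()) console.warn('disconn() reported failure');
    if (!connection.close()) console.warn('close() reported failure');
    return reject(err);
  }
  let ticks = rows;
  statement.close();
  connection.disconn();
  connection.close();
  resolve(ticks.length);
});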
This was confirmed to be a memory leak in the idb-connector library that I was using. Link to the GitHub issue here. Basically, there was a C++ array that never had its memory deallocated. A new version was released, and the commit can be viewed here.

Best way to query all documents from a mongodb collection in a reactive way w/out flooding RAM

I want to query all the documents in a collection in a reactive way. The collection.find() method of the mongodb nodejs driver returns a cursor that fires events for each document found in the collection. So I made this:
const giant_query = (db) => {
  var req = db.collection('mycollection').find({});
  return Rx.Observable.merge(Rx.Observable.fromEvent(req, 'data'),
                             Rx.Observable.fromEvent(req, 'end'),
                             Rx.Observable.fromEvent(req, 'close'),
                             Rx.Observable.fromEvent(req, 'readable'));
}
It will do what I want: fire for each document, so I can treat them in a reactive way, like this:
Rx.Observable.of('').flatMap(giant_query).do(some_function).subscribe()
I could query the documents in batches of ten, but then I'd have to keep track of an index number each time the observable stream fires, and I'd have to build an observable loop, which I don't know is possible or the right way to do it.
The problem with this cursor is that I don't think it does things in batches. It'll probably fire all the events in a short period of time, flooding my RAM. Even if I buffer some events into batches using Observable's buffer, the events and their data (the documents) will still be sitting in RAM waiting to be processed.
What's the best way to deal with this in a reactive way?
I'm not an expert on mongodb, but based on the examples I've seen, this is a pattern I would try.
I've omitted the events other than data, since throttling that one seems to be the main concern.
var cursor = db.collection('mycollection').find({});

const cursorNext = new Rx.BehaviorSubject('next');  // signal first batch then wait
const nextBatch = () => {
  if (cursor.hasNext()) {  // sketch only: the driver's cursor.hasNext() actually returns a promise
    cursorNext.next('next');
  }
};

cursorNext
  .switchMap(() =>                             // wait for cursorNext to signal
    Rx.Observable.fromPromise(cursor.next())   // get a single doc
      .repeat()                                // get another
      .takeWhile(() => cursor.hasNext())       // stop taking if out of data
      .take(batchSize)                         // until full batch
      .toArray()                               // combine into a single emit
  )
  .map(docsBatch => {
    // do something with the batch
    // return docsBatch or modified docsBatch
  })
  ... // other operators?
  .subscribe(x => {
    ...
    nextBatch();
  });
I'm trying to put together a test of this Rx flow without mongodb; in the meantime, this might give you some ideas.
You might also want to check my solution without using RxJS:
Mongoose Cursor: http bulk request from collection

MongoError: cursor is dead (mongo node driver)

I am using node-mongodb-native 2.0 http://mongodb.github.io/node-mongodb-native/2.0/
With the following Node.js code:
var MongoClient = require('mongodb').MongoClient;
var mongoUrl = 'mongodb://localhost/twitter';
MongoClient.connect(mongoUrl, function(err, db) {
  if (err) return console.error(err);
  var collection = db.collection('tweets');
  collection.find().limit(1000).forEach(function(tweet) {
    console.log(tweet.id);
  }, function(err) {
    if (err) console.error(err);
    db.close();
  });
});
When I set the limit to 1000 (collection.find().limit(1000)), I was able to retrieve the first several hundred records, but then I got the error message { [MongoError: cursor is dead] name: 'MongoError', message: 'cursor is dead' } (I have 1 million records in my collection). The program runs OK when I specify 800 as the limit. It's also OK not to specify any limit at all (just collection.find()); the script just keeps going without any error (reading far more than 1000 records).
What's wrong? How to solve? How to solve if I still want to use forEach on a cursor?
I have reproduced this issue with a sample data set of smaller documents. The telling part is likely this log line from MongoDB:
2014-10-21T18:50:32.548+0100 [conn50] query twitter.tweets planSummary: COLLSCAN cursorid:30362014860 ntoreturn:200000 ntoskip:0 nscanned:199728 nscannedObjects:199728 keyUpdates:0 numYields:0 locks(micros) r:120400 nreturned:199728 reslen:4194308 120ms
The key piece, I suspect, is reslen:4194308, which looks suspiciously close to the default batch size of 4MiB. I've been in touch with the node.js driver developers and will let you know if this ends up as a bug (update: opened NODE-300, and confirmed the fix in mongodb-core#1.0.4 as a result).
In the meantime, I'd recommend using the workaround from the comments, namely using a projection to reduce the amount of data in the results and sidestepping the issue.
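As a rough sketch of that workaround, assuming you only need the tweet id (newer driver versions expose cursor.project(); in 2.x you can also pass the projection via the fields option in find()):
collection.find({})
  .project({ id: 1, _id: 0 }) // only return the field that is actually used
  .limit(1000)
  .forEach(function(tweet) {
    console.log(tweet.id);
  }, function(err) {
    if (err) console.error(err);
    db.close();
  });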
Updated Resolution: If you are seeing this (or a similar) issue, then please update your mongodb-core version and retry - the new version no longer errors and runs quite a bit faster also. I did this by removing my node_modules and re-running npm install for my test app.

Should MongooseJS be emitting events on replica set disconnection?

With a single server setup, I receive events from the driver.
mongoose.connect('mongodb://localhost/mydb');
mongoose.connection.on('disconnected', function() {...});
mongoose.connection.on('error', function(err) {...});
When using a replica set (mongoose.connect('mongodb://localhost:27017/mydb,mongodb://localhost:27018/mydb');), shutting down all connected set members doesn't trigger those same events.
I'm not very familiar with the internals of the native driver and I'm wondering if this is a bug or if I need to manually detect this condition.
I'm using Mongoose 3.6.17 (mongodb driver 1.3.18)
Sans mongoose, I tried this with the same results (no events from a replica set).
require('mongodb').MongoClient.connect("mongodb://localhost:27017,localhost:27018/mydb", function(err, db) {
  if (db) {
    db.on('disconnected', function() {
      console.log('disconnected');
    }).on('error', function(err) {
      console.log('error');
    });
  }
});
I've been having similar problems with Mongoose (asked on SO also). More recently, I found this issue on the Mongoose GitHub repository, which led to this issue on the driver repository.
The Mongo driver wasn't emitting any of these events more than once, and as of today this has been fixed for single connections in v1.3.19.
It seems that it's a "won't fix" for now.
I ended up doing the following:
I set auto_reconnect=true.
Until the application has connected to the database for the first time, I disconnect and reconnect. If I don't disconnect and reconnect, any queued queries won't run. After a connection has been established at least once, those queued queries do complete, and then...
For single connections:
1. I forked mongoose (to update mongodb to 1.3.19) so errors get triggered more than once.
2. I catch the connection error and make the app aware of the disconnection, retrying until I give up and panic or the app is reconnected. This is done by pinging the server every x milliseconds with a command that will not queue:
var autoReconnect = mongoose.connection.db.serverConfig.isAutoReconnect;
mongoose.connection.db.serverConfig.isAutoReconnect = function() { return false; };
mongoose.connection.db.executeDbCommand({ ping: 1 }, { failFast: true }, function(err) {
  if (!err) {
    // we've reconnected.
  }
});
mongoose.connection.db.serverConfig.isAutoReconnect = autoReconnect;
For a replica set, I ended up polling the mongoose connection with the above ping every x milliseconds until I detect an error, in which case I set my local app state to disconnected and enter the reconnect poll loop above (2.).
Here's a gist with the relevant bits: https://gist.github.com/jsas/6299412
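A rough sketch of that poll loop, reusing the ping from the snippet above; the interval value is illustrative, not something I tuned:
const POLL_INTERVAL_MS = 5000; // illustrative value
let replicaSetUp = true;
setInterval(function() {
  var serverConfig = mongoose.connection.db.serverConfig;
  var autoReconnect = serverConfig.isAutoReconnect;
  serverConfig.isAutoReconnect = function() { return false; };
  mongoose.connection.db.executeDbCommand({ ping: 1 }, { failFast: true }, function(err) {
    replicaSetUp = !err; // on error, mark the app as disconnected and enter the reconnect loop from (2.)
  });
  serverConfig.isAutoReconnect = autoReconnect;
}, POLL_INTERVAL_MS);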
This is a nasty inconsistency/oversight in mongoose, especially when developing a microservice where you're using a single-server setup for development and a replica set in production.
This is how I ended up accurately tracking the status of my mongoose connection.
let alive = false;

function updateAlive() {
  alive = !!(mongoose.connection
    && mongoose.connection.readyState === mongoose.STATES.connected
    // This is necessary because mongoose treats a dead replica set as still "connected".
    && mongoose.connection.db.topology.connections().length > 0);
}

mongoose.connection.on('connected', () => {
  updateAlive();
  // I think '.topology' is available on even single server connections.
  // The events just won't be emitted.
  mongoose.connection.db.topology.on('joined', () => updateAlive());
  mongoose.connection.db.topology.on('left', () => updateAlive());
});

mongoose.connection.on('disconnected', () => {
  updateAlive();
});
