Querying DB2 every 15 seconds causing memory leak in NodeJS - node.js

I have an application which checks for new entries in DB2 every 15 seconds on the iSeries using IBM's idb-connector. I have async functions which return the result of the query to socket.io which emits an event with the data included to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system which means most memory management tools are off limits as memwatch, heapdump and the like need node-gyp to install. Here's an example of what the functions basic structure is.
const { dbconn, dbstmt } = require('idb-connector');// require idb-connector
async function queryDB() {
const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
// create new promise
let promise = new Promise ( function(resolve, reject) {
// create new connection
const connection = new dbconn();
connection.conn("*LOCAL");
const statement = new dbstmt(connection);
statement.exec(sSql, (rows, err) => {
if (err) {
throw err;
}
let ticks = rows;
statement.close();
connection.disconn();
connection.close();
resolve(ticks.length);// resolve promise with varying data
})
});
let result = await promise;// await promise
return result;
};
async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
setTimeout(getNewData, 2000);// check again in 2 seconds
};
Any ideas on where the leak is? Am i using async/await incorrectly? Or else am i creating/destroying DB connections improperly? Any help on figuring out why this code is leaky would be much appreciated!!
Edit: Forgot to mention that i have limited control on the backend processes as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as i can tell I've followed the instructions suggested on their github repo.

I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way. Reason being that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key to the row into a data queue on add, or even change or delete if necessary. Then you can just put in an async call to wait for a record on the data queue. This is more real time, and the event handler is only called when a record shows up. The handler can get the specific record from the database since you know it's key. Data queues are much faster than database IO, and place little overhead on the trigger.
I see a couple of potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event would fire the instant a record is added to the table, rather than 15 seconds later.
You don't have to code for the possibility of one or more new records, it will always be 1, the one mentioned in the data queue.

yes you have to close connection.
Don't make const data. you don't need promise by default statement.exec is async and handles it via return result;
keep setTimeout(getNewData, 2000);// check again in 2 seconds
line outside getNewData otherwise it becomes recursive infinite loop.
Sample code
const {dbconn, dbstmt} = require('idb-connector');
const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn(); // Create a connection object.
connection.conn('*LOCAL'); // Connect to a database.
const statement = new dbstmt(dbconn); // Create a statement object of the connection.
statement.exec(sql, (result, error) => {
if (error) {
throw error;
}
console.log(`Result Set: ${JSON.stringify(result)}`);
statement.close(); // Clean up the statement object.
connection.disconn(); // Disconnect from the database.
connection.close(); // Clean up the connection object.
return result;
});
*async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
setTimeout(getNewData, 2000);// check again in 2 seconds
};*
change to
**async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
};
setTimeout(getNewData, 2000);// check again in 2 seconds**

First thing to notice is possible open database connection in case of an error.
if (err) {
throw err;
}
Also in case of success connection.disconn(); and connection.close(); return boolean values that tell is operation successful (according to documentation)
Always possible scenario is to pile up connection objects in 3rd party library.
I would check those.

This was confirmed to be a memory leak in the idb-connector library that i was using. Link to github issue Here. Basically there was a C++ array that never had it's memory deallocated. A new version was added and the commit can viewed Here.

Related

Synchronize multiple requests to database in NestJS

in our NestJS application we are using TypeORM as ORM to work with db tables and typeorm-transactional-cls-hooked library.
now we have problem with synchronization of requests which are read and modifying database at same time.
Sample:
#Transactional()
async doMagicAndIncreaseCount (id) {
const await { currentCount } = this.fooRepository.findOne(id)
// do some stuff where I receive new count which I need add to current, for instance 10
const newCount = currentCount + 10
this.fooRepository.update(id, { currentCount: newCount })
}
When we executed this operation from the frontend multiple times at the same time, the final count is wrong. The first transaction read currentCount and then start computation, during computation started the second transaction, which read currentCount as well, and first transaction finish computation and save new currentCount, and then also second transaction finish and rewrite result of first transaction.
Our goal is to execute this operation on foo table only once at the time, and other requests should wait until.
I tried set SERIALIZABLE isolation level like this:
#Transactional({ isolationLevel: IsolationLevel.SERIALIZABLE })
which ensure that only one request is executed at time, but other requests failed with error. Can you please give me some advice how to solve that?
I never used TypeORM and moreover you are hiding the DB engine you are using.
Anyway to achieve this target you need write locks.
The doMagicAndIncreaseCount pseudocode should be something like
BEGIN TRANSACTION
ACQUIRE WRITE LOCK ON id
READ id RECORD
do computation
SAVE RECORD
CLOSE TRANSACTION
Alternatively you have to use some operation which is natively atomic on the DB engine; ex. the INCR operation on Redis.
Edit:
Reading on TypeORM find documentation, I can suggest something like:
this.fooRepository.findOne({
where: { id },
lock: { mode: "pessimistic_write", version: 1 },
})
P.S. Looking at the tags of the question I would guess the used DB engine is PostgreSQL.

How to manage massive calls to Postgresql in Node

I have a question regarding massive calls to PostgreSQL.
This is the scenario:
I have a simple Nodejs app that makes queries to PostgreSQL in a short period of time.
Everything is fine, but sometimes these calls get rejected due to Postgresql maximum pool connections setting, which is equal to 100.
I have in mind to make queue consumption app style, which means adding every query to a queue and then consuming an element every second. By consequence a query to PostgreSQL every second.
But my problem is, Idk where to start. This is the part where I am getting problems with, at some point, I have a lot of calls and I get lots of "ERROR IN QUERY EXECUTION" for the reason explained before.
const pool3 = new Pool(credentialsPostGres);
let res = [];
let sql_call = "select colum1 from table2 where x = y"; //the real query is a bit more complex, but you get the idea.
poll_query.query(sql_call,(err,results) => {
if (err) {
pool3.end();
console.log(err + " ERROR IN QUERY EXECUTION");
} else {
res.push({ data: Object.values(JSON.parse(JSON.stringify(results.rows))) });
pool3.end();
return callback(res,data);
}
})
How I should manage this part into a queue? I am a bit lost.
Help!

Cron job failed without a reason

I am in a situation where I have a CRON task on google app engine (using flex environment) that just dies after some time, but I have no trace WHY (checked the GA Logs, nothing, tried try/catch, and explicitly log it - no error).
I have explicitly verified that if I create a cron task that runs for 8 minutes (but doesn't do much - just sleeps and updates database every second), it will run successfully. This is just to prove that CRON jobs can at least run 8 minutes if not more. & I have set up the Express & NodeJS combo up correctly.
This is all fine, but seems that my other cron job dies in 2-3 minutes, so quite fast. It is hitting some kind of limit, but I have no idea how to control for it, or even what limit it is, so all I can do is speculate.
I will tell more about my CRON task. It is basically rapidly querying MongoDB database where every query is quite fast. I've tried the same code locally, and there are no problems.
My speculation is that I am somehow creating too many MongoDB requests at once, and potentially running out of something?
Here's a pseudocode (just to describe what kind of scale data we're talking about - the numbers and flow are exactly the same):
function q1() {
return await mongoExecute(async (db) => {
const [l1, l2] = await Promise.all([
db.collection('Obj1').count({uid1: c1, u2action: 'L'}),
db.collection('Obj1').count({uid2: c2, u1action: 'L'}),
]);
return l1+l2;
});
}
for(let i = 0; i < 8000; i++) {
const allImportantInformation = Promise.all([
q1(),
q2(),
q3(),
.....
q10()
])
await mongoDb.saveToServer(document);
}
It is getting somewhere around i=1600 before the CRON job just dies without any explanation. The GA Cron Job panel clearly says the JOB has failed.
Here is also my mongoExecute (which is just a separate module that caches the db object, which hopefully is the correct practice in order to ensure that mongodb pooling works correctly.)
import { MongoClient, Db } from 'mongodb';
let db = null;
let promiseInProgress = null;
export async function mongoExecute<T> (executor: (instance: Db) => T): Promise<T | null> {
if (!db) {
if (!promiseInProgress) {
promiseInProgress = new Promise(async (resolve, reject) => {
const tempDb = await MongoClient.connect(process.env.MONGODB_URL);
resolve(tempDb);
});
}
db = await promiseInProgress;
}
try {
const value = await executor(db);
return value;
} catch (error) {
console.log(error);
return null;
}
}
What would be the solution? My idea is to basically ensure less requests are made at once (so all the promises would be sequential, and potentially add sleep between each cycle in the FOR.
I don't understand because it works fine up until some specific point (and quite big point, it's definitely different amount, sometimes it is 800, sometimes 1200, etc).
Is there any "running out of TCP connections" scenario happening? Theoretically we shouldn't run out of anything because we don't have much open at any given point.
It seems to be working if I throw 200ms wait between each cycle & I suspect I can figure out solution, all the items don't have to be updated in the same CRON execution, but it is a bit annoying, and I would like to know what's going on.
Is the garbage collector not catching up fast enough, why exactly is GA silently failing my cron task?
I discovered what the bug is, and fixed it accordingly.
Let me rephrase it; I have no idea what the bug was, and having no errors at any point was discouraging, however I managed to fix (lucky guess) whatever was happening by updating my nodejs mongodb driver to the latest version (from 2.xx -> 3.1.10).
No sleeps needed in my code anymore.

Request rate is large

Im using Azure documentdb and accessing it through my node.js on express server, when I query in loop, low volume of few hundred there is no issue.
But when query in loop slightly large volume, say around thousand plus
I get partial results (inconsistent, every time I run result values are not same. May be because of asynchronous nature of Node.js)
after few results it crashes with this error
body: '{"code":"429","message":"Message: {\"Errors\":[\"Request rate is large\"]}\r\nActivityId: 1fecee65-0bb7-4991-a984-292c0d06693d, Request URI: /apps/cce94097-e5b2-42ab-9232-6abd12f53528/services/70926718-b021-45ee-ba2f-46c4669d952e/partitions/dd46d670-ab6f-4dca-bbbb-937647b03d97/replicas/130845018837894542p"}' }
Meaning DocumentDb fail to handle 1000+ request per second?
All together giving me a bad impression on NoSQL techniques.. is it short coming of DocumentDB?
As Gaurav suggests, you may be able to avoid the problem by bumping up the pricing tier, but even if you go to the highest tier, you should be able to handle 429 errors. When you get a 429 error, the response will include a 'x-ms-retry-after-ms' header. This will contain a number representing the number of milliseconds that you should wait before retrying the request that caused the error.
I wrote logic to handle this in my documentdb-utils node.js package. You can either try to use documentdb-utils or you can duplicate it yourself. Here is a snipit example.
createDocument = function() {
client.createDocument(colLink, document, function(err, response, header) {
if (err != null) {
if (err.code === 429) {
var retryAfterHeader = header['x-ms-retry-after-ms'] || 1;
var retryAfter = Number(retryAfterHeader);
return setTimeout(toRetryIf429, retryAfter);
} else {
throw new Error(JSON.stringify(err));
}
} else {
log('document saved successfully');
}
});
};
Note, in the above example document is within the scope of createDocument. This makes the retry logic a bit simpler, but if you don't like using widely scoped variables, then you can pass document in to createDocument and then pass it into a lambda function in the setTimeout call.

How to populate mongoose with a large data set

I'm attempting to load a store catalog into MongoDb (2.2.2) using Node.js (0.8.18) and Mongoose (3.5.4) -- all on Windows 7 64bit. The data set contains roughly 12,500 records. Each data record is a JSON string.
My latest attempt looks like this:
var fs = require('fs');
var odir = process.cwd() + '/file_data/output_data/';
var mongoose = require('mongoose');
var Catalog = require('./models').Catalog;
var conn = mongoose.connect('mongodb://127.0.0.1:27017/sc_store');
exports.main = function(callback){
var catalogArray = fs.readFileSync(odir + 'pc-out.json','utf8').split('\n');
var i = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
if(err){
console.log(err);
} else {
i++;
}
});
if(i === catalogArray.length -1) return callback('database populated');
}
});
};
I have had a lot of problems trying to populate the database. Under previous scenarios (and this one), node pegs the processor and eventually runs out of memory. Note that in this scenario, I'm trying to allow Mongoose to save a record, and then iterate to the next record once the record saves.
But the iterator inside of the Mongoose save function never gets incremented. In addition, it never throws any errors. But if I put the iterator (i) outside of the asynchronous call to Mongoose, it will work, provided the number of records that I try to load are not too big (I have successfully loaded 2,000 this way).
So my questions are: Why isn't the iterator inside of the Mongoose save call ever incremented? And, more importantly, what is the best way to load a large data set into MongoDb using Mongoose?
Rob
i is your index to where you're pulling input data from in catalogArray, but you're also trying to use it to keep track of how many have been saved which isn't possible. Try tracking them separately like this:
var i = 0;
var saved = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
saved++;
if(err){
console.log(err);
} else {
if(saved === catalogArray.length) {
return callback('database populated');
}
}
});
i++;
}
});
UPDATE
If you want to add tighter flow control to the process, you can use the async module's forEachLimit function to limit the number of outstanding save operations to whatever you specify. For example, to limit it to one outstanding save at a time:
Catalog.remove({}, function(err){
async.forEachLimit(catalogArray, 1, function (catalog, cb) {
new Catalog(JSON.parse(catalog)).save(function (err, doc) {
if (err) {
console.log(err);
}
cb(err);
});
}, function (err) {
callback('database populated');
});
}
Rob,
The short answer:
You created an infinite loop. You're thinking synchronously and with blocking, Javascript functions asynchronously and without blocking. What you are trying to do is like trying to directly turn the feeling of hunger into a sandwich. You can't. The closest thing is you use the feeling of hunger to motivate you to go to the kitchen and make it. Don't try to make Javascript block. It won't work. Now, learn async.forEachLimit. It will work for what you want to do here.
You should probably review asynchronous design patterns and understand what it means on a deeper level. Callbacks are not simply an alternative to return values. They are fundamentally different in how and when they are executed. Here is a good primer: http://cs.brown.edu/courses/csci1680/f12/handouts/async.pdf
The long answer:
There is an underlying problem here, and that is your lack of understanding of what non-blocking IO and asynchronous means. Im not sure if you are breaking into node development, or this is just a one-off project, but if you do plan to continue using node (or any asynchronous language) then it is worth the time to understand the difference between synchronous and asynchronous design patterns, and what motivations there are for them. So, that is why you have a logic error putting the loop invariant increment inside an asynchronous callback which is creating an infinite loop.
In non-computer science, that means that your increment to i will never occur. The reason is because Javascript executes a single block of code to completion before any asynchronous callbacks are called. So in your code, your loop will run over and over, without i ever incrementing. And, in the background, you are storing the same document in mongo over and over. Each iteration of the loop starts sending document with index 0 to mongo, the callback can't fire until your loop ends, and all other code outside the loop runs to completion. So, the callback queues up. But, your loop runs again since i++ is never executed (remember, the callback is queued until your code finishes), inserting record 0 again, queueing another callback to execute AFTER your loop is complete. This goes on and on until your memory is filled with callbacks waiting to inform your infinite loop that document 0 has been inserted millions of times.
In general, there is no way to make Javascript block without doing something really really bad. For example, something paramount to setting your kitchen on fire to fry some eggs for that sandwich I talked about in the "short answer".
My advice is to take advantage of libs like async. https://github.com/caolan/async JohnnyHK mentioned it here, and he was correct for doing so.

Resources