I have this piece of code:
var pg = require('pg');
var QueryStream = require('pg-query-stream');
var JSONStream = require('JSONStream');
var http = require('http');

var constr = 'postgres://devel:1234@127.0.0.1/tcc';

pg.connect(constr, function (err, client, done) {
    if (err) {
        console.log('Error connecting client.', err);
        process.exit(1);
    }
    var sql = 'SELECT \
        pessoa.cod, \
        pessoa.nome, \
        pessoa.nasc, \
        cidade.nome AS cidade \
        FROM pessoa, cidade \
        WHERE cidade.cod IN (1, 2, 3);';
    http.createServer(function (req, resp) {
        resp.writeHead(200, { 'Content-Type': 'text/html; Charset=UTF-8' });
        var query = new QueryStream(sql);
        var stream = client.query(query);
        //stream.on('data', console.log);
        stream.on('end', function () {
            //done();
            resp.end();
        });
        stream.pipe(JSONStream.stringify()).pipe(resp);
    }).listen(8080, 'localhost');
});
When I run ApacheBench against it, I get only about four requests per second.
If I run the same query under PHP/Apache or Java/Tomcat, I get results about ten times faster.
The database has 1000 rows. If I limit the query to about ten rows,
then Node is about twice as fast as PHP/Java.
What am I doing wrong?
EDIT: Some time ago I opened an issue here: https://github.com/brianc/node-postgres/issues/653
I'm providing this link because I posted some other variations of the code I have tried there.
Even with the comments and hints so far, I have not been able to get decent speed.
pg-query-stream uses cursors.
It uses cursors (worth emphasizing).
You can read the code and change batchSize to better fit your needs.
For those who don't know what cursors are: in short, they are a trade-off for keeping the memory footprint small and not reading a whole table into memory. But if you fetch 100 rows at a time when you have 1000 results, that's 1000 / 100 = 10 round-trips, so probably about 10x slower than a solution not using cursors.
If you know how many rows you need, add a LIMIT to your query, and change the number of rows returned each time to minimize the number of round-trips.
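For example, the QueryStream constructor takes an options object; here is a hedged sketch assuming a pg-query-stream version whose third constructor argument accepts a batchSize option (check the version you have installed):

// Fetch up to 1000 rows per cursor read instead of the default batch,
// so a 1000-row result set needs far fewer round-trips.
var query = new QueryStream(sql, [], { batchSize: 1000 });
var stream = client.query(query);
stream.pipe(JSONStream.stringify()).pipe(resp);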
As far as I can tell from this code, you create a single connection to PostgreSQL and everything gets queued through it.
The pg module allows for this; it's described here:
https://github.com/brianc/node-postgres/wiki/Queryqueue
If you want real performance, then for each HTTP request you should fetch a connection from the pool, use it, release it, and make 101% sure you always release it (e.g. with proper exception handling), or your server will die once the pool gets completely exhausted.
Once you are there, you can tweak the connection pool parameters and measure performance.
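As a minimal sketch of that idea, assuming the same pg version the question uses (where pg.connect hands out a pooled client and done() returns it to the pool), each request checks out its own client and always releases it; the cursor stream is left out to keep the example short:

var pg = require('pg');
var http = require('http');

var constr = 'postgres://devel:1234@127.0.0.1/tcc'; // same connection string as the question
var sql = 'SELECT pessoa.cod, pessoa.nome, pessoa.nasc, cidade.nome AS cidade ' +
          'FROM pessoa, cidade WHERE cidade.cod IN (1, 2, 3)';

http.createServer(function (req, resp) {
    // Check a client out of the pool for this request only.
    pg.connect(constr, function (err, client, done) {
        if (err) {
            resp.writeHead(500);
            return resp.end('database error');
        }
        client.query(sql, function (err, result) {
            done(); // always return the client to the pool
            if (err) {
                resp.writeHead(500);
                return resp.end('query error');
            }
            resp.writeHead(200, { 'Content-Type': 'application/json; charset=utf-8' });
            resp.end(JSON.stringify(result.rows));
        });
    });
}).listen(8080, 'localhost');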
Looks like you're waiting for the database connection to be created before any request gets handled. Try moving http.createServer outside of the pg.connect callback. If you only want to use the connection in the request handler, you should try making those calls async there.
Maybe you should set the http.globalAgent.maxSockets value; try this:
var http = require('http');
http.globalAgent.maxSockets = {{number}};
The default maxSockets is 5.
Related
I have a question regarding massive numbers of calls to PostgreSQL.
This is the scenario:
I have a simple Node.js app that makes many queries to PostgreSQL in a short period of time.
Everything is fine, but sometimes these calls get rejected because of PostgreSQL's maximum pool connections setting, which is 100.
I have in mind to build a queue-consumer style app: add every query to a queue and then consume one element every second, so that only one query per second goes to PostgreSQL.
But my problem is that I don't know where to start. This is the part I am having trouble with: at some point I have a lot of calls and I get lots of "ERROR IN QUERY EXECUTION" for the reason explained before.
const { Pool } = require('pg');

const pool3 = new Pool(credentialsPostGres);
let res = [];
let sql_call = "select colum1 from table2 where x = y"; // the real query is a bit more complex, but you get the idea

pool3.query(sql_call, (err, results) => {
    if (err) {
        pool3.end();
        console.log(err + " ERROR IN QUERY EXECUTION");
    } else {
        res.push({ data: Object.values(JSON.parse(JSON.stringify(results.rows))) });
        pool3.end();
        return callback(res, data);
    }
});
How should I manage this with a queue? I am a bit lost.
Help!
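A minimal sketch of that queue-consumer idea, assuming a single shared Pool (reusing the credentialsPostGres object from the snippet above) and a plain in-memory array as the queue:

const { Pool } = require('pg');
const pool = new Pool(credentialsPostGres); // one shared pool, never ended per query

const queue = []; // pending jobs: { sql, params, callback }

// Producer: push work onto the queue instead of querying directly.
function enqueueQuery(sql, params, callback) {
    queue.push({ sql: sql, params: params, callback: callback });
}

// Consumer: run one queued query per second against the shared pool.
setInterval(function () {
    const job = queue.shift();
    if (!job) {
        return; // nothing to do this tick
    }
    pool.query(job.sql, job.params, function (err, results) {
        job.callback(err, results ? results.rows : null);
    });
}, 1000);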
I have an application which checks for new entries in DB2 every 15 seconds on the iSeries using IBM's idb-connector. I have async functions which return the result of the query to socket.io which emits an event with the data included to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system, which means most memory management tools are off limits, as memwatch, heapdump and the like need node-gyp to install. Here's an example of the functions' basic structure.
const { dbconn, dbstmt } = require('idb-connector'); // require idb-connector

async function queryDB() {
    const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
    // create new promise
    let promise = new Promise(function (resolve, reject) {
        // create new connection
        const connection = new dbconn();
        connection.conn("*LOCAL");
        const statement = new dbstmt(connection);
        statement.exec(sSql, (rows, err) => {
            if (err) {
                throw err;
            }
            let ticks = rows;
            statement.close();
            connection.disconn();
            connection.close();
            resolve(ticks.length); // resolve promise with varying data
        });
    });
    let result = await promise; // await promise
    return result;
};

async function getNewData() {
    const data = await queryDB();  // get new data
    io.emit('newData', data);      // push to front end
    setTimeout(getNewData, 2000);  // check again in 2 seconds
};
Any ideas on where the leak is? Am I using async/await incorrectly? Or am I creating/destroying DB connections improperly? Any help figuring out why this code is leaky would be much appreciated!
Edit: Forgot to mention that I have limited control over the backend processes, as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as I can tell, I've followed the instructions suggested in their GitHub repo.
I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way. Reason being that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key of the row into a data queue on add (or on change or delete, if necessary). Then you can just make an async call to wait for a record on the data queue. This is more real-time, and the event handler is only called when a record shows up. The handler can fetch the specific record from the database since you know its key. Data queues are much faster than database I/O, and place little overhead on the trigger.
I see a couple of potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event would fire the instant a record is added to the table, rather than 15 seconds later.
You don't have to code for the possibility of one or more new records, it will always be 1, the one mentioned in the data queue.
Yes, you have to close the connection.
Don't make data a const, and you don't need the promise: by default statement.exec is async, and you can handle the result in its callback (see the return result; in the sample below).
Keep the setTimeout(getNewData, 2000); // check again in 2 seconds
line outside getNewData, otherwise it becomes a recursive infinite loop.
Sample code
const { dbconn, dbstmt } = require('idb-connector');

const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn();          // Create a connection object.
connection.conn('*LOCAL');                // Connect to a database.
const statement = new dbstmt(connection); // Create a statement object on the connection.
statement.exec(sql, (result, error) => {
    if (error) {
        throw error;
    }
    console.log(`Result Set: ${JSON.stringify(result)}`);
    statement.close();    // Clean up the statement object.
    connection.disconn(); // Disconnect from the database.
    connection.close();   // Clean up the connection object.
    return result;
});
async function getNewData() {
    const data = await queryDB();  // get new data
    io.emit('newData', data);      // push to front end
    setTimeout(getNewData, 2000);  // check again in 2 seconds
};

change to

async function getNewData() {
    const data = await queryDB();  // get new data
    io.emit('newData', data);      // push to front end
};
setTimeout(getNewData, 2000);      // check again in 2 seconds
The first thing to notice is a possible open database connection in case of an error:

if (err) {
    throw err;
}

Also, in case of success, connection.disconn() and connection.close() return boolean values that tell whether the operation succeeded (according to the documentation).
It is always possible that connection objects are piling up inside the 3rd-party library.
I would check those.
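For example, here is a sketch of the exec callback from the question, reworked so the statement and connection are cleaned up on both paths and the error rejects the surrounding promise instead of being thrown inside the callback (same idb-connector calls as in the original code):

statement.exec(sSql, (rows, err) => {
    // Clean up in the error path as well as the success path,
    // so a failed query cannot leave the connection open.
    statement.close();
    connection.disconn();
    connection.close();
    if (err) {
        reject(err); // reject the promise created in queryDB()
        return;
    }
    resolve(rows.length); // resolve promise with varying data
});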
This was confirmed to be a memory leak in the idb-connector library I was using. A link to the GitHub issue is here. Basically there was a C++ array whose memory was never deallocated. A new version was released and the commit can be viewed here.
I am trying to insert 1,000,000 records into Cassandra with Node.js, but the loop crashes a little while later. Each time, I cannot insert more than about 10,000 records. Why does the loop crash? Can anybody help me?
Thanks.
My code looks like:
var helenus = require('helenus'),
    pool = new helenus.ConnectionPool({
        hosts    : ['localhost:9160'],
        keyspace : 'twissandra',
        user     : '',
        password : '',
        timeout  : 3000
    });

pool.on('error', function (err) {
    console.error(err.name, err.message);
});

var i = 0;

pool.connect(function (err, keyspace) {
    if (err) {
        throw (err);
    } else {
        while (i < 1000000) {
            i++;
            var str = "tkg" + i;
            var pass = "ktr" + i;
            pool.cql("insert into users (username,password) VALUES (?,?)", [str, pass], function (err, results) {
            });
        }
    }
});

console.log("end");
You're probably overloading the Cassandra queue by attempting to make a million requests all at once! Keep in mind the request is asynchronous, so it is made even if the previous one has not completed.
Try using async.eachLimit to limit it to 50-100 requests at a time. The actual maximum concurrent capacity changes based on the backend process.
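For instance, here is a rough sketch of the insert loop using async.eachLimit with the same helenus pool, assuming the async module is installed; the limit of 50 is just a starting point to tune:

var async = require('async');

pool.connect(function (err, keyspace) {
    if (err) { throw err; }

    // Build the list of row numbers, then run at most 50 inserts concurrently.
    var ids = [];
    for (var i = 1; i <= 1000000; i++) { ids.push(i); }

    async.eachLimit(ids, 50, function (n, next) {
        pool.cql("insert into users (username,password) VALUES (?,?)",
            ["tkg" + n, "ktr" + n], next);
    }, function (err) {
        if (err) { console.error(err); }
        console.log("end");
    });
});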
Actually there was no problem. I checked the number of records twice, at different times, and I saw that the write operation continued until the timeout value given in the code was reached. In summary, the code does not crash; thank you Julian H. Lam for the reply.
But another question: how can I increase the write performance of Cassandra? What should I change in the cassandra.yaml file, or elsewhere?
Thank you.
I know that Node is non-blocking, but I just realized that the default behaviour of http.listen(8000) means that all HTTP requests are handled one-at-a-time. I know I shouldn't have been surprised at this (it's how ports work), but it does make me seriously wonder how to write my code so that I can handle multiple, parallel HTTP requests.
So what's the best way to write a server so that it doesn't hog port 80 and long-running responses don't result in long request queues?
To illustrate the problem, try running the code below and loading it up in two browser tabs at the same time.
var http = require('http');

http.createServer(function (req, res) {
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.write("<p>" + new Date().toString() + ": starting response");
    setTimeout(function () {
        res.write("<p>" + new Date().toString() + ": completing response and closing connection</p>");
        res.end();
    }, 4000);
}).listen(8080);
You are misunderstanding how node works. The above code can accept TCP connections from hundreds or thousands of clients, read the HTTP requests, and then wait the 4000 ms timeout you have baked in there, and then send the responses. Each client will get a response in about 4000 + a small number of milliseconds. During that setTimeout (and during any I/O operation) node can continue processing. This includes accepting additional TCP connections. I tested your code and the browsers each get a response in 4s. The second one does NOT take 8s, if that is how you think it works.
I ran curl -s localhost:8080 in 4 tabs as quickly as I could via the keyboard, and the seconds in the timestamps are:
54 to 58
54 to 58
55 to 59
56 to 00
There's no issue here, although I can understand how you might think there is one. Node would be totally broken if it worked as your post suggested.
Here's another way to verify:
for i in 1 2 3 4 5 6 7 8 9 10; do curl -s localhost:8080 & done
Your code can accept multiple connections because the job is done in callback function of the setTimeout call.
But if you instead of setTimeout do a heavy job... then it is true that node.js will not accept other multiple connections! SetTimeout accidentally frees the process so the node.js can accept other jobs and you code is executed in other "thread".
I don't know which is the correct way to implement this. But this is how it seems to work.
The browser blocks other identical requests to the same URL. If you call it from different browsers, the requests will be handled in parallel.
I used the following code to test request handling:
app.get('/', function (req, res) {
    console.log('time', MOMENT());
    setTimeout(function () {
        console.log(data, ' ', MOMENT());
        res.send(data);
        data = 'changing';
    }, 50000);
    var data = 'change first';
    console.log(data);
});
Since this request doesn't take much processing time apart from the 50-second setTimeout, all the timeouts were processed together, as usual.
Response to 3 requests sent together:
time moment("2017-05-22T16:47:28.893")
change first
time moment("2017-05-22T16:47:30.981")
change first
time moment("2017-05-22T16:47:33.463")
change first
change first moment("2017-05-22T16:48:18.923")
change first moment("2017-05-22T16:48:20.988")
change first moment("2017-05-22T16:48:23.466")
After this I moved to the second phase, i.e., what if my request takes a lot of time, for example processing a file synchronously or doing something else that takes time?
app.get('/second', function (req, res) {
    console.log(data);
    if (req.headers.data === '9') {
        res.status(200);
        res.send('response from api');
    } else {
        console.log(MOMENT());
        for (i = 0; i < 9999999999; i++) {}
        console.log('Second MOMENT', MOMENT());
        res.status(400);
        res.send('wrong data');
    }
    var data = 'second test';
});
As my first request was still in process, my second one didn't get accepted by Node. Thus I got the following output for 2 requests:
undefined
moment("2017-05-22T17:43:59.159")
Second MOMENT moment("2017-05-22T17:44:40.609")
undefined
moment("2017-05-22T17:44:40.614")
Second MOMENT moment("2017-05-22T17:45:24.643")
Thus, for every async function there is a virtual thread in Node, and Node does accept other requests before completing a previous request's async work (fs, mysql, or calling an API); however, it keeps itself a single thread and does not process other requests until all previous ones are completed.
I have a server that uses socket.io and I need a way of throttling a client that is sending the server data too quickly. The server exposes both a TCP interface and a socket.io interface - with the TCP server (from the net module) I can use socket.pause() and socket.resume(), and this effectively throttles the client. But with socket.io's socket class there are no pause() and resume() methods.
What would be the easiest way of getting feedback to a client that it is overwhelming the server and needs to slow down? I liked socket.pause() and socket.resume() because they didn't require any additional code on the client side: back up the TCP socket and things naturally slow down. Any equivalent for socket.io?
Update: I provide an API to interact with the server (there is currently a python version which runs over TCP and a JavaScript version which uses socket.io). So I don't have any real control over what the client does. Which is why using socket.pause() and socket.resume() is so great - backing up the TCP stream slows the python client down no matter what it tries to do. I'm looking for an equivalent for a JavaScript client.
With enough digging I found this:
this.manager.transports[this.id].socket.pause();
and
this.manager.transports[this.id].socket.resume();
Granted this probably won't work if the socket.io connection isn't a web sockets connection, and may break in a future update, but for now I'm going to go with it. When I get some time in the future I'll probably change it to the QUOTA_EXCEEDED solution that Pascal proposed.
Here is a dirty way to achieve throttling. Although this is an old post, some people may benefit from it.
First, register a middleware:
io.on("connection", function (socket) {
socket.use(function (packet, next) {
if (throttler.canBeServed(socket, packet)) {
next();
}
});
//You other code ..
});
canBeServed is a simple throttler as seen below:
function canBeServed(socket, packet) {
    if (socket.markedForDisconnect) {
        return false;
    }
    var previous = socket.lastAccess;
    var now = Date.now();
    if (previous) {
        var diff = now - previous;
        // Check diff and disconnect if needed.
        if (diff < 50) {
            socket.markedForDisconnect = true;
            setTimeout(function () {
                socket.disconnect(true);
            }, 1000);
            return false;
        }
    }
    socket.lastAccess = now;
    return true;
}
You can use process.hrtime() instead of Date.now().
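For instance, here is a sketch of the same 50 ms check using process.hrtime.bigint() (available in newer Node versions) for a monotonic, nanosecond-resolution clock:

var nowNs = process.hrtime.bigint(); // monotonic clock, unaffected by system time changes
if (socket.lastAccessNs && (nowNs - socket.lastAccessNs) < 50n * 1000000n) {
    // less than 50 ms since the previous packet: treat it as flooding
    socket.markedForDisconnect = true;
}
socket.lastAccessNs = nowNs;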
If you have a callback on your server somewhere which normally sends back the response to your client, you could try and change it like this:
before:
var respond = function (res, callback) {
    res.send(data);
};

after:

var respond = function (res, callback) {
    setTimeout(function () {
        res.send(data);
    }, 500); // or whatever delay you want.
};
Looks like you should slow down your clients. If one client can send too fast for your server to keep up, this is not going to go well with hundreds of clients.
One way to do this would be to have the client wait for the reply to each emit before emitting anything else. This way the server can control how fast the client sends, for example by only answering when it is ready, or only answering after a set time.
If this is not enough, when a client exceeds x requests per second, start replying with something like a QUOTA_EXCEEDED error and ignore the data it sends. This will force external developers to make their apps behave the way you want them to.
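Here is a rough sketch of the acknowledgement-based variant; the 'data' event name, the handleData helper and the QUOTA_EXCEEDED payload are illustrative rather than part of any existing API, but the acknowledgement callbacks themselves are standard socket.io:

// Server: acknowledge each message, or report a quota error when the client is too fast.
io.on('connection', function (socket) {
    socket.on('data', function (payload, ack) {
        var now = Date.now();
        if (socket.lastAccess && now - socket.lastAccess < 50) {
            ack({ error: 'QUOTA_EXCEEDED' }); // tell the client to back off; ignore the payload
            return;
        }
        socket.lastAccess = now;
        handleData(payload); // placeholder for the real processing
        ack({ ok: true });
    });
});

// Client: only emit the next message after the previous one has been acknowledged.
function send(payload, done) {
    socket.emit('data', payload, function (reply) {
        if (reply && reply.error === 'QUOTA_EXCEEDED') {
            setTimeout(function () { send(payload, done); }, 1000); // back off and retry later
            return;
        }
        done();
    });
}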
As another suggestion, I would propose a solution like this:
It is common for MySQL to receive requests faster than it can apply them.
The server can record the requests in a table in the database, assuming that insert is fast enough to keep up with the incoming rate, and then work through the queue at a rate the server can sustain. This buffering lets the server run slowly but still process all the requests.
But if you want something sequential, then the request callback should be acknowledged before the client can send another request. In this case there should be a server-ready flag; if the client sends a request while the flag is still red, a message can tell the client to slow down.
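A loose sketch of that buffer-table idea using the mysql package; the request_queue table, its columns, the connection settings and the handleRequest helper are all illustrative, not an existing schema or API:

var mysql = require('mysql');
var connection = mysql.createConnection({ host: 'localhost', user: 'app', password: 'secret', database: 'app' });

// Producer: record incoming work in a buffer table instead of applying it immediately.
function bufferRequest(payload, callback) {
    connection.query("INSERT INTO request_queue (payload, status) VALUES (?, 'pending')",
        [JSON.stringify(payload)], callback);
}

// Consumer: periodically pick up pending rows at a rate the server can sustain.
setInterval(function () {
    connection.query(
        "SELECT id, payload FROM request_queue WHERE status = 'pending' ORDER BY id LIMIT 10",
        function (err, rows) {
            if (err || rows.length === 0) { return; }
            rows.forEach(function (row) {
                handleRequest(JSON.parse(row.payload)); // placeholder for the real work
                connection.query("UPDATE request_queue SET status = 'done' WHERE id = ?", [row.id]);
            });
        });
}, 1000);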
Simply wrap your client emitter in a function like the one below:
let emit_live_users = throttle(function () {
    socket.emit("event", "some_data");
}, 2000);
using a throttle function like the one below:
function throttle(fn, threshold) {
    threshold = threshold || 250;
    var last, deferTimer;
    return function () {
        var now = +new Date, args = arguments;
        if (last && now < last + threshold) {
            clearTimeout(deferTimer);
            deferTimer = setTimeout(function () {
                last = now;
                fn.apply(this, args);
            }, threshold);
        } else {
            last = now;
            fn.apply(this, args);
        }
    };
}