Request rate is large - node.js

Im using Azure documentdb and accessing it through my node.js on express server, when I query in loop, low volume of few hundred there is no issue.
But when query in loop slightly large volume, say around thousand plus
I get partial results (inconsistent, every time I run result values are not same. May be because of asynchronous nature of Node.js)
after few results it crashes with this error
body: '{"code":"429","message":"Message: {\"Errors\":[\"Request rate is large\"]}\r\nActivityId: 1fecee65-0bb7-4991-a984-292c0d06693d, Request URI: /apps/cce94097-e5b2-42ab-9232-6abd12f53528/services/70926718-b021-45ee-ba2f-46c4669d952e/partitions/dd46d670-ab6f-4dca-bbbb-937647b03d97/replicas/130845018837894542p"}' }
Meaning DocumentDb fail to handle 1000+ request per second?
All together giving me a bad impression on NoSQL techniques.. is it short coming of DocumentDB?

As Gaurav suggests, you may be able to avoid the problem by bumping up the pricing tier, but even if you go to the highest tier, you should be able to handle 429 errors. When you get a 429 error, the response will include a 'x-ms-retry-after-ms' header. This will contain a number representing the number of milliseconds that you should wait before retrying the request that caused the error.
I wrote logic to handle this in my documentdb-utils node.js package. You can either try to use documentdb-utils or you can duplicate it yourself. Here is a snipit example.
createDocument = function() {
client.createDocument(colLink, document, function(err, response, header) {
if (err != null) {
if (err.code === 429) {
var retryAfterHeader = header['x-ms-retry-after-ms'] || 1;
var retryAfter = Number(retryAfterHeader);
return setTimeout(toRetryIf429, retryAfter);
} else {
throw new Error(JSON.stringify(err));
}
} else {
log('document saved successfully');
}
});
};
Note, in the above example document is within the scope of createDocument. This makes the retry logic a bit simpler, but if you don't like using widely scoped variables, then you can pass document in to createDocument and then pass it into a lambda function in the setTimeout call.

Related

How to manage massive calls to Postgresql in Node

I have a question regarding massive calls to PostgreSQL.
This is the scenario:
I have a simple Nodejs app that makes queries to PostgreSQL in a short period of time.
Everything is fine, but sometimes these calls get rejected due to Postgresql maximum pool connections setting, which is equal to 100.
I have in mind to make queue consumption app style, which means adding every query to a queue and then consuming an element every second. By consequence a query to PostgreSQL every second.
But my problem is, Idk where to start. This is the part where I am getting problems with, at some point, I have a lot of calls and I get lots of "ERROR IN QUERY EXECUTION" for the reason explained before.
const pool3 = new Pool(credentialsPostGres);
let res = [];
let sql_call = "select colum1 from table2 where x = y"; //the real query is a bit more complex, but you get the idea.
poll_query.query(sql_call,(err,results) => {
if (err) {
pool3.end();
console.log(err + " ERROR IN QUERY EXECUTION");
} else {
res.push({ data: Object.values(JSON.parse(JSON.stringify(results.rows))) });
pool3.end();
return callback(res,data);
}
})
How I should manage this part into a queue? I am a bit lost.
Help!

Querying DB2 every 15 seconds causing memory leak in NodeJS

I have an application which checks for new entries in DB2 every 15 seconds on the iSeries using IBM's idb-connector. I have async functions which return the result of the query to socket.io which emits an event with the data included to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system which means most memory management tools are off limits as memwatch, heapdump and the like need node-gyp to install. Here's an example of what the functions basic structure is.
const { dbconn, dbstmt } = require('idb-connector');// require idb-connector
async function queryDB() {
const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
// create new promise
let promise = new Promise ( function(resolve, reject) {
// create new connection
const connection = new dbconn();
connection.conn("*LOCAL");
const statement = new dbstmt(connection);
statement.exec(sSql, (rows, err) => {
if (err) {
throw err;
}
let ticks = rows;
statement.close();
connection.disconn();
connection.close();
resolve(ticks.length);// resolve promise with varying data
})
});
let result = await promise;// await promise
return result;
};
async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
setTimeout(getNewData, 2000);// check again in 2 seconds
};
Any ideas on where the leak is? Am i using async/await incorrectly? Or else am i creating/destroying DB connections improperly? Any help on figuring out why this code is leaky would be much appreciated!!
Edit: Forgot to mention that i have limited control on the backend processes as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as i can tell I've followed the instructions suggested on their github repo.
I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way. Reason being that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key to the row into a data queue on add, or even change or delete if necessary. Then you can just put in an async call to wait for a record on the data queue. This is more real time, and the event handler is only called when a record shows up. The handler can get the specific record from the database since you know it's key. Data queues are much faster than database IO, and place little overhead on the trigger.
I see a couple of potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event would fire the instant a record is added to the table, rather than 15 seconds later.
You don't have to code for the possibility of one or more new records, it will always be 1, the one mentioned in the data queue.
yes you have to close connection.
Don't make const data. you don't need promise by default statement.exec is async and handles it via return result;
keep setTimeout(getNewData, 2000);// check again in 2 seconds
line outside getNewData otherwise it becomes recursive infinite loop.
Sample code
const {dbconn, dbstmt} = require('idb-connector');
const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn(); // Create a connection object.
connection.conn('*LOCAL'); // Connect to a database.
const statement = new dbstmt(dbconn); // Create a statement object of the connection.
statement.exec(sql, (result, error) => {
if (error) {
throw error;
}
console.log(`Result Set: ${JSON.stringify(result)}`);
statement.close(); // Clean up the statement object.
connection.disconn(); // Disconnect from the database.
connection.close(); // Clean up the connection object.
return result;
});
*async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
setTimeout(getNewData, 2000);// check again in 2 seconds
};*
change to
**async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
};
setTimeout(getNewData, 2000);// check again in 2 seconds**
First thing to notice is possible open database connection in case of an error.
if (err) {
throw err;
}
Also in case of success connection.disconn(); and connection.close(); return boolean values that tell is operation successful (according to documentation)
Always possible scenario is to pile up connection objects in 3rd party library.
I would check those.
This was confirmed to be a memory leak in the idb-connector library that i was using. Link to github issue Here. Basically there was a C++ array that never had it's memory deallocated. A new version was added and the commit can viewed Here.

Avoiding race conditions and starvation with MongoDB + NodeJS

I have a proxy management pool that is responsible for storing, checking, and retrieving proxies so that they can be used with web requests.
async getNextAvailableProxy() {
while(true) {
var sleepTime = global.Settings.ProxyPool.ProxySleepTimeMS;
var availableProxies = await this.data.find(PROXY_COLLECTION, {
$query: {
Enabled: true,
InUse: false,
LastUsed: { $lte: new Date(new Date() - sleepTime) }
},
$orderby: { ResponseTime: 1 }
});
if (availableProxies.length <= 0) {
var nextAvailable = await this.data.findOne(PROXY_COLLECTION, {
$query: { Enabled: true, InUse: false },
$orderby: { LastUsed: -1 }
});
if (!nextAvailable) {
await Utils.sleep(100);
console.log('No proxies available, sleeping');
continue;
}
sleepTime = sleepTime - (new Date() - nextAvailable.LastUsed)
if (sleepTime > 0)
await Utils.sleep(sleepTime);
continue;
}
var selectedProxy = availableProxies[0];
selectedProxy.InUse = true;
await this.data.save(PROXY_COLLECTION, selectedProxy);
return selectedProxy;
}
}
It is worth noting that my versions of find and save are wrappers around the MongoDB driver for NodeJS.
It is also worth noting that Utils.sleep() is a promise that uses a setTimeout to perform an async sleep.
Now, I understand that since NodeJS is single-threaded, race conditions cannot occur. However, when using multiple isolated objects querying the database rapidly, this is simply not the case.
If I have, say, five instances of object ProxyPool and they all call getNextAvailableProxy() within a short time of each other, they will all fetch the same proxy from the database, because one instance has already started the query before another instance has saved the InUse flag, leaving me with n-instances of ProxyPool all retrieving the save exact proxy.
How can I get around this in an asynchronous manner?
Honestly, it's hard to tell why it's a problem based on your post. While collisions can happen, it should be rare enough not to matter in my opinion, unless the use of the proxy is a really long running operation (and so a given proxy is tied up a lot).
That said, I also would not lookup a proxy on every request. Instead, I'd probably have each worker fetch a pool of proxies either on startup or at intervals (maybe once an hour or something), and then internally manage (in-memory) the proxies it has available.
Your algorithm for figuring out what proxies to give a given worker can then be pretty flexible, and a lot less likely to have collisions, since each node instance is single threaded it won't allocate the same proxy twice.
The risk is that you may hit a place where a given worker has run out of proxies. That's something you'll need to handle as well, but since you will (in theory) have your workers load balanced in some fashion, if you hit that spot you're probably running out of proxies anyway and will have to issue a Too Busy response soon.
Finally, when you do hit the DB for a list of available proxies, you should be using findAndModify() or similar to fetch and update the documents in one shot, so that as you pull one out of the DB you tell the DB it's not available, rather than waiting on processing on your web server first.

Express Node Request For Loop Issue [duplicate]

With node.js I want to http.get a number of remote urls in a way that only 10 (or n) runs at a time.
I also want to retry a request if an exception occures locally (m times), but when the status code returns an error (5XX, 4XX, etc) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests finished I want to get the list of all request urls and response status codes in a list which I want to sort/group/manipulate, so I need to wait for all requests to finish.
Seems like for every async problem using promises are recommended, but I end up nesting too many promises and it quickly becomes uncypherable.
There are lots of ways to approach the 10 requests running at a time.
Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.
Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then each time one completes, start another one.
In all cases, you will have to manually write some retry code and as with all retry code, you will have to very carefully decide which types of errors you retry, how soon you retry them, how much you backoff between retry attempts and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);
let remoteUrls = [...]; // large array of URLs
const maxRetryCnt = 3;
const retryDelay = 500;
Promise.map(remoteUrls, function(url) {
let retryCnt = 0;
function run() {
return get(url).then(function(result) {
// do whatever you want with the result here
return result;
}).catch(function(err) {
// decide what your retry strategy is here
// catch all errors here so other URLs continue to execute
if (err is of retry type && retryCnt < maxRetryCnt) {
++retryCnt;
// try again after a short delay
// chain onto previous promise so Promise.map() is still
// respecting our concurrency value
return Promise.delay(retryDelay).then(run);
}
// make value be null if no retries succeeded
return null;
});
}
return run();
}, {concurrency: 10}).then(function(allResults) {
// everything done here and allResults contains results with null for err URLs
});
The simple way is to use async library, it has a .parallelLimit method that does exactly what you need.

nodejs http response.write: is it possible out-of-memory?

If i have following code to send data repeatedly to client every 10ms:
setInterval(function() {
res.write(somedata);
}, 10ms);
What would happen if the client is very slow to receive the data?
Will server get out-of-memory error?
Edit:
actually the connection is kept alive, sever send jpeg data endlessly (HTTP multipart/x-mixed-replace header + body + header + body.....)
Because node.js response.write is asynchronous,
so some users guess it may store data in internal buffer and wait until low layer tells it can send,
so the internal buffer will grow, am i right?
If i am right, then how to resolve this?
the problem is node.js does not notify me when data is send for a single write call.
In other word, i can not tell user this way is theoretically no risk of "out of memory" and how to fix it.
Update:
By the keyword "drain" event given by user568109, i studied the source of node.js, and got conclusion:
it really will cause "out-of-memory" error. I should check return value of response.write(...)===false and then handle "drain" event of the response.
http.js:
OutgoingMessage.prototype._buffer = function(data, encoding) {
this.output.push(data); //-------------No check here, will cause "out-of-memory"
this.outputEncodings.push(encoding);
return false;
};
OutgoingMessage.prototype._writeRaw = function(data, encoding) { //this will be called by resonse.write
if (data.length === 0) {
return true;
}
if (this.connection &&
this.connection._httpMessage === this &&
this.connection.writable &&
!this.connection.destroyed) {
// There might be pending data in the this.output buffer.
while (this.output.length) {
if (!this.connection.writable) { //when not ready to send
this._buffer(data, encoding); //----------> save data into internal buffer
return false;
}
var c = this.output.shift();
var e = this.outputEncodings.shift();
this.connection.write(c, e);
}
// Directly write to socket.
return this.connection.write(data, encoding);
} else if (this.connection && this.connection.destroyed) {
// The socket was destroyed. If we're still trying to write to it,
// then we haven't gotten the 'close' event yet.
return false;
} else {
// buffer, as long as we're not destroyed.
this._buffer(data, encoding);
return false;
}
};
Some gotchas:
If sending over http it is not be a good idea. The browser may consider the request as timeout if it is not finished within specified amount of time. Server too will close connection which is idle for too long. If client cannot keep up, the timeout is almost certain.
setInterval for 10ms is also subject to some restrictions. It doesn't mean it will repeat after every 10ms, 10ms is the minimum it will wait before repeating. It will be slower than what you set the interval.
Let's say you chance to overload the response with data, then at some point the server will end connection and respond by 413 Request Entity Too Large depending on what the limit is set.
Node.js has single threaded architecture with a max memory limitation of around 1.7 GB. If you set your above server limits to too high and have many incoming connections you will get process out of memory error.
So with appropriate limits it will either give timeout or be request too large. (And there are no other errors in your program.)
Update
You need to use drain event. The http response is a writable stream. It has its own internal buffer. When the buffer is emptied the drain event is triggered. You should learn more about streams as you would go in deeper. This will help you not just in http. You can find several resources about streams on web.

Resources