I'm trying to benchmark a Node.js Express app with the following script, which uses the request library:
var request = require('request');

var totalRequests = 100000;

for (var i = 0; i < totalRequests; i++) {
  (function(i) {
    request('http://localhost:3000/', function(error, response, body) {
      console.info('Request ' + (i + 1));
    });
  })(i);
}
When I run it, I don't see the console.info() output from the request callbacks for over 40 seconds, and then they start. Shouldn't I see the requests firing right away?
40 seconds may be the amount of time it takes to prepare 100,000 requests. Since you're looping synchronously, your callbacks can't get called until after all of the requests have been initiated.
I suggest a library like async if you intended to make some or all of the requests in series rather than in parallel.
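For example, a minimal sketch with async.timesLimit (assuming the async module, v2 or later, and the same local endpoint) that keeps at most 10 requests in flight at a time:

var async = require('async');
var request = require('request');

var totalRequests = 100000;

// Fire the iteratee totalRequests times, with at most 10 requests in flight at once.
async.timesLimit(totalRequests, 10, function (n, next) {
  request('http://localhost:3000/', function (error, response, body) {
    console.info('Request ' + (n + 1));
    next(error);
  });
}, function (err) {
  console.info(err ? 'Stopped early: ' + err : 'All requests finished');
});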
My issues
Launch 1000+ calls to an online API that limits the number of API calls to 10 calls/sec.
Wait for all the API calls to give back a result (or retry); it can take 5 seconds before the API sends its data
Use the combined data in the rest of my app
What I have tried while looking at a lot of different questions and answers here on the site
Use promise to wait for one API request
const https = require("https");

function myRequest(param) {
  const options = {
    host: "api.xxx.io",
    port: 443,
    path: "/custom/path/" + param,
    method: "GET"
  };
  return new Promise(function(resolve, reject) {
    https.request(options, function(result) {
      let str = "";
      result.on('data', function(chunk) { str += chunk; });
      result.on('end', function() { resolve(JSON.parse(str)); });
      result.on('error', function(err) { console.log("Error: ", err); });
    }).end();
  });
}
Use Promise.all to do all the requests and wait for them to finish
const params = [{item: "param0"}, ... , {item: "param1000+"}]; // imagine 1000+ items
const promises = [];

params.map(function(param) {
  promises.push(myRequest(param.item));
});

const result = Promise.all(promises).then(function(data) {
  // doing some funky stuff with data
});
So far so good, sort of
It works when I limit the number of API requests to a maximum of 10; with more than that, the rate limiter kicks in. When I console.log(promises), it gives back an array of 'request'.
I have tried to add setTimeout in different places, like:
...
params.map(function(param) {
  promises.push(setTimeout(function() {
    myRequest(param.item);
  }, 100));
});
...
But that does not seem to work. When I console.log(promises), it gives back an array of 'function'
My questions
Now I am stuck ... any ideas?
How do I build in retries when the API gives an error?
Thank you for reading up to here, you are already a hero in my book!
When you have a complicated control flow, using async/await helps a lot to clarify the logic of the flow.
Let's start with the following simple algorithm to limit everything to 10 requests per second:
make 10 requests
wait 1 second
repeat until no more requests
For this the following simple implementation will work:
async function rateLimitedRequests(params) {
  let results = [];
  while (params.length > 0) {
    let batch = [];
    for (let i = 0; i < 10; i++) {
      // Use shift() instead of pop() if you want to process
      // the params in their original order.
      let thisParam = params.pop();
      if (thisParam) {
        batch.push(myRequest(thisParam.item));
      }
    }
    results = results.concat(await Promise.all(batch));
    await delayOneSecond();
  }
  return results;
}
Now we just need to implement the one second delay. We can simply promisify setTimeout for this:
function delayOneSecond() {
  return new Promise(ok => setTimeout(ok, 1000));
}
This will definitely limit you to at most 10 requests each second. In fact it performs somewhat slower than that, because each batch takes the request time plus one second. This is perfectly fine and already meets your original intent, but we can improve it to squeeze in a few more requests and get as close as possible to exactly 10 requests per second.
We can try the following algorithm:
remember the start time
make 10 requests
compare end time with start time
delay one second minus request time
repeat until no more requests
Again, we can use almost exactly the same logic as the simple code above but just tweak it to do time calculations:
const ONE_SECOND = 1000;

async function rateLimitedRequests(params) {
  let results = [];
  while (params.length > 0) {
    let batch = [];
    let startTime = Date.now();
    for (let i = 0; i < 10; i++) {
      let thisParam = params.pop();
      if (thisParam) {
        batch.push(myRequest(thisParam.item));
      }
    }
    results = results.concat(await Promise.all(batch));
    let endTime = Date.now();
    let requestTime = endTime - startTime;
    let delayTime = ONE_SECOND - requestTime;
    if (delayTime > 0) {
      await delay(delayTime);
    }
  }
  return results;
}
Now, instead of hardcoding the one second delay function, we can write one that accepts a delay period:
function delay(milliseconds) {
  return new Promise(ok => setTimeout(ok, milliseconds));
}
We now have a simple, easy-to-understand function that rate limits as closely as possible to 10 requests per second. It is rather bursty, in that it makes 10 parallel requests at the beginning of each one-second window, but it works. We can of course keep implementing more complicated algorithms to smooth out the request pattern etc., but I leave that to your creativity and as homework for the reader.
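As for the retries you asked about, here is a minimal sketch (my own addition, not part of the code above): it assumes you change myRequest so that it rejects on errors and non-2xx responses (the version in your question never calls reject), then wraps it with a fixed number of attempts. It reuses the delay() helper from above and plugs into rateLimitedRequests by replacing the myRequest call:

async function myRequestWithRetry(param, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await myRequest(param); // assumes myRequest rejects on failure
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up after the last attempt
      await delay(1000); // back off a little before retrying
    }
  }
}

// Inside rateLimitedRequests, use the wrapper instead:
// batch.push(myRequestWithRetry(thisParam.item));

// The overall usage stays the same:
rateLimitedRequests(params).then(function(data) {
  // doing some funky stuff with data
});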
So I'm working with websockets to process data from a website's API. For every new event I also send some HTTP requests back to the website in order to obtain more data. Up until now everything has worked fine, but now that I've started using async requests to speed things up a bit, things got a bit different. My code used to process one event and then move on to the next one (these events come in extremely quickly - around 10 per second), but now it just seems to ignore the async (non-blocking) part, move on to the next event, and that way skip over half of the code. Note that the code works fine outside the Pusher handler. I'm using the 'pusher-client' module. My code looks like this:
var request = require("request");
var requestSync = require('sync-request');
var Pusher = require('pusher-client');

var events_channel1 = pusher.subscribe('inventory_changes');
events_channel1.bind('listed', function(data) {
  var var2;

  // Async request (to speed up the code)
  function myFunction(callback) {
    request("url", function(error, response, body) {
      if (!error && response.statusCode == 200) {
        var result = JSON.stringify(JSON.parse(body));
        return callback(null, result);
      } else {
        return callback(error, null);
      }
    });
  }

  myFunction(function(err, data) {
    if (!err) {
      var2 = data;
      return(data);
    } else {
      return(err);
    }
  });

  // The part of the code below waits for the callback and then executes some code
  var var1 = var2;
  check();

  function check() {
    if (var2 === var1) {
      setTimeout(check, 10);
      return;
    }
    var1 = var2;
    // A CHUNK OF CODE EXECUTES HERE (connected to the data from the callback)
  }
});
In conclusion, the code works, but not inside the Pusher handler, because the handler moves on to the next event without waiting for the asynchronous request. How would I make the Pusher wait for my async request to finish before processing the next event (I have no idea)? If you happen to know, please let me know :)
You need to implement a queue to handle events one after another. I'm curious how it worked before; even without Pusher you'd have to implement some queue mechanism for it.
const eventsQueue = []

events_channel1.bind('listed', function(data) {
  eventsQueue.push(data)
  handleNewEvent()
})

let processingEvent = false

function handleNewEvent() {
  if (processingEvent) return // do nothing if already processing an event
  const eventData = eventsQueue.shift() // pick the first element from the array
  if (!eventData) return // all events are handled at the moment
  processingEvent = true
  ... // handle event data here
  processingEvent = false
  handleNewEvent() // handle next event
}
Also, you should call the clearTimeout method to clear your timeout when you don't need it anymore.
And it's better to use promises or async/await instead of callbacks. Your code will be much easier to read and maintain.
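For example, a rough async/await version of the same queue idea (a sketch only; it assumes you move your request logic into a promise-returning helper, here hypothetically called handleEventData):

const eventsQueue = []
let processingEvents = false

events_channel1.bind('listed', function(data) {
  eventsQueue.push(data)
  processQueue() // kicks the worker; returns immediately if it is already running
})

async function processQueue() {
  if (processingEvents) return
  processingEvents = true
  while (eventsQueue.length > 0) {
    const eventData = eventsQueue.shift()
    try {
      await handleEventData(eventData) // hypothetical helper that wraps your request in a promise
    } catch (err) {
      console.error('Failed to handle event', err)
    }
  }
  processingEvents = false
}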
In my Node.js app I need to send requests every 2-3 seconds to a third-party service. I have a database with objects that contain the URL to request, and when the response comes in I link this response with my object.
Now it's like:
// Getting objects from DB and calling ask function
objectsFromDB.find(function(err, data) {
  if (!err) {
    for (var i = 0; i < data.length; i++) {
      var object = data[i];
      // Calling ask function
      startAsking(object);
    }
  }
});

// Start asking objects ...
function startAsking(object) {
  var intervalId = setInterval(function() {
    console.log("Asking " + object.name);
    // ...
    // Processing and linking response with object
  }, config.INTERVAL);
  arrayOfIntervals.push(intervalId);
}
So, now I need to stop the job for one of the objects. How can I do this?
I see that I can save the intervalId, but what happens if this intervalId is not matched with the right object?
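To be concrete, this is roughly what I mean by saving the intervalId per object (a hypothetical sketch, assuming each object has a unique _id), so a single job could be stopped with the built-in clearInterval:

// Hypothetical sketch: key each interval by the object's id instead of pushing to an array
var intervalsByObjectId = {};

function startAsking(object) {
  intervalsByObjectId[object._id] = setInterval(function() {
    console.log("Asking " + object.name);
    // Processing and linking response with object
  }, config.INTERVAL);
}

function stopAsking(object) {
  clearInterval(intervalsByObjectId[object._id]);
  delete intervalsByObjectId[object._id];
}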
Also, I see many libraries like:
Agenda
nschedule
node-cron
But I think all of these libraries are oriented toward scheduling jobs with large intervals, and I don't know how to stop one of the jobs.
I have a Nodejs app that's designed to perform simple end-to-end testing of a large web application. This app uses the mikeal/Request and Cheerio modules to navigate, request, traverse and inspect web pages across the application.
We are refactoring some tests, and are hitting a problem when multiple request functions are called in series. I believe this may be due to the Node.js process hitting the MaxSockets limit, but am not entirely sure.
Some code...
var request = require('request');
var cheerio = require('cheerio');
var async = require('async');

var getPages_FromMenuLinks = function() {
  var pageUrl = 'http://www.example.com/app';
  async.waterfall([
    function topPageRequest(cb1) {
      var menuLinks = [];
      request(pageUrl, function(err, resp, page) {
        var $ = cheerio.load(page);
        $('div[class*="sub-menu"]').each(function (i, elem) {
          menuLinks.push($(this).find('a').attr('href'));
        });
        cb1(null, menuLinks);
      });
    }, function subMenuRequests(menuLinks, cb2) {
      async.eachSeries(menuLinks, function(link, callback) {
        request(link, function(err, resp, page) {
          var $ = cheerio.load(page);
          // do some quick validation testing of elements on the expected page
          callback();
        });
      }, function() { cb2(null); });
    }
  ], function () { });
};

module.exports = getPages_FromMenuLinks;
Now, if I run this Node script, it runs through the first topPageRequest and starts the subMenuRequests, but then freezes after completing the request for the third sub-menu item.
It seems that I might be hitting a Max-Sockets limit, either on Node or my machine (?) -- I'm testing this on standard Windows 8 machine, running Node v0.10.26.
I've tried using request({pool:{maxSockets:25}, url:link}, function(err, resp..., but it does not seem to make any difference.
It also seems there's a way to abort the request object, if I first instantiate it (as found here). But I have no idea how I would "parse" the page, similar to what's happening in the above code. In other words, from the solution found in the link...
var theRequest = request({ ... });
theRequest.pipe(parser);
theRequest.abort();
..., how would I re-write my code to pipe and "parse" the request?
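For reference, the closest I can picture for that streamed version is something like this sketch (link and callback refer to the same variables as in my eachSeries loop above; I'm not sure this is the right approach, which is part of my question):

var theRequest = request(link);
var chunks = [];

theRequest.on('data', function (chunk) {
  chunks.push(chunk); // collect the response body as it streams in
});

theRequest.on('end', function () {
  var $ = cheerio.load(Buffer.concat(chunks).toString());
  // do the same validation of elements as before
  callback();
});

// theRequest.abort() could then be called once I've seen enough of the page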
You can easily make thousands of requests at the same time (e.g. from a single for loop), and they will be queued and terminated automatically one by one, once a particular request is served.
I think by default there are 5 sockets per domain and this limit in your case should be more than enough.
It is highly probable that your server does not handle your requests properly (e.g. on error they are not terminated and hang indefinitely).
There are three steps you can make to find out what is going on:
Check whether you are sending proper requests -- as #mattyice observed, there are some bugs in your code.
Investigate the server code and the way your requests are handled there -- to me it seems that the server does not serve/terminate them in the first place.
Try using a timeout when sending the request (see the sketch below). 5000ms should be a reasonable amount of time to wait; on timeout the request will be aborted with an appropriate error code.
As a piece of advice: I would recommend using some more suitable, easier to use and more accurate tools for your testing, e.g. phantomjs.
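For the timeout in step 3, the request library accepts a timeout option (in milliseconds); a minimal sketch, with link and callback standing in for the variables from your eachSeries loop:

request({ url: link, timeout: 5000 }, function (err, resp, page) {
  if (err) {
    // on a timeout, err.code is typically 'ETIMEDOUT' or 'ESOCKETTIMEDOUT'
    return callback(err);
  }
  var $ = cheerio.load(page);
  // do some quick validation testing of elements on the expected page
  callback();
});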
I have an external API that rate-limits API requests to up to 25 requests per second. I want to insert parts of the results into a MongoDB database.
How can I rate limit the request function so that I don't miss any of API results for all of the array?
var request = require('request');
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://127.0.0.1:27017/test', function (err, db) {
  if (err) {
    throw err;
  } else {
    for (var i = 0; i < arr.length; i++) {
      // need to rate limit the following function, without missing any value in the arr array
      request({
        method: 'GET',
        url: 'https://SOME_API/json?address=' + arr[i]
      },
      function (error, response, body) {
        // doing computation, including inserting to mongo
      });
    }
  }
});
This could possibly be done using the request-rate-limiter package. So you can add this to your code :
var RateLimiter = require('request-rate-limiter');
const REQS_PER_MIN = 25 * 60; // that's 25 per second
var limiter = new RateLimiter(REQS_PER_MIN);
and since request-rate-limiter is based on request you can just replace request with limiter.request
You can find further information on the package's npm page - https://www.npmjs.com/package/request-rate-limiter
On a personal note - I'd replace all these callbacks with promises
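A rough sketch of what that swap could look like in the loop from the question (based purely on the description above that limiter.request is a drop-in for request; check the package's npm page for the exact signature, since it has changed between major versions):

var RateLimiter = require('request-rate-limiter');
const REQS_PER_MIN = 25 * 60; // that's 25 per second
var limiter = new RateLimiter(REQS_PER_MIN);

for (var i = 0; i < arr.length; i++) {
  // limiter.request queues the call so the overall rate stays within the limit
  limiter.request({
    method: 'GET',
    url: 'https://SOME_API/json?address=' + arr[i]
  }, function (error, response, body) {
    // doing computation, including inserting to mongo
  });
}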
You need to combine 2 things.
A throttling mechanism. I suggest _.throttle from the lodash project. This can do the rate limiting for you.
You also need an async control flow mechanism to make sure the requests run in series (don't start second one until first one is done). For that I suggest async.eachSeries
Both of these changes will be cleaner if you refactor your code to this signature:
function scrape(address, callback) {
  // code to fetch a single address, do computation, and save to mongo here
  // invoke the callback with (error, result) when done
}
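With that signature in place, a minimal sketch of the control-flow half using async.eachSeries (I've used a plain setTimeout pause between requests here instead of _.throttle, just to keep the sketch short; the pause is what keeps you under 25 requests per second):

var async = require('async');

// Process the addresses strictly one after another, pausing 40ms between
// requests so the overall rate stays at or below 25 per second.
async.eachSeries(arr, function (address, next) {
  scrape(address, function (err, result) {
    if (err) return next(err); // stop the series on the first error
    setTimeout(next, 40); // 1000ms / 25 = 40ms between requests
  });
}, function (err) {
  if (err) console.error('Stopped early:', err);
  else console.log('All addresses processed');
});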