How do I make HTTP requests inside a loop in NodeJS

I'm writing a command line script in Node (because I know JS and suck at Bash, plus I need jQuery for navigating through the DOM)… right now I'm reading an input file and iterating over each line.
How do I go about making one HTTP request (GET) per line so that I can load the resulting string with jQuery and extract the information I need from each page?
I've tried using the NPM httpsync package… so I could make one blocking GET call per line of my input file, but it doesn't support HTTPS, and of course the service I'm hitting only supports HTTPS.
Thanks!

A good way to handle a large number of jobs in a controlled manner is an async queue.
I also recommend you look at request for making HTTP requests and cheerio for dealing with the HTML you get.
Putting these together, you get something like:
var async = require('async');
var request = require('request');
var cheerio = require('cheerio');

// Process up to 5 URLs at a time; done() signals that a task is finished.
var q = async.queue(function (task, done) {
    request(task.url, function (err, res, body) {
        if (err) return done(err);
        if (res.statusCode !== 200) return done(new Error('Status ' + res.statusCode));
        var $ = cheerio.load(body);
        // ... extract what you need from the page here
        done();
    });
}, 5);
Then add all your URLs to the queue:
q.push({ url: 'https://www.example.com/some/url' });
// ...
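To tie this back to your input file, here is a minimal sketch of reading one URL per line and pushing each onto the queue; the filename is hypothetical, and q.drain (as assigned in older async versions) fires once every queued task has finished:
var fs = require('fs');

// 'urls.txt' is a hypothetical input file with one URL per line.
fs.readFile('urls.txt', 'utf8', function (err, text) {
    if (err) throw err;
    text.split('\n').filter(Boolean).forEach(function (url) {
        // The optional per-task callback reports errors for individual URLs.
        q.push({ url: url }, function (err) {
            if (err) console.error('Failed:', url, err.message);
        });
    });
});

// Called when the last task in the queue has been processed.
q.drain = function () {
    console.log('All URLs processed.');
};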

I would most likely use the async library's eachLimit function. That lets you throttle the number of active connections and also gives you a callback for when all the operations are done.
async.eachLimit(urls, function (url, done) {
    request(url, function (err, res, body) {
        // do something with the response
        done();
    });
}, 5, function (err) {
    // do something once every request has finished
    console.log('all done!');
});

I was worried about making a million simultaneous requests without putting in some kind of throttling/limiting the number of concurrent connections, but it seems like Node is throttling me "out of the box" to something around 5-6 concurrent connections.
This is perfect, as it lets me keep my code a lot simpler while also fully leveraging the inherent asynchrony of Node.
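If you are curious where that limit comes from: on older Node versions (such as 0.10, which is assumed here), the default HTTP agent caps connections at 5 per host via maxSockets. A quick sketch of how to inspect or raise it:
var http = require('http');
var https = require('https');

// Default is 5 on older Node releases (e.g. 0.10); newer versions removed the cap.
console.log(http.globalAgent.maxSockets);

// Raise the per-host cap if you deliberately want more parallelism.
http.globalAgent.maxSockets = 10;
https.globalAgent.maxSockets = 10;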

Related

Node.js Iterate CSV File for Requests, but Wait for Response Before Continuing the Iterations

I have some code (somewhat simplified for this discussion) that looks something like this:
var inputFile = 'inputfile.csv';

var parser = parse({delimiter: ','}, function (err, data) {
    async.eachSeries(data, function (line, callback) {
        SERVER.Request(line[0], line[1]);
        SERVER.on("RequestResponse", function (response) {
            console.log(response);
        });
        callback();
    });
});

SERVER.start();
SERVER.on("ready", function () {
    fs.createReadStream(inputFile).pipe(parser);
});
What I am trying to do is run a CSV file through a command line Node program that iterates over each line and makes a request to a server, which responds with a RequestResponse event that I then log. The RequestResponse takes a second or so, but the way the code is set up now it just flies through the CSV file: I get output for each iteration, but it is mostly the output I would expect for the first iteration, with a little of the output of the second. I need to know how to make each iteration wait for a RequestResponse event before continuing on to the next one. Is this possible?
I have based this code largely on
NodeJs reading csv file
but to be honest I am a little lost with Node.js and with async.foreach. Any help would be greatly appreciated.
I suggest that you bite the bullet and take your time learning promises and async/await. Then you can just use a regular for loop and await a web response promise.
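A minimal sketch of that approach, assuming SERVER is an EventEmitter as in the question (requestOnce is a hypothetical helper, not part of any library):
function requestOnce(a, b) {
    return new Promise(function (resolve) {
        // Resolve on the next RequestResponse event; once() removes the listener afterwards.
        SERVER.once("RequestResponse", resolve);
        SERVER.Request(a, b);
    });
}

async function processLines(data) {
    for (const line of data) {
        const response = await requestOnce(line[0], line[1]);
        console.log(response);   // the next iteration starts only after this response
    }
}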
The solution is straightforward: you need to call the callback only after the server has returned, that's it.
async.eachSeries(data, function (line, callback) {
    SERVER.Request(line[0], line[1]);
    SERVER.on("RequestResponse", function (response) {
        console.log(response);
        // Remove this listener so it doesn't stack up on later iterations,
        // then tell eachSeries that this line is done.
        SERVER.removeAllListeners("RequestResponse");
        callback();
    });
});
What is happening is that eachSeries expects the callback to be called only AFTER you are done with that particular call.

Node async vs sync

I am writing a Node server that reads/deletes/adds/etc. files on the filesystem. Is there any performance advantage to reading asynchronously? I can't do anything while waiting for the file to be read. Example:
deleteStructure: function (req, res) {
    var structure = req.param('structure');
    fs.unlink(structure, function (err) {
        if (err) return res.serverError(err);
        return res.ok();
    });
}
I am also making requests to another server using http.get. Is there any performance advantage to fetching asynchronously? I can't do anything while waiting for the file to be fetched. Example:
getStructure: function (req, res) {
    var structure = urls[req.param('structure')];
    http.get(structure).then(
        function (response) {
            return res.send(response);
        },
        function (err) {
            res.serverError(err);
        }
    );
}
If there is no performance advantage to reading files asynchronously, I can just use the synchronous methods. However, I am not aware of synchronous methods for HTTP calls; do any built-in methods exist?
FYI I am using Sails.js.
Thanks!
I can't do anything while waiting for the file to be read.
I can't do anything while waiting for the file to be fetched.
Wrong: while one request is waiting on a file read or HTTP call, your server can be handling an unrelated HTTP request.
Whenever your code is in the middle of a synchronous operation, your server will not respond to other requests at all.
This asynchronous scalability is the biggest attraction for Node.js.
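A minimal sketch of the difference, using hypothetical Sails-style handlers (the filename is illustrative only): the synchronous version blocks the whole process while the file is read; the asynchronous one lets other requests be served in the meantime.
var fs = require('fs');

// Hypothetical handler: nothing else runs until readFileSync returns,
// so no other request can be served during the read.
function readStructureSync(req, res) {
    var data = fs.readFileSync('structure.json', 'utf8');
    return res.ok(data);
}

// Hypothetical handler: the event loop keeps serving other requests
// while the read is pending; the callback runs when the data is ready.
function readStructureAsync(req, res) {
    fs.readFile('structure.json', 'utf8', function (err, data) {
        if (err) return res.serverError(err);
        return res.ok(data);
    });
}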

Nodejs + mikeal/Request module, how to close request or increase MaxSockets

I have a Nodejs app that's designed to perform simple end-to-end testing of a large web application. This app uses the mikeal/Request and Cheerio modules to navigate, request, traverse and inspect web pages across the application.
We are refactoring some tests, and are hitting a problem when multiple request functions are called in series. I believe this may be due to the Node.js process hitting the MaxSockets limit, but am not entirely sure.
Some code...
var request = require('request');
var cheerio = require('cheerio');
var async = require('async');

var getPages_FromMenuLinks = function () {
    var pageUrl = 'http://www.example.com/app';
    async.waterfall([
        function topPageRequest(cb1) {
            var menuLinks = [];
            request(pageUrl, function (err, resp, page) {
                var $ = cheerio.load(page);
                $('div[class*="sub-menu"]').each(function (i, elem) {
                    menuLinks.push($(this).find('a').attr('href'));
                });
                cb1(null, menuLinks);
            });
        },
        function subMenuRequests(menuLinks, cb2) {
            async.eachSeries(menuLinks, function (link, callback) {
                request(link, function (err, resp, page) {
                    var $ = cheerio.load(page);
                    // do some quick validation testing of elements on the expected page
                    callback();
                });
            }, function () { cb2(null); });
        }
    ], function () { });
};

module.exports = getPages_FromMenuLinks;
Now, if I run this Node script, it runs through the first topPageRequest and starts the subMenuRequests, but then freezes after completing the request for the third sub-menu item.
It seems that I might be hitting a max-sockets limit, either in Node or on my machine (?) -- I'm testing this on a standard Windows 8 machine, running Node v0.10.26.
I've tried using request({pool:{maxSockets:25}, url:link}, function(err, resp..., but it does not seem to make any difference.
It also seems there's a way to abort the request object, if I first instantiate it (as found here). But I have no idea how I would "parse" the page, similar to what's happening in the above code. In other words, from the solution found in the link...
var theRequest = request({ ... });
theRequest.pipe(parser);
theRequest.abort();
..., how would I re-write my code to pipe and "parse" the request?
You can easily make thousands of requests at the same time (e.g. from a single for loop); they will be queued and completed automatically one by one, as each request is served.
I think that by default there are 5 sockets per domain, and that limit should be more than enough in your case.
It is highly probable that your server does not handle your requests properly (e.g. on error they are not terminated and hang indefinitely).
There are three steps you can take to find out what is going on:
check that you are sending a proper request -- as #mattyice observed, there were some typos in your original code.
investigate the server code and the way your requests are handled there -- it seems to me that the server does not serve/terminate them in the first place.
set a timeout when sending the request; 5000 ms should be a reasonable amount of time to wait. On timeout the request will be aborted with an appropriate error code (see the sketch below).
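A minimal sketch of that third step, assuming the request module from the question (the timeout option is in milliseconds; a hung request fails with ETIMEDOUT or ESOCKETTIMEDOUT instead of hanging forever):
// link and callback are the same variables as in the question's eachSeries loop.
request({ url: link, timeout: 5000 }, function (err, resp, page) {
    if (err) return callback(err);   // timeouts surface here as errors
    var $ = cheerio.load(page);
    // ... run the same validation as before
    callback();
});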
As a piece of advice: I would recommend using more suitable, easier to use and more accurate tools for this kind of testing, e.g. phantomjs.

async parallel request - partial render

What is the proper way to partially render a view following an async parallel request?
Currently I am doing the following
// an example using an object instead of an array
async.parallel({
    one: function (callback) {
        setTimeout(function () {
            callback(null, 1);
            // can I partially merge the results and render here?
        }, 200);
    },
    two: function (callback) {
        setTimeout(function () {
            callback(null, 2);
            // can I partially merge the results and render here?
        }, 100);
    }
},
function (err, results) {
    // results now equals {one: 1, two: 2}
    // merge the results and render a view
    res.render('mypage.ejs', { title: 'Results' });
});
It is basically working fine, but if I have function1, function2, ..., functionN, the view will be rendered only when the slowest function has completed.
I would like to find the proper way to render the view as soon as the first function returns, to minimise the user-perceived delay, and then add the results of the other functions as soon as they become available.
What you want is Facebook's BigPipe: https://www.facebook.com/note.php?note_id=389414033919. Fortunately, this is easy with Node.js because streaming is built in. Unfortunately, template systems are bad at this because async templates are a pain in the butt. However, this is much better than doing any additional AJAX requests.
The basic idea is that you first send a layout:
res.render('layout.ejs', function (err, html) {
    if (err) return next(err);
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.write(html.replace('</body></html>', ''));
    // Ends the response.
    // `writePartials` should not return anything in the callback!
    writePartials(res.end.bind(res, '</body></html>'));
});
You can't send </body></html> yet because your document isn't finished. writePartials would then be a bunch of async functions (partials or pagelets) executed in parallel:
function writePartials(callback) {
    async.parallel([partial1, partial2, partial3], callback);
}
Note: since you've already written a response, there's not much you can do with errors except log them.
What each partial does is send inline JavaScript to the client. For example, the layout can have a .stream element, and the pagelet will replace .stream's innerHTML upon arrival, i.e. when the callback finishes:
function partialStream(callback) {
    res.render('stream.partial.ejs', function (err, html) {
        // Don't return the error in the callback.
        // You may want to display an error message or something instead.
        if (err) {
            console.error(err.stack);
            callback();
            return;
        }
        res.write('<script>document.querySelector(".stream").innerHTML = ' +
            JSON.stringify(html) + ';</script>');
        callback();
    });
}
Personally, I have a .stream.placeholder element and replace it with a new .stream element. The reason is that I basically do .placeholder, .placeholder ~ * {display: none} so things don't jump around the page. However, this requires a DIY front-end framework, since the JS suddenly gets more complicated.
There, your response is now streaming. The only requirement is that the client supports JavaScript.
I think you can't do it just on the backend.
To minimise users' delay you need to send the minimal page to the browser and then to request the rest of the information from the browser via AJAX. Another approach to minimising delays is to send all templates to the browser on the first page load, together with the rendered page, and render all the pages in browser based on the data you request from the server. That's the way I do it. The beauty of nodejs is that you can use the same templating engine both in the backend and frontend and also share the modules.
If your page is composed in such a way that the slow information appears later in the HTML than the fast information, you can write the response out in parts using res.write instead of res.render (which renders the complete page). I don't think this approach deserves serious attention, though, as you would get stuck with it sooner than you notice...
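For completeness, a minimal sketch of that res.write approach, assuming an Express-style handler; fastPart and slowPart are hypothetical async functions standing in for the parallel tasks above:
// fastPart and slowPart are hypothetical: (callback) => callback(err, htmlFragment)
app.get('/results', function (req, res) {
    res.set('Content-Type', 'text/html');
    res.write('<html><body><h1>Results</h1>');
    fastPart(function (err, fastHtml) {
        res.write(fastHtml || '');          // flush the fast piece immediately
        slowPart(function (err, slowHtml) {
            res.write(slowHtml || '');      // append the slow piece when it arrives
            res.end('</body></html>');
        });
    });
});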

nodejs and expressjs

app.get('/', function (req, res) {
    res.set('Content-Type', 'text/html');
    res.send('Yo');
    setTimeout(function () {
        res.send("Yo");
    }, 1000);
});
It looks like "send" ends the request. How can I get this to write Yo on the screen and then 1 second later (sort of like long polling I guess) write the other Yo to get YoYo? Is there some other method other than send?
Use res.write to generate output in pieces and then complete the response with res.end.
I don't think what you are trying to do is possible.
Once you send a response, the client-server connection will be closed.
Look into sockets (particularly socket.io) in order to keep a connection open and send multiple messages on it.
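A minimal sketch of that socket.io suggestion, server side only (httpServer is a hypothetical Node http.Server instance; the page is assumed to load the socket.io client script and listen for 'message' events):
// httpServer is a hypothetical http.Server the Express app is attached to.
var io = require('socket.io')(httpServer);

io.on('connection', function (socket) {
    socket.emit('message', 'Yo');            // first "Yo" right away
    setTimeout(function () {
        socket.emit('message', 'Yo');        // second "Yo" a second later
    }, 1000);
});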
Try jQuery + JSON: send the response and then update whatever you need with jQuery and JSON.
This is a good tutorial on Express.js, including DB stuff (MongoDB).
If you want to write the result out in pieces, try
app.get('/', function (req, res) {
    res.set('Content-Type', 'text/html');
    res.write('Yo');
    setTimeout(function () {
        res.end("Yo");
    }, 1000);
});
or, to send everything as a single block, something like
app.get('/', function (req, res) {
    res.set('Content-Type', 'text/html');
    var str = 'yo';
    setTimeout(function () {
        res.end(str + 'yo');
    }, 1000);
});
The thing with Node.js is that it relies on an asynchronous "style", so if you introduce something like a "wait" function, you'll lose all the benefits of asynchronous execution.
I believe you can achieve something similar to what you want by:
(asynchronous way) including a function that prints the second "Yo" as a callback to the first function, or
(classic synchronous wait) introducing a 'big' loop before presenting the second 'Yo'.
for example:
for (var i = 0; i < 100000000; i++) {
    // do something
}
