Node.js outbound http request concurrency


I've got a node.js script that pulls data from an external web API for local storage. The first request is a query that returns a list of IDs that I need to get further information on. For each ID returned, I spawn a new http request from node.js and reach out to the server for the data (POST request). Once the job is complete, I sleep for 3 minutes, and repeat. Sometimes the number of IDs is in the hundreds. Each individual http request for those returns maybe 1kb of data, usually less, so the round trip is very short.
I got an email this morning from the API provider begging me to shut off my process because I'm "occupying all of the API servers with hundreds of connections" (which I am actually pretty proud of, but that is not the point). To be nice, I increased the sleep from 3 minutes to 30 minutes, and that has so far helped them.
On to the question... now I've not set maxSockets or anything, so I believe the default is 5. Shouldn't that mean I can only create 5 live http request connections at a time? How does the admin have hundreds? Is their server not hanging up the connection once the data is delivered? Am I not doing so? I don't have an explicit disconnect at the end of my http request, so perhaps I am at fault here. So what does maxSockets actually set?

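Sorry, for some reason I didn't read your question correctly at first.
maxSockets is the maximum number of concurrent sockets an http.Agent will open per host (host:port), not per process. Requests beyond that limit are queued until a socket frees up. Note that the default was 5 only in old Node.js releases (0.10 and earlier); from 0.12 onwards the default is Infinity, which would explain hundreds of simultaneous connections. You can check the current value via http.globalAgent.maxSockets.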
You can see some information on the current number of connections you have to a given host with the following:
console.log("Active socket connections: %d", http.globalAgent.sockets['localhost:8080'].length )
console.log("Total queued requests: %d", http.globalAgent.requests['localhost:8080'].length)
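Substitute localhost:8080 with whatever host and port you are making the requests to. Newer Node.js versions build this key with agent.getName(), which can append extra fields (for example 'localhost:8080:'), and an entry only exists while there are sockets or queued requests for that host.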
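As a minimal sketch of capping concurrency at the socket level, a dedicated http.Agent can be created with an explicit maxSockets and passed to each request. The host, path, helper name (fetchDetails) and the limit of 5 below are placeholder assumptions, not values from the question.

```javascript
const http = require('http');

// A dedicated agent: at most 5 concurrent sockets per host:port.
// Requests beyond the limit wait in the agent's queue.
const limitedAgent = new http.Agent({ keepAlive: true, maxSockets: 5 });

// Hypothetical helper: POST one ID to a placeholder API host.
function fetchDetails(id, callback) {
  const req = http.request({
    host: 'api.example.com',   // placeholder host (assumption)
    path: '/details',          // placeholder path (assumption)
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    agent: limitedAgent        // route the request through the capped agent
  }, function (res) {
    let body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  });
  req.on('error', callback);
  req.end(JSON.stringify({ id: id }));
}
```

With this in place, firing several hundred fetchDetails calls at once should still open no more than 5 connections to that host at a time.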
You can see how node handles these connections at the following two points:
Adding a new connection and storing to the request queue
https://github.com/joyent/node/blob/master/lib/_http_agent.js#L83
Creating connections from queued requests
https://github.com/joyent/node/blob/master/lib/_http_agent.js#L148
I wrote this up quickly to give you an idea of how you could stagger those requests. This particular code doesn't check how many requests are still pending, but you could easily modify it so that only a set number of requests are in flight at any given time (which, honestly, would be the better way to do it).
var Stagger = function (data, stagger, fn, cb) {
  var self = this;
  this.timerID = 0;
  this.data = [].concat(data);
  this.fn = fn;
  this.cb = cb;
  this.stagger = stagger;
  this.iteration = 0;
  this.store = {};
  this.start = function () {
    (function __stagger() {
      self.fn(self.iteration, self.data[self.iteration], self.store);
      self.iteration++;
      if (self.iteration != self.data.length)
        self.timerID = setTimeout(__stagger, self.stagger);
      else
        cb(self.store);
    })();
  };
  this.stop = function () {
    clearTimeout(self.timerID);
  };
};
var t = new Stagger([1,2,3,4,5,6], 1000, function (i, item, store) {
  console.log(i, item);
  if (!store.out) store.out = [];
  store.out[i] = Math.pow(2,i);
},
function (store) {
  console.log('Done!', store);
});
t.start();
This code could definitely be improved, but it should give you an idea of where to start.
Live Demo: http://jsbin.com/ewoyik/1/edit (note: requires console)
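The "only a set number of requests going out at any given time" variant mentioned above could look roughly like the sketch below. It is a different technique from the timer-based Stagger (a promise-based worker pool), and the names runLimited and lane are made up for illustration.

```javascript
// Run worker(item, index) for every item, with at most `limit` calls in flight.
// worker must return a promise (or a plain value). Resolves with the results
// in the original order.
function runLimited(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each "lane" pulls the next unprocessed item until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  return Promise.all(lanes).then(function () { return results; });
}

// Usage: six fake "requests", never more than two at a time.
runLimited([1, 2, 3, 4, 5, 6], 2, function (item) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(item * 2); }, 10);
  });
}).then(function (out) {
  console.log('Done!', out);
});
```

A new call starts as soon as an earlier one finishes, so the remote server sees a steady, bounded load instead of a burst.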

Related

Progress bar for express / react communicating with backend

I want to make a progress bar that tells the user how far along my backend is in fetching from the API. But it seems that every time I send a response, the request ends. How can I avoid this, and what should I search for to learn more? I couldn't find anything online.
React:
const { data, error, isError, isLoading } = useQuery('posts', fetchPosts)
if (isLoading) return <p>Loading...</p>
return data ? <p>{data}</p> : null
Express:
app.get("/api/v1/testData", async (req, res) => {
  try {
    const info = req.query.info;
    const sortByThis = req.query.sortBy;
    if (info) {
      let yourMessage = "Getting Data";
      res.status(200).send(yourMessage);
      const valueArray = await fetchData(info);
      yourMessage = "Data retrieved, now sorting";
      res.status(200).send(yourMessage);
      const sortedArray = valueArray.filter((item) => item.value === sortByThis);
      yourMessage = "Sorting done, now creating geojson";
      res.status(200).send(yourMessage);
      const geojson = createGeoJson(sortedArray);
      res.status(200).send(geojson);
    } else {
      res.status(400).send();
    }
  } catch (err) {
    console.log(err);
    res.status(500).send();
  }
});

You can only send one response to a request in HTTP.
In case you want to have status updates using HTTP, the client needs to poll the server i.e. request status updates from the server. Keep in mind though that every request needs to be processed on the server side and will take resources away which are then not available for other (more important) requests from other clients. So don't poll too frequently.
If you want to support long running operations using HTTP have a look at the following API design pattern.
Alternatively, you could use a WebSocket connection to push updates from the server to the client. I assume the computation on your backend will not take minutes and that you want to update the client in real time, so WebSockets will probably be the best option for you. Once established, a WebSocket connection has considerably less overhead than sending large HTTP requests/responses between client and server.
Have a look at this thread, which discusses the above-mentioned and other possibilities.
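To make the polling option concrete, here is a minimal sketch of the server-side bookkeeping: one request starts the job and gets an ID back straight away, and later requests ask for its status. The names startJob and getStatus, the shape of the steps array, and the idea of exposing them as something like POST /jobs and GET /jobs/:id are all assumptions for illustration; the route wiring itself is left out.

```javascript
// In-memory job registry: jobId -> { status, result, done }.
const jobs = new Map();
let nextJobId = 1;

// Start a long-running task and return its id immediately.
// `steps` is an array of { label, run } objects (hypothetical shape);
// each run() receives the previous step's result and may return a promise.
function startJob(steps) {
  const id = String(nextJobId++);
  const job = { status: 'Queued', result: null, done: false };
  jobs.set(id, job);

  (async function () {
    let value;
    for (const step of steps) {
      job.status = step.label;        // what a polling client will see
      value = await step.run(value);
    }
    job.status = 'Done';
    job.result = value;
    job.done = true;
  })().catch(function (err) {
    job.status = 'Failed: ' + err.message;
    job.done = true;
  });

  return id;
}

// What a status route would send back for each poll (null if unknown id).
function getStatus(id) {
  return jobs.get(id) || null;
}
```

Each HTTP request still gets exactly one response: the first returns the job ID, and every poll returns the current status until done is true.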

Chain of endpoints in Node and Express: how to prevent that some of them stops all the series?

On one page I have to get information from 8 different endpoints. Two of them are outside my application, and sometimes they delay the display of the data, because the web browser waits until all of the data has been processed. Since they're outside my app, I can't refactor them to make them faster, but I still need to show the information they provide. In addition, sometimes one of them returns nothing; in that case I show default data to the user. The wait hurts the user experience.
I'm using promises to call these endpoints. Below is part of the code snippet that I am using.
The code is working fine. The issue is the delay.
First. Here is the array that contains all the service that I need to process:
var requests = [{
  // 0
  url: urlLocalApi + '/endpointURL_1/',
  headers: {
    'headers': 'apitoken'
  },
}, {
  // 1
  url: urlLocalApi + '/endpointURL_2/',
  headers: {
    'headers': 'apitoken'
  },
}];
The code of array is encapsulated in this method:
const requests = homePageFunctions.createRequest();
Now, here is how the data is processed. I am using both 'request-promise' and 'bluebird', plus a custom logger to check that everything goes fine.
const Promise = require("bluebird");
const request = require('request-promise');
var viewsHelper = {
  getPageData: function (requests) {
    return Promise.map(requests, function (obj) {
      return request(obj).then(function (body) {
        AppLogger.log(`Endpoint parsed`, statusLogger.infodate);
        return JSON.parse(body);
      });
    });
  }
}
module.exports = viewsHelper;
How do I call this?
viewsHelper.getPageData(requests)
  .then(results => {
    var output = [];
    for (var i = 0; i < results.length; i++) {
      output.push(results[i]);
    }
    // render data
    res.render('homepage/index', output);
    AppLogger.log(`PageData is rendered`, statusLogger.infodate);
  })
  .catch(err => {
    console.log(err);
  });
};
Note that each item of the "output" array holds the data returned by one endpoint.
The problem here is:
If any of the endpoints takes a long time, the entire chain slows down, even though
the others have already been processed. The web page stays blank while it waits.
How can I prevent this behavior?

That is an interesting question, but I need a few answers before I can respond effectively.
You have a Node server and a client (HTML/JS).
You have 8 endpoints, 2 of which are slow because you don't control them.
1. Is the client (page) aware of the 8 endpoints, i.e. does it make 8 calls every time you reload the page?
OR
2. Does the page make one request to your Node.js server, which then calls the 8 endpoints before responding?
If it is 1, lazy loading will work easily for you, since the page is making the requests.
If it is 2, lazy loading will work only on the server side; the client will still be blocked, because it doesn't know (or care) how you load your data. The page made one request and is blocked waiting for that response.
Obviously each method has pros and cons.
One way you can solve this is to call those endpoints asynchronously on the Node side and cache the results, so that when the page makes its one request the data is ready.
Again, we know very little about the situation, and there are many ways to solve this.
Hope this helps.
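The caching idea could be sketched as follows: each slow endpoint is wrapped in a small cache that is refreshed in the background, and the page handler only ever reads the cached value. The name createCachedSource and its fetchFn/defaultValue parameters are hypothetical; the default value plays the role of the "default data" the question already falls back to.

```javascript
// Wrap a slow fetch function in a cache that is refreshed in the background.
// fetchFn must return a promise; defaultValue is served until the first
// successful refresh.
function createCachedSource(fetchFn, defaultValue) {
  let cached = defaultValue;

  function refresh() {
    return fetchFn().then(function (fresh) {
      // Ignore empty answers so the last good value keeps being served.
      if (fresh !== undefined && fresh !== null) cached = fresh;
    }).catch(function () {
      // Keep serving the previous value if the endpoint fails.
    });
  }

  return {
    get: function () { return cached; },   // never waits on the network
    refresh: refresh,
    // Refresh now and then every intervalMs; returns the timer so the
    // caller can stop it with clearInterval.
    start: function (intervalMs) {
      refresh();
      return setInterval(refresh, intervalMs);
    }
  };
}
```

The trade-off is freshness: the page renders immediately, but the data from the two external endpoints can be up to one refresh interval old.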

How to get realtime stock market quotes through an http request without flooding/hitting request limit (Algotrading)

I made a simple program that uses the Google Finance API to grab stock data through HTTP requests and does some calculations on them.
The google-api looks like this(adds a new block of data every minute during trading hours):
https://www.google.com/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct&q=AAPL
This works fine, however I have a huge list of stock-tickers I need to get data from. In order to loop through them without hitting a request limit I set a time interval of 2 seconds between the requests. There's over 5000 stocks, so this takes forever and I need it to get done in < 5 minutes in order for the algorithm to be useful.
I was wondering if there is a way to achieve this with HTTP requests? Or if I'm tackling this the wrong way. I can't download the data beforehand to do it on the client-side as I need to get the data as soon as the first quotes come out in the morning.
Programmed in JavaScript (Node.js), but answers in any language are fine. Here's the function that I call at 2-second intervals:
var getStockData = function(ticker, day, cb){
  var http = require('http');
  var options = {
    host: "www.google.com",
    path: "/"
  };
  ticker = ticker.replace(/\s+/g, '');
  var data = '';
  options.path = "/finance/getprices?i=60&p=" +day+"d&f=d,o,h,l,c,v&df=cpct&q=" + ticker;
  var callback = function(response){
    response.on('data', function(chunk){
      data +=chunk;
    });
    response.on('end', function(){
      var data_clean = cleanUp(data);
      if(data_clean === -1) console.log('we couldnt find anything for this ticker');
      cb(data_clean);
    })
  };
  http.request(options, callback).end();
};
Any help would be greatly appreciated.

When designing against an API with policy thresholds (refresh-rate ceiling, bandwidth limit, etc.):
Avoid re-fetching data the node has already received.
Using the URL above as-is, a huge block of data is (re-)fetched each time, and most of its rows, if not all, were already known from an identical call just 2 seconds before:
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1482330600,116.84,116.84,116.8,116.8,225329
1,116.99,117,116.8,116.84,81304
2,117.26,117.28,116.99,117,225262
3,117.32,117.35,117.205,117.28,153225
4,117.28,117.33,117.22,117.32,104072
...
149,116.98,117,116.98,116.98,8175
150,116.994,117,116.98,116.99,2751
151,117,117.005,116.9901,116.9937,7774
152,117.01,117.02,116.99,116.995,13011
153,117.0199,117.02,117.005,117.02,9313
Review the API specification carefully so that you can send smarter requests that yield minimum-footprint data.
Watch for API end-of-life signals, so that you can find another source before the API stops providing data.
(cit.:) The Google Finance APIs are no longer available. Thank you for your interest.
As noted below, in the second comment, the inherent inefficiency of re-fetching repetitively the growing block of already known data is to be avoided.
A professional data-pump design ought to use the API's details to do this:
adding ts=1482330600, a timestamp (Unix format, in seconds), defines the start of the "new" data to be retrieved, leaving rows already seen before that timestamp out of the transmitted block.
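A rough sketch of that incremental approach is shown below. It rests on assumptions that should be checked against whatever API is actually used, since the Google Finance endpoint quoted above is no longer available: a row starting with "a" is taken to carry an absolute Unix timestamp, a bare integer is taken to be an offset from it in INTERVAL-second steps, and the ts= parameter is used as described above. The function names (parsePrices, takeNewRows, buildPath) are made up for illustration.

```javascript
// Parse the text format shown above into rows with absolute Unix timestamps.
// Assumption: "a<seconds>" is an absolute timestamp and a bare integer is an
// offset from it, counted in INTERVAL-second steps.
function parsePrices(text) {
  let interval = 60;
  let base = null;
  const rows = [];
  text.split('\n').forEach(function (rawLine) {
    const line = rawLine.trim();
    const header = /^INTERVAL=(\d+)$/.exec(line);
    if (header) { interval = Number(header[1]); return; }
    const fields = line.split(',');
    if (fields.length !== 6) return;               // not a data row
    let ts;
    if (/^a\d+$/.test(fields[0])) {
      base = Number(fields[0].slice(1));
      ts = base;
    } else if (base !== null && /^\d+$/.test(fields[0])) {
      ts = base + Number(fields[0]) * interval;
    } else {
      return;                                      // e.g. the COLUMNS= header
    }
    // Column order follows COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME.
    rows.push({
      ts: ts,
      close: Number(fields[1]),
      high: Number(fields[2]),
      low: Number(fields[3]),
      open: Number(fields[4]),
      volume: Number(fields[5])
    });
  });
  return rows;
}

// ticker -> last stored Unix timestamp
const lastSeen = new Map();

// Keep only rows newer than what is already stored for this ticker.
function takeNewRows(ticker, rows) {
  const since = lastSeen.has(ticker) ? lastSeen.get(ticker) : -Infinity;
  const fresh = rows.filter(function (row) { return row.ts > since; });
  if (fresh.length > 0) lastSeen.set(ticker, fresh[fresh.length - 1].ts);
  return fresh;
}

// Build the request path, adding ts= once a timestamp is known.
function buildPath(ticker, days) {
  let path = '/finance/getprices?i=60&p=' + days + 'd&f=d,o,h,l,c,v&df=cpct&q=' +
    encodeURIComponent(ticker);
  if (lastSeen.has(ticker)) path += '&ts=' + lastSeen.get(ticker);
  return path;
}
```

Even if a provider ignores the timestamp parameter, filtering with takeNewRows keeps already-stored rows from being processed twice.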

What is a simple way of counting requests that the sever is serving?

In a simple nodejs server I want to track how many requests per second it is serving across all clients.
Should I be using a database? Or will that cause locking of some sort?

If you just want to know, at any given moment, how many requests have been served in the most recent time period, you can create your own in-memory data structure that keeps track of the data needed to calculate that. I see no reason to use a database for this.
If you are using Express, you can just make the first middleware that you register be one that collects the request data:
// collect request info
var requests = [];
var requestTrimThreshold = 5000;
var requestTrimSize = 4000;
app.use(function(req, res, next) {
  requests.push(Date.now());
  // keep the requests array from growing forever: once it passes the
  // threshold, drop the oldest entries (the newest are at the end)
  if (requests.length > requestTrimThreshold) {
    requests = requests.slice(requestTrimSize);
  }
  next();
});
Now, you have an array of data that logs the time of your last N requests and you can calculate recent requests/second over any recent time period.
Just make sure this middleware is installed first so it is processed before any other middleware that might actually end the request (so it is always called).
Of course, if you want to collect more info than just the time of the request, instead of just pushing a time stamp into the array, you could create an object and push the object into the array. The object could have a timestamp property and any other properties you wish (URL of request, type of request GET, POST, etc..., IP of request, etc...).
Then, if you want to know how many requests in the last minute, you could calculate that like this:
app.get("/requests/minute", function(req, res) {
  var now = Date.now();
  var aMinuteAgo = now - (1000 * 60);
  var cnt = 0;
  // since recent requests are at the end of the array, search the array
  // from back to front
  for (var i = requests.length - 1; i >= 0; i--) {
    if (requests[i] >= aMinuteAgo) {
      ++cnt;
    } else {
      break;
    }
  }
  res.json({requestsLastMinute: cnt});
});
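Since the question asks for requests per second, the same back-to-front scan can be generalized to any window. This is a small sketch; countRecent and requestsPerSecond are hypothetical helper names, and the timestamps array is assumed to be in ascending order, as it is when Date.now() is pushed on every request.

```javascript
// Count timestamps that fall within the last windowMs milliseconds.
// `timestamps` must be in ascending order (oldest first).
function countRecent(timestamps, windowMs, now) {
  const cutoff = now - windowMs;
  let count = 0;
  // Newest entries are at the end, so scan backwards and stop at the
  // first entry that is older than the cutoff.
  for (let i = timestamps.length - 1; i >= 0 && timestamps[i] >= cutoff; i--) {
    count++;
  }
  return count;
}

// Average requests per second over the window.
function requestsPerSecond(timestamps, windowMs, now) {
  return countRecent(timestamps, windowMs, now) / (windowMs / 1000);
}
```

For example, requestsPerSecond(requests, 10000, Date.now()) would give the average rate over the last ten seconds, provided the array still holds that much history after trimming.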

If you run your server with PM2, type
pm2 monit
and look at the left corner: you will see the custom metrics, where you can watch the requests per minute on the server.

Node.js listen to session variable change and trigger server-sent event

I am writing a webapp, using express.js.
My webapp does the following:
1. The user posts 100 JSON objects.
2. Each JSON object is processed via a service call.
3. Once the service call is completed, a session variable is incremented.
4. When the session variable is incremented, a server-sent event must be sent to the client to update the progress bar.
How do I listen for a change to a session variable in order to trigger a server-sent event?
Listening for a variable change is not the only solution I am open to.
What I need is to send a server-sent event each time a JSON object has been processed.
Any appropriate suggestion is welcome.
Edit (based on Alberto Zaccagni's comment)
My code looks like this:
function processRecords(cmRecords,requestObject,responseObject)
{
  for (var index = 0; index < cmRecords.length; index++)
  {
    post_options.body = cmRecords[index];
    request.post(post_options,function(err,res,body)
    {
      if(requestObject.session.processedcount)
        requestObject.session.processedcount = requestObject.session.processedcount + 1;
      else
        requestObject.session.processedcount = 1;
      if(err)
      {
        appLog.error('Error Occured %j',err);
      }
      else
      {
        appLog.debug('CMResponse: %j',body);
      }
      var percentage = (requestObject.session.processedcount / requestObject.session.totalCount) * 100;
      responseObject.set('Content-Type','text/event-stream');
      responseObject.json({'event':'progress','data':percentage});
    });
  };
}
When the first record is updated, a server-sent event is triggered using the responseObject (the Express response object).
When the second record is updated and I try to trigger a server-sent event using the same responseObject, I get an error saying that headers cannot be set on a response that has already been sent.

It's hard to know exactly what the situation is without seeing the routes/actions you have in your main application...
However, I believe the issue you are running into is that you are trying to send two sets of headers to the client (browser), which is not allowed. In HTTP the status line and headers are sent once, at the start of a response, and the client uses them (the content type in particular) to decide how to process everything that follows. You can't change them after they have been sent (one request -> one response -> one set of headers back to the client). Among other things, this prevents your server from contradicting itself, for example by switching from a "200 OK" response to a "400 Bad Request". Note also that responseObject.json() sends the headers and ends the response, so the second call in your loop has nothing left to write to.
In this case, on the initial request, you are telling the client "Hey, this was a valid request and here is my response (via the status of 200 which is either set elsewhere or being assumed by ExpressJS), and please keep the communication channel open so I can send you updates (by setting your content type to text/event-stream)".
As far as how to "fix" this, there are many options. When I've done this, I've used the pub/sub feature of redis to act as the "pipe" that connects everything up. So, the flow has been like this:
1. Some client sends a request to /your-event-stream-url
2. In this request, you set up your Redis subscriber. Anything that comes in on this subscription can be handled however you want. In your case, you want to "send some data down the pipe to the client in a JSON object with at least a data attribute." After you have set up this client, you just return a response of "200 Ok" and set the content type to "text/event-stream." Redis will take care of the rest.
3. Then, another request is made to another URL endpoint which accomplishes the task of "posting a JSON object" by hitting /your-endpoint-that-processes-json. (Note: obviously this request may be made by the same user/browser...but the application doesn't know/care about that)
4. In this action, you do the processing of their JSON data, increment your counters, or do whatever...and return a 200 response. However, one of the things you'd do in this action is "publish" a message on the Redis channel your subscribers from step #1 are listening to so the clients get the updates. Technically, this action does not need to return anything to the client, assuming the user will have some type of feedback based on the 200-status code or on the server-sent event that is sent down the pipe...
A tangible example I can give you is this gist, which is part of this article. Note that the article is a couple years old at this point so some of the code may have to be tweaked a bit. Also note this is not guaranteed to be anything more than an example (ie: it has not been "load tested" or anything like that). However, it may help you get started.

I came up with a solution. Please let me know whether this is the right way to do it.
Will this solution work across sessions?
Server side Code
var events = require('events');
var progressEmitter = new events.EventEmitter();
exports.cleanseMatch = function(req, res)
{
  console.log('cleanseMatch invoked');
  var i = 1;
  var id = setInterval(function(){
    req.session.percentage = (i/10)*100;
    i++;
    console.log('PCT is: ' + req.session.percentage);
    progressEmitter.emit('progress',req.session.percentage)
    if(i == 11) {
      req.session.percentage = 100;
      clearInterval(id);
      res.json({'data':'test'});
    }
  },1000);
}
exports.progress = function(req,res)
{
  console.log('progress invoked');
  // console.log('PCT is: ' + req.session.percentage);
  res.writeHead(200, {'Content-Type': 'text/event-stream'});
  progressEmitter.on('progress',function(percentage){
    console.log('progress event fired for : ' + percentage);
    res.write("event: progress\n");
    res.write("data: "+percentage+"\n\n");
  });
}
Client Side Code
var source = new EventSource('progress');
source.addEventListener('progress', function(e) {
  var percentage = JSON.parse(e.data);
  //update progress bar in client
  App.updateProgressBar(percentage);
}, false);
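One thing the server-side code above does not do is detach its 'progress' listener when a client disconnects, so listeners accumulate on the shared emitter, and because that emitter is module-wide, every connected client receives every session's progress. A possible refinement is sketched below; the helper names formatSse and streamProgress are made up for illustration, and the 'close' event on the request is used here as the disconnect signal.

```javascript
const events = require('events');

// Format one server-sent event frame: an "event:" line, a "data:" line,
// then a blank line that terminates the frame.
function formatSse(eventName, data) {
  return 'event: ' + eventName + '\n' + 'data: ' + JSON.stringify(data) + '\n\n';
}

// Stream `eventName` events from `emitter` to an open response, and detach
// the listener when the client disconnects so listeners do not pile up.
function streamProgress(emitter, eventName, req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache'
  });
  function onProgress(value) {
    res.write(formatSse(eventName, value));
  }
  emitter.on(eventName, onProgress);
  req.on('close', function () {
    emitter.removeListener(eventName, onProgress);
  });
}
```

To keep sessions apart, one option (an assumption, not something the code above does) is to emit and subscribe on a per-session event name such as 'progress:' + req.sessionID, in which case the client would listen for messages from its own stream only.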
