Fetching external resources in parallel in node - good practice? - node.js

I have a setup where a node server acts as a proxy server to serve images.
For example, the exact same image "test1.jpg" can be fetched from 3 external sources, let's say:
a. www.abc.com/test1.jpg
b. www.def.com/test1.jpg
c. www.ghi.com/test1.jpg
When the nodejs server gets a request for "test1.jpg" it first gets a list of external URLs from a DB. Now amongst these external resources, at least one is always behind a CDN and is "expected" to respond faster and hence is a preferred source for the image.
My question is: which of the two methods below is the correct way to achieve this (or is there another method)?
1. Fire HTTP requests (using mikeal's request client module) for all the URLs at the same time. Get their promise objects and, whichever source responds first, send that image back to the user (it can be any of the three sources, not necessarily the preferred source behind the CDN, but that doesn't matter since the image is exactly the same). The disadvantage I see is that for every image we hit 3 sources. Also, the promises for the HTTP requests can still get fulfilled after the response from the first successful source has been sent out.
2. Fire HTTP requests one at a time starting with the most preferred source, wait for it to fail (i.e. a 404 on the image) and then proceed to the next preferred source. We make fewer HTTP requests, but the user waits longer.
Some pseudo code
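Method 1
var sent = false;
while (imagePreferences.length > 0) {
  var url = imagePreferences.shift(); // splice(0,1) would return an array, not a URL
  getImage(url).then(function(imageResp) {
    if (sent) return; // a faster source has already answered
    sent = true;
    sendImage(imageResp);
  }, function(err) {
    console.log(err);
  });
}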
Method 2
function getNextImage() {
  if (imageUrls.length > 0) {
    var url = imageUrls.shift();
    getImage(url).then(function(imageResp) {
      sendImageResp(imageResp);
    }, function(err) {
      getNextImage(); // recurse: fall back to the next preferred source
    });
  }
}
getNextImage();
This is just pseudo code. I am new to nodejs. Any help/views would be appreciated.

I prefer the 1st option: CDNs are designed to receive massive numbers of requests, and your code is perfectly fine for sending HTTP requests to multiple sources in parallel.
In case you want to ignore the other requests after successfully receiving the first image, you can use async.detect, which calls back with the first item that passes the test (it does not abort requests that are already in flight): https://github.com/caolan/async#detect
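As a rough sketch of the "first successful source wins" idea, here is one way to do it with plain promises instead of async.detect. It assumes a Node.js version that provides Promise.any and a global AbortController. The names firstSuccessful and fakeSource are made up for illustration, and fakeSource only simulates an HTTP fetch with a timer.

```javascript
// Race several sources and resolve with the first one that succeeds.
// Each "source" is a hypothetical function (signal) => Promise<image>.
async function firstSuccessful(sources) {
  const controller = new AbortController();
  try {
    // Promise.any ignores individual rejections unless every source fails.
    return await Promise.any(sources.map((source) => source(controller.signal)));
  } finally {
    // Tell the slower sources to stop; each source decides how to honor this.
    controller.abort();
  }
}

// Stand-in for an HTTP fetch: settles after `ms`, or rejects early on abort.
function fakeSource(name, ms, fails) {
  return (signal) => new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      if (fails) reject(new Error(name + ' returned 404'));
      else resolve(name);
    }, ms);
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error(name + ' aborted'));
    });
  });
}
```

A failing source (for example a 404) does not end the race; the caller only sees an error when all sources fail. With a real HTTP client, the abort signal would be passed to whatever cancellation mechanism that client offers.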

Related

How does NodeJS process multiple GET requests from different users/browsers?

I'd like to know how NodeJS processes multiple GET requests from different users/browsers when the handler relies on emitted events to return the results. I'd like to think of it as each time a user executes the GET request, it's as if a new session is started for that user.
For example if I have this GET request
var tester = require('./tester-class');
app.get('/triggerEv', async function(req, res, next) {
  // Start the data processing
  tester.startProcessing('some-data');
  // tester has event emitters that are triggered when processing is complete (success or fail)
  tester.on('success', function(data) {
    return res.send('success');
  });
  tester.on('fail', function(data) {
    return res.send('fail');
  });
});
What I'm thinking is that if I open a browser and run this GET request, passing some-data, to start processing, and then open another browser to execute this GET request with different data (to simulate multiple users accessing it at the same time), the second call will overwrite the previous startProcessing call and rerun it with the new data.
So if multiple users execute this GET request at the same time, would it handle it separately for each user as if it was different and independent sessions then return when there's a response for each user's sessions? Or will it do as I mentioned above (this case I will have to somehow manage different sessions for each user that triggers this GET request)?
I want to make it so that each user that executes this GET request doesn't interfere with other users that also execute this GET request at the same time and the correct response is returned for each user based on their own data sent to the startProcessing function.
Thanks, I hope I'm making sense. Will clarify if not.
If you're sharing the global tester object among different requests, then the 2nd request will interfere with the first request. Since all incoming requests use the same global environment in node.js, the usual model is that any request that may be "in-flight" for a while needs to create its own resources and keep them for itself. If some other request arrives while the first one is still waiting for something to complete, it will also create its own resources and the two will not conflict.
The server environment does not have a concept of "sessions" in the way you're using the term. There is no separate server-session or server state that each request lives in other than the request and response objects that are created for each incoming request. This is not like PHP - there is not a whole new interpreter state for each request.
"I want to make it so that each user that executes this GET request doesn't interfere with other users that also execute this GET request at the same time and the correct response is returned for each user based on their own data sent to the startProcessing function."
Then, don't share any resources between requests and don't use any objects that have global state. I don't know what your tester is, but one way to keep multiple requests separate from each other is to just make a new tester object for each request so they can each use it to their heart's content without any conflict.
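A minimal sketch of the "new tester object for each request" idea follows. The Tester class here is a hypothetical stand-in (the real tester-class is not shown in the question), and processForRequest is an invented helper name; the point is only that every call gets its own emitter and its own listeners.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the asker's tester class.
class Tester extends EventEmitter {
  startProcessing(data) {
    // Simulate asynchronous work, then report the outcome on this instance only.
    setImmediate(() => {
      if (data) this.emit('success', data.toUpperCase());
      else this.emit('fail', new Error('no data'));
    });
  }
}

// One fresh Tester per call, so listeners and state are never shared.
function processForRequest(data) {
  return new Promise((resolve, reject) => {
    const tester = new Tester();
    tester.once('success', resolve);
    tester.once('fail', reject);
    tester.startProcessing(data);
  });
}
```

Inside an express handler this could be used as processForRequest(someData).then(function(result) { res.send(result); }, function() { res.send('fail'); }), so each response is tied to the tester created for that request.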

React app with Server-side rendering crashes with load

I'm using react-boilerplate (with react-router, sagas, express.js) for my React app and on top of it I've added SSR logic so that once it receives an HTTP request it renders react components to string based on URL and sends HTML string back to the client.
While react rendering is happening on the server side, it also makes fetch request through sagas to some APIs (up to 5 endpoints based on the URL) to get data for components before it actually renders the component to string.
Everything works great if I make only a few requests to the Node server at the same time, but once I simulate a load of 100+ concurrent requests and the server starts processing them, at some point it crashes with no indication of any exception.
What I've noticed while I was trying to debug the app is that once 100+ incoming requests begin to be processed by the Node server it sends requests to APIs at the same time but receives no actual response until it stops stacking those requests.
The code that's used for rendering on the server side:
async function renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames }) {
  // 1st render phase - triggers the sagas
  renderAppToString(store, renderProps);
  // send signal to sagas that we're done
  store.dispatch(END);
  // wait for all tasks to finish
  await sagasDone();
  // capture the state after the first render
  const state = store.getState().toJS();
  // prepare style sheet to collect generated css
  const styleSheet = new ServerStyleSheet();
  // 2nd render phase - the sagas triggered in the first phase are resolved by now
  const appMarkup = renderAppToString(store, renderProps, styleSheet);
  // capture the generated css
  const css = styleSheet.getStyleElement();
  const doc = renderToStaticMarkup(
    <HtmlDocument
      appMarkup={appMarkup}
      lang={state.language.locale}
      state={state}
      head={Helmet.rewind()}
      assets={assets}
      css={css}
      webpackDllNames={webpackDllNames}
    />
  );
  return `<!DOCTYPE html>\n${doc}`;
}
// The code that's executed by express.js for each request
function renderAppToStringAtLocation(url, { webpackDllNames = [], assets, lang }, callback) {
  const memHistory = createMemoryHistory(url);
  const store = createStore({}, memHistory);
  syncHistoryWithStore(memHistory, store);
  const routes = createRoutes(store);
  const sagasDone = monitorSagas(store);
  store.dispatch(changeLocale(lang));
  match({ routes, location: url }, (error, redirectLocation, renderProps) => {
    if (error) {
      callback({ error });
    } else if (renderProps) {
      renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames })
        .then((html) => {
          callback({ html });
        })
        .catch((e) => callback({ error: e }));
    } else {
      callback({ error: new Error('Unknown error') });
    }
  });
}
So my assumption is that something is going wrong once it receives too many HTTP requests which in turn generates even more requests to API endpoints to render react components.
I've noticed that it blocks the event loop for 300ms after renderAppToString() for every client request, so once there are 100 concurrent requests it blocks it for about 10 seconds. I'm not sure whether that's normal or a bad thing, though.
Is it worth trying to limit simultaneous requests to Node server?
I couldn't find much information on the topic of SSR + Node crashes. So I'd appreciate any suggestions as to where to look at to identify the problem or for possible solutions if anyone has experienced similar issue in the past.
I am doing ReactDOM.hydrate(...), and I can also load my initial and required state and send it down in hydrate.
I have written a middleware file and I use it to decide, based on the URL, which file I should send in the response.
In that middleware file, I create the HTML string of whichever file was requested based on the URL. Then I add this HTML string and return it using res.render of express.
To do that, I compare the requested URL path with a dictionary of path-file associations. Once a match is found (i.e. the URL matches), I use ReactDOMServer's renderToString to convert the component into HTML. This HTML can then be sent with a Handlebars file using res.render, as discussed above.
This way I have managed to do SSR on most of my web apps built using the MERN.io stack.
1. Run express in a cluster
"A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load."
As Node is single threaded, the problem may also be in a file lower down the stack, where you are initialising express.
There are a number of best practices for running a node app that are not generally mentioned in react threads.
A simple way to improve performance on a server with multiple cores is to use the built-in node cluster module:
https://nodejs.org/api/cluster.html
This will start an instance of your app on each core of your server, giving you a significant performance improvement for concurrent requests (if you have a multicore server).
For more information on express performance, see:
https://expressjs.com/en/advanced/best-practice-performance.html
You may also want to throttle your incoming connections, because once the thread starts context switching, response times degrade rapidly. This can be done by adding something like NGINX / HAProxy in front of your application.
2. Wait for the store to be hydrated before calling render to string
You don't want to render your layout until your store has finished updating because, as other comments note, rendering blocks the thread.
Below is an example taken from the saga repo which shows how to run the sagas without rendering the template until they have all resolved:
store.runSaga(rootSaga).done.then(() => {
  console.log('sagas complete')
  res.status(200).send(
    layout(
      renderToString(rootComp),
      JSON.stringify(store.getState())
    )
  )
}).catch((e) => {
  console.log(e.message)
  res.status(500).send(e.message)
})
https://github.com/redux-saga/redux-saga/blob/master/examples/real-world/server.js
3. Make sure node environment is set correctly
Also ensure you are correctly using NODE_ENV=production when bundling / running your code, as both express and react optimise for this.
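For point 1, a cluster setup might look roughly like the sketch below. It uses only the built-in cluster and os modules; workerCount and startCluster are invented helper names, and the createServer argument is a hypothetical factory that returns something with a listen() method (an http.Server or an express app). Nothing is started automatically here.

```javascript
const cluster = require('cluster');
const os = require('os');

// Decide how many workers to fork: one per core, with an optional cap.
function workerCount(cpuCount, cap) {
  const n = Math.max(1, cpuCount);
  return cap ? Math.min(n, cap) : n;
}

// Fork workers in the primary process; run the server in each worker.
function startCluster(createServer, port) {
  // cluster.isPrimary exists in newer Node.js; older versions only have isMaster.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i++) cluster.fork();
    // Replace a worker that dies so capacity stays constant.
    cluster.on('exit', () => cluster.fork());
  } else {
    createServer().listen(port);
  }
}

// Usage (not executed here):
// startCluster(() => require('http').createServer((req, res) => res.end('ok')), 3000);
```

Each worker is a separate process with its own event loop, so a synchronous render in one worker does not block requests handled by the others.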
The calls to renderToString() are synchronous, so they block the thread while they are running. So it's no surprise that with 100+ concurrent requests you end up with an extremely backed-up queue hanging for ~10 seconds.
Edit: It was pointed out that React v16 natively supports streaming, but you need to use the renderToNodeStream() method to stream the HTML to the client. It should produce exactly the same HTML as renderToString() but as a stream, so you don't have to wait for the full HTML to be rendered before you start sending data to the client.

node.js- post requests to endpoint begin to get stuck after a while

I've developed a node.js webapp with express+mongoose deployed to an Amazon EC2 instance.
The app receives SNS notifications when a file is uploaded to a specific s3 bucket, stores something in mongodb and then makes an https post to some endpoint outside amazon. https post is done in this way using requests library:
var options = {
  url: "https://" + config.get('some.endpoint') + "/somepath",
  method: 'POST',
  body: postdata,
  json: true
};
requests.post(options, function(err, response, body) {
  if (!err && response.statusCode === 200) {
    logger.info("notified ok ");
  } else {
    logger.error("1 " + err);
    logger.error("2 " + response);
    logger.error("3 " + body);
  }
});
This was done using a simple callback model ( i.e I didn't use async library).
Files are uploaded continuously, so SNS hits my app at the same rate (~5/10 requests per second). For the first ten minutes after the app comes up, I can see (by checking the logs) that the HTTP posts are delivered at nearly the same rate as the incoming requests arrive.
But at some point, the requests.post callback starts falling behind until it stops showing up in the log file,(despite requests keep coming). I can tell, by checking the other endpoint (the one specified in config.get('some.endpoint')) , effectively, that posts aren't being delivered. In different bursts and with great delays ( 5 min or more) , some new messages appear in the log, like if it was trying to catch up, but in the long term they stop showing up at all.
I've realized that if I make some manual flow-control by stopping/restarting the incoming requests I can make it work ok.
Am I doing something wrong? Are requests getting stacked up somewhere for some reason? How can I check this? Should I use some library to ensure execution?
Could it be that node.js prefers to process new incoming requests vs processing old requests callback and somehow these callbacks are never executed?
Any help or suggestion on how I can debug this issue is welcomed.
Thanks in advance!

Caching responses in express

I have some real trouble caching responses in express… I have one endpoint that gets a lot of requests (around 5k rpm). This endpoint fetches data from mongodb and to speed things up I would like to cache the full json response for 1 second so that only the first request each second hits the database while the others are served from a cache.
When abstracting out the database part of the problem my solution looks like this. I check for a cached response in redis. If one is found I serve it. If not I generate it, send it and set the cache. The timeout is to simulate the database operation.
app.get('/cachedTimeout', function(req, res, next) {
  redis.get(req.originalUrl, function(err, value) {
    if (err) return next(err);
    if (value) {
      res.set('Content-Type', 'text/plain');
      res.send(value.toString());
    } else {
      setTimeout(function() {
        res.send('OK');
        redis.set(req.originalUrl, 'OK');
        redis.expire(req.originalUrl, 1);
      }, 100);
    }
  });
});
The problem is that this does not make only the first request each second hit the database. Instead, all requests that come in before we have had time to set the cache (within the first 100ms) will hit the database. Under real load this really blows up, with response times around 60 seconds, because a lot of requests fall behind.
I know this could be solved with a reverse proxy like varnish but currently we are hosting on heroku which complicates such a setup.
What I would like to do is some sort of reverse-proxy cache inside of express. I would like all the requests that come in after the initial request (the one that generates the cache) to wait for the cache generation to finish and then use that same response.
Is this possible?
Use a proxy layer on top of your node.js application. Varnish Cache would be a good choice to work with Nginx to serve your application.
p-throttle should do exactly what you need: https://www.npmjs.com/package/p-throttle
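For the behaviour described in the question (later requests waiting for the first one's result instead of hitting the database), one possible in-process approach is to cache the pending promise itself. The sketch below is only an illustration under stated assumptions: createCoalescingCache is an invented name, load stands for whatever function produces the response (for example the mongodb query), and the cache lives in the memory of a single Node process rather than in redis.

```javascript
// Coalesce concurrent requests for the same key onto one in-flight load,
// then keep the result for `ttlMs` milliseconds.
function createCoalescingCache(load, ttlMs) {
  const entries = new Map(); // key -> Promise of the value
  return function get(key) {
    if (entries.has(key)) return entries.get(key);
    const pending = Promise.resolve()
      .then(() => load(key))
      .then((value) => {
        // Start the TTL once the value exists; unref() so the timer
        // never keeps the process alive on its own.
        setTimeout(() => entries.delete(key), ttlMs).unref();
        return value;
      }, (err) => {
        entries.delete(key); // do not cache failures
        throw err;
      });
    entries.set(key, pending);
    return pending;
  };
}
```

In the route handler, this would be used along the lines of get(req.originalUrl).then(function(body) { res.send(body); }, next). With several processes or dynos, each one keeps its own copy, so the database sees at most one load per key per process per TTL window.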

“Proxying” a lot of HTTP requests with Node.js + Express 2

I'm writing proxy in Node.js + Express 2. Proxy should:
decrypt POST payload and issue HTTP request to server based on result;
encrypt reply from server and send it back to client.
The encryption-related part works fine. The problem I'm facing is timeouts. The proxy should process requests in less than 15 secs, and most of them actually take under 500ms.
The problem appears when I increase the number of parallel requests. Most requests complete OK, but some fail after 15 secs + a couple of millis. ab -n5000 -c300 works fine, but with a concurrency of 500 some requests fail with a timeout.
I can only speculate, but it seems that the problem is the order of callback execution. Is it possible that the requests that come first hang until ETIMEDOUT because node focuses on the latest ones, which are still being processed in under 500ms?
P.S.: There is no problem with remote server. I'm using request for interactions with it.
upd
The way things works with some code:
function queryRemote(req, res) {
  var options = {}; // built based on req object (URI, body, authorization, etc.)
  request(options, function(err, httpResponse, body) {
    return err ? send500(req, res)
               : res.end(encrypt(body));
  });
}
app.use(myBodyParser); // reads hex string in payload
                       // and calls next() on 'end' event
app.post('/', [checkHeaders,  // check Content-Type and Authorization headers
               authUser,      // query DB and call next()
               parseRequest], // decrypt payload, parse JSON, call next()
  function(req, res) {
    req.socket.setTimeout(TIMEOUT);
    queryRemote(req, res);
  });
My problem is the following: when ab issues, let's say, 20 POSTs to /, the express route handler gets called thousands of times. That doesn't always happen; sometimes 20 and only 20 requests are processed in a timely fashion.
Of course, ab is not the problem. I'm 100% sure that only 20 requests are sent by ab, but the route handler gets called multiple times.
I can't find the reason for such behaviour. Any advice?
Timeouts were caused by using http.globalAgent, which by default can process up to 5 concurrent requests to one host:port (which isn't enough in my case).
Thousands of requests (instead of tens) really were sent by ab (confirmed with Wireshark under OS X; I cannot reproduce this under Ubuntu inside Parallels).
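To illustrate the agent limit mentioned above, here is a small sketch that uses a dedicated http.Agent with an explicit maxSockets value instead of relying on the global agent's default. The limit of 5 applied to old Node.js releases (newer ones default to Infinity). The value 500, the keepAlive setting and the upstreamOptions helper are illustrative assumptions, not something taken from the original proxy.

```javascript
const http = require('http');

// A dedicated agent with an explicit per-host socket limit.
// 500 is an illustrative value; tune it to what the upstream can handle.
const proxyAgent = new http.Agent({ keepAlive: true, maxSockets: 500 });

// Build request options that route through the dedicated agent.
// `host` and `path` are placeholders supplied by the caller.
function upstreamOptions(host, path) {
  return { host, path, method: 'POST', agent: proxyAgent };
}
```

The resulting options object can be passed to http.request(); requests beyond maxSockets for the same host:port are queued by the agent until a socket frees up.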
You can have a look at the node-http-proxy module and how it handles connections. Make sure you don't buffer any data and that everything works by streaming. You should also try to see where the time is spent for those long requests. Try instrumenting parts of your code with console.time and console.timeEnd and see what takes the most time. If the time is mostly spent in javascript, you should try to profile it. Basically, you can use the v8 profiler by adding the --prof option to your node command. This produces a v8.log, which can be processed with a v8 tool found in node-source-dir/deps/v8/tools. It only works if you have installed the d8 shell via scons (scons d8). You can have a look at this article to help you further in getting this working.
You can also use node-webkit-agent, which uses the webkit developer tools to show the profiler result. You can also have a look at my fork with a bit of sugar.
If that doesn't work, you can try profiling with dtrace (only works on illumos-based systems like SmartOS).
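The console.time / console.timeEnd suggestion can be wrapped in a small helper, sketched below. The name timed is invented here, and the step argument is any function returning a promise (for example the decrypt step or the remote query). Labels should be unique per call, for instance by including a request id, because console.time warns when a label is already in use.

```javascript
// Wrap an async step so its wall-clock duration is printed under `label`.
async function timed(label, step) {
  console.time(label);
  try {
    return await step();
  } finally {
    // Prints the label together with the elapsed time, even if the step throws.
    console.timeEnd(label);
  }
}
```

Wrapping each stage of a request this way gives a quick picture of whether the time goes to the remote call or to local javascript before reaching for a profiler.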

Resources