Caching responses in express - node.js

I'm having some real trouble caching responses in Express. I have one endpoint that gets a lot of requests (around 5k rpm). This endpoint fetches data from MongoDB, and to speed things up I would like to cache the full JSON response for 1 second so that only the first request each second hits the database while the others are served from the cache.
Abstracting out the database part of the problem, my solution looks like this: I check for a cached response in Redis. If one is found, I serve it. If not, I generate it, send it and set the cache. The timeout is there to simulate the database operation.
app.get('/cachedTimeout', function(req, res, next) {
  redis.get(req.originalUrl, function(err, value) {
    if (err) return next(err);
    if (value) {
      res.set('Content-Type', 'text/plain');
      res.send(value.toString());
    } else {
      setTimeout(function() {
        res.send('OK');
        redis.set(req.originalUrl, 'OK');
        redis.expire(req.originalUrl, 1);
      }, 100);
    }
  });
});
The problem is that this does not make only the first request every second hit the database. Instead, all requests that come in before we have had time to set the cache (i.e. within the first 100ms) hit the database. When adding real load to this it really blows up, with response times around 60 seconds because a lot of requests fall behind.
I know this could be solved with a reverse proxy like varnish but currently we are hosting on heroku which complicates such a setup.
What I would like to do is some sort of reverse-proxy cache inside of Express. I would like all the requests that come in after the initial request (the one that generates the cache) to wait for the cache generation to finish and then use that same response.
Is this possible?

Use a proxy layer on top of your Node.js application. Varnish Cache would be a good choice, working together with Nginx to serve your application.

p-throttle should do exactly what you need: https://www.npmjs.com/package/p-throttle
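If you want to keep everything inside Express without extra infrastructure, the behaviour described in the question (later requests waiting for the in-flight cache generation) can also be sketched with a shared in-memory promise per URL. This is only a minimal single-process sketch; fetchFromDatabase is a hypothetical placeholder for the real MongoDB call, and this is not the API of any of the libraries mentioned above:

// Request-coalescing sketch (assumes a single Node process).
// `fetchFromDatabase` is a hypothetical placeholder for the real query.
var inFlight = {};   // url -> promise of the generated response body

app.get('/cachedTimeout', function(req, res, next) {
  var key = req.originalUrl;
  if (!inFlight[key]) {
    inFlight[key] = fetchFromDatabase(key).then(function(body) {
      // Drop the shared promise after 1 second so the next request regenerates it.
      setTimeout(function() { delete inFlight[key]; }, 1000);
      return body;
    }, function(err) {
      delete inFlight[key];   // don't keep failures around
      throw err;
    });
  }
  inFlight[key].then(function(body) {
    res.set('Content-Type', 'text/plain');
    res.send(body);
  }, next);
});

All requests arriving during the same second share the one pending promise, so only one database query runs per second per URL.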

Related

React app with Server-side rendering crashes with load

I'm using react-boilerplate (with react-router, sagas, express.js) for my React app and on top of it I've added SSR logic so that once it receives an HTTP request it renders react components to string based on URL and sends HTML string back to the client.
While react rendering is happening on the server side, it also makes fetch request through sagas to some APIs (up to 5 endpoints based on the URL) to get data for components before it actually renders the component to string.
Everything works great if I make only a few requests to the Node server at the same time, but once I simulate a load of 100+ concurrent requests and it starts processing them, at some point it crashes with no indication of any exception.
What I've noticed while trying to debug the app is that once 100+ incoming requests begin to be processed by the Node server, it sends requests to the APIs at the same time but receives no actual responses until it stops stacking up those requests.
The code that's used for rendering on the server side:
async function renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames }) {
  // 1st render phase - triggers the sagas
  renderAppToString(store, renderProps);
  // send signal to sagas that we're done
  store.dispatch(END);
  // wait for all tasks to finish
  await sagasDone();
  // capture the state after the first render
  const state = store.getState().toJS();
  // prepare style sheet to collect generated css
  const styleSheet = new ServerStyleSheet();
  // 2nd render phase - the sagas triggered in the first phase are resolved by now
  const appMarkup = renderAppToString(store, renderProps, styleSheet);
  // capture the generated css
  const css = styleSheet.getStyleElement();
  const doc = renderToStaticMarkup(
    <HtmlDocument
      appMarkup={appMarkup}
      lang={state.language.locale}
      state={state}
      head={Helmet.rewind()}
      assets={assets}
      css={css}
      webpackDllNames={webpackDllNames}
    />
  );
  return `<!DOCTYPE html>\n${doc}`;
}
// The code that's executed by express.js for each request
function renderAppToStringAtLocation(url, { webpackDllNames = [], assets, lang }, callback) {
  const memHistory = createMemoryHistory(url);
  const store = createStore({}, memHistory);
  syncHistoryWithStore(memHistory, store);
  const routes = createRoutes(store);
  const sagasDone = monitorSagas(store);
  store.dispatch(changeLocale(lang));
  match({ routes, location: url }, (error, redirectLocation, renderProps) => {
    if (error) {
      callback({ error });
    } else if (renderProps) {
      renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames })
        .then((html) => {
          callback({ html });
        })
        .catch((e) => callback({ error: e }));
    } else {
      callback({ error: new Error('Unknown error') });
    }
  });
}
So my assumption is that something goes wrong once the server receives too many HTTP requests, which in turn generate even more requests to the API endpoints needed to render the React components.
I've noticed that the event loop is blocked for 300ms after renderAppToString() for every client request, so with 100 concurrent requests it is blocked for about 10 seconds. I'm not sure whether that is normal or a bad sign, though.
Is it worth trying to limit simultaneous requests to the Node server?
I couldn't find much information on the topic of SSR + Node crashes, so I'd appreciate any suggestions as to where to look to identify the problem, or possible solutions if anyone has experienced a similar issue in the past.
In my setup I do ReactDOM.hydrate(...) on the client, and I can also load my initial and required state and send it down for hydration.
I have written a middleware file and I use it to decide, based on the URL, which file to send in response.
In that middleware I create the HTML string of whichever file was requested based on the URL, then add this HTML string and return it using res.render of Express.
There I compare the requested URL path against a dictionary of path-file associations. Once a match is found, I use ReactDOMServer's renderToString to convert the component into HTML. This HTML can then be sent with a handlebars file using res.render, as discussed above.
This way I have managed to do SSR on most of my web apps built with the MERN.io stack.
Hope my answer helped you, and please write a comment for further discussion.
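Since the original screenshots are not reproduced here, a rough sketch of the kind of middleware described above might look like this (the path-to-component map, the components and the 'index' handlebars template are hypothetical stand-ins for what the screenshots showed):

// middleware sketch: map URL paths to components and render them to HTML strings
const React = require('react');
const ReactDOMServer = require('react-dom/server');

const pages = {                        // hypothetical path-to-component map
  '/': HomePage,
  '/about': AboutPage
};

function ssrMiddleware(req, res, next) {
  const Component = pages[req.path];
  if (!Component) return next();       // fall through for unknown paths

  const html = ReactDOMServer.renderToString(React.createElement(Component));
  // 'index' is a hypothetical handlebars template with a {{{html}}} placeholder
  res.render('index', { html });
}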
1. Run express in a cluster
A single instance of Node.js runs in a single thread. To take
advantage of multi-core systems, the user will sometimes want to
launch a cluster of Node.js processes to handle the load.
As Node is single threaded, the problem may also be in a file lower down the stack where you are initialising Express.
There are a number of best practices for running a Node app that are not generally mentioned in React threads.
A simple solution to improve performance on a server with multiple cores is to use the built-in Node cluster module:
https://nodejs.org/api/cluster.html
This will start multiple instances of your app, one per core of your server, giving you a significant performance improvement for concurrent requests (if you have a multicore server); see the sketch at the end of this point.
For more information on Express performance, see:
https://expressjs.com/en/advanced/best-practice-performance.html
You may also want to throttle your incoming connections, since response times drop rapidly once the thread starts context switching; this can be done by putting something like NGINX or HAProxy in front of your application.
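As a minimal sketch of the cluster approach (assuming a hypothetical ./app module that exports the Express app; this is not taken from the question's code):

// cluster entry point sketch; './app' is a hypothetical module exporting the Express app
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // fork one worker per CPU core
  os.cpus().forEach(() => cluster.fork());

  // replace workers that die so capacity is maintained
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a new one`);
    cluster.fork();
  });
} else {
  const app = require('./app');
  app.listen(3000);
}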
2. Wait for the store to be hydrated before calling render to string
You don't want to render your layout until your store has finished updating; as other comments note, rendering blocks the thread.
Below is an example taken from the saga repo which shows how to run the sagas without rendering the template until they have all resolved:
store.runSaga(rootSaga).done.then(() => {
  console.log('sagas complete')
  res.status(200).send(
    layout(
      renderToString(rootComp),
      JSON.stringify(store.getState())
    )
  )
}).catch((e) => {
  console.log(e.message)
  res.status(500).send(e.message)
})
https://github.com/redux-saga/redux-saga/blob/master/examples/real-world/server.js
3. Make sure node environment is set correctly
Also ensure you are correctly setting NODE_ENV=production when bundling and running your code, as both Express and React optimise for this.
The calls to renderToString() are synchronous, so they block the thread while they are running. It's therefore no surprise that with 100+ concurrent requests you end up with an extremely backed-up queue hanging for ~10 seconds.
Edit: It was pointed out that React v16 natively supports streaming, but you need to use the renderToNodeStream() method to stream the HTML to the client. It produces the same markup as renderToString(), but streams it instead, so you don't have to wait for the full HTML to be rendered before you start sending data to the client.
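A rough sketch of what that can look like in an Express handler, assuming React 16+, JSX transpilation as in the question's setup, and a hypothetical <App /> component (the state injection and full HTML document from the question are left out):

// streaming SSR sketch; <App /> is a hypothetical component
const { renderToNodeStream } = require('react-dom/server');

app.get('*', (req, res) => {
  res.write('<!DOCTYPE html><html><body><div id="root">');
  const stream = renderToNodeStream(<App />);
  stream.pipe(res, { end: false });       // keep the response open while React streams
  stream.on('end', () => {
    res.end('</div></body></html>');      // close the document once React is done
  });
});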

node.js- post requests to endpoint begin to get stuck after a while

I've developed a Node.js web app with Express + Mongoose, deployed to an Amazon EC2 instance.
The app receives SNS notifications when a file is uploaded to a specific S3 bucket, stores something in MongoDB and then makes an HTTPS POST to an endpoint outside Amazon. The HTTPS POST is done like this, using the request library:
var options = {
  url: "https://" + config.get('some.endpoint') + "/somepath",
  method: 'POST',
  body: postdata,
  json: true
};

requests.post(options, function(err, response, body) {
  if (!err && response.statusCode === 200) {
    logger.info("notified ok");
  } else {
    logger.error("1 " + err);
    logger.error("2 " + response);
    logger.error("3 " + body);
  }
});
This was done using a simple callback model (i.e. I didn't use the async library).
Files are uploaded continuously, so SNS hits my app at the same rate (~5-10 requests per second). For the first ten minutes of the app being up, I can see (by checking the logs) that the HTTP POSTs are being delivered at nearly the same speed as the incoming requests arrive.
But at some point the requests.post callback starts falling behind until it stops showing up in the log file, even though requests keep coming in. I can tell, by checking the other endpoint (the one specified in config.get('some.endpoint')), that the POSTs effectively aren't being delivered. In occasional bursts and with great delays (5 minutes or more), some new messages appear in the log, as if the app were trying to catch up, but in the long run they stop showing up at all.
I've realized that if I do some manual flow control by stopping/restarting the incoming requests, I can make it work OK.
Am I doing something wrong? Are requests getting stacked up somewhere for some reason? How can I check this? Should I use some library to ensure execution?
Could it be that Node.js prefers to process new incoming requests over the callbacks of old requests, and somehow these callbacks are never executed?
Any help or suggestion on how I can debug this issue is welcomed.
Thanks in advance!

Is there any kind of limit with node for I/O?

I am writing code that downloads a file from somewhere and streams it to the client in real time. The file is never fully on my server, only chunks. Here is the code:
downloader.getLink(link, cookies[acc], function(err, location) {
  if (!err) {
    downloader.downloadLink(location, cookies[acc], function(err, response) {
      if (!err) {
        res.writeHead(200, response.headers);
        response.pipe(res);
      } else {
        res.end(JSON.stringify(err));
      }
    });
  } else {
    res.end(JSON.stringify(err));
  }
});
As far as I can see there is nothing blocking in this code, since the response comes from a plain http response...
The problem is that this way I can only stream 6 files at the same time, yet the server is not using all its resources (CPU 10%, memory 10%) and it is a single core. After roughly 5 files I only get the loading page and the stream doesn't start; it only starts after some of the others have completed.
This is not a limitation on the first server, the one I am downloading the files from, because using my browser, for example, I can download as many as I want. Am I doing something wrong, or is this some limitation in Node that I can change? Thanks
If your code is using the node.js core http module's http.Agent, it has an initial limit of 5 simultaneous outgoing connections to the same remote server. Try reading substack's rant in the hyperquest README for the details. But in short, try using a different module for your connections (I recommend superagent or hyperquest), or adjust the http Agent's maxSockets setting for the node core http module.
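For the last option, a minimal sketch of raising that limit on the shared agents (the number is only an example; newer Node versions already default to a much higher, effectively unlimited, value):

var http = require('http');
var https = require('https');

// Raise the per-host connection cap on the shared global agents.
http.globalAgent.maxSockets = 50;
https.globalAgent.maxSockets = 50;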

Fetching external resources in parallel in node - good practice?

I have a setup where a node server acts as a proxy server to serve images.
For example, for an image "test1.jpg", the exact same image can be fetched from 3 external sources, let's say:
a. www.abc.com/test1.jpg
b. www.def.com/test1.jpg
c. www.ghi.com/test1.jpg
When the nodejs server gets a request for "test1.jpg" it first gets a list of external URLs from a DB. Now amongst these external resources, at least one is always behind a CDN and is "expected" to respond faster and hence is a preferred source for the image.
My question is: which of the two methods below is the correct way to achieve this (or is there another method)?
Fire HTTP requests (using mikeal's request client module) for all the URLs at the same time, get their promise objects, and whichever source responds first, send that image back to the user (it can be any of the three sources, not necessarily the preferred source behind the CDN, but that doesn't matter since the image is exactly the same). The disadvantage I see is that for every image we hit 3 sources. Also, the promises for the HTTP requests can still get fulfilled after the response from the first successful source has been sent out.
Fire HTTP requests one at a time, starting with the most preferred image, wait for it to fail (i.e. a 404 on the image) and then proceed to the next preferred image. We make fewer HTTP requests but the user waits longer.
Some pseudo code
Method 1
while (imagePreferences.length > 0) {
  var url = imagePreferences.splice(0, 1)[0];
  getImage(url).then(function() {
    sendImage();
  }, function(err) {
    console.log(err);
  });
}
Method 2
if (imageUrls.length > 0) {
  var url = imageUrls.splice(0, 1)[0];
  getImage(url).then(function(imageResp) {
    sendImageResp();
  }, function(err) {
    getNextImage(); // recurse over this
  });
}
This is just pseudo code. I am new to nodejs. Any help/views would be appreciated.
I prefer the 1st option; CDNs are designed to handle massive numbers of requests. Your code is perfectly fine for sending HTTP requests to multiple sources in parallel.
In case you want to stop the other requests after successfully receiving the first image, you can use async.detect: https://github.com/caolan/async#detect
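As an alternative sketch of the same first-successful-response idea using plain promises instead of async (getImage and sendImageResp are the question's own hypothetical helpers):

// race the sources and use whichever image resolves first; later results are ignored
function getFirstImage(urls) {
  return new Promise(function(resolve, reject) {
    var failures = 0;
    urls.forEach(function(url) {
      getImage(url).then(resolve, function(err) {
        failures += 1;
        if (failures === urls.length) reject(new Error('all sources failed'));
      });
    });
  });
}

getFirstImage(imageUrls).then(function(imageResp) {
  sendImageResp(imageResp);
}, function(err) {
  console.log(err);
});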

“Proxying” a lot of HTTP requests with Node.js + Express 2

I'm writing a proxy in Node.js + Express 2. The proxy should:
decrypt POST payload and issue HTTP request to server based on result;
encrypt reply from server and send it back to client.
Encryption-related part works fine. The problem I'm facing is timeouts. Proxy should process requests in less than 15 secs. And most of them are under 500ms, actually.
The problem appears when I increase the number of parallel requests. Most requests complete OK, but some fail after 15 secs plus a couple of millis. ab -n5000 -c300 works fine, but with a concurrency of 500 it fails for some requests with a timeout.
I can only speculate, but it seems the problem is the order of callback execution. Is it possible that the requests that come in first are hanging until ETIMEDOUT because Node focuses on the latest ones, which are still processed in time, under 500ms?
P.S.: There is no problem with remote server. I'm using request for interactions with it.
Update:
This is the way things work, with some code:
function queryRemote(req, res) {
  var options = {}; // built based on req object (URI, body, authorization, etc.)
  request(options, function(err, httpResponse, body) {
    return err ? send500(req, res)
               : res.end(encrypt(body));
  });
}

app.use(myBodyParser); // reads hex string in payload
                       // and calls next() on 'end' event

app.post('/', [checkHeaders,   // check Content-Type and Authorization headers
               authUser,       // query DB and call next()
               parseRequest],  // decrypt payload, parse JSON, call next()
  function(req, res) {
    req.socket.setTimeout(TIMEOUT);
    queryRemote(req, res);
  });
My problem is the following: when ab issues, let's say, 20 POSTs to /, the Express route handler gets called something like thousands of times. That doesn't always happen; sometimes 20 and only 20 requests are processed in a timely fashion.
Of course, ab is not the problem. I'm 100% sure that only 20 requests were sent by ab. But the route handler gets called multiple times.
I can't find a reason for this behaviour, any advice?
Timeouts were caused by using http.globalAgent, which by default can process up to 5 concurrent requests to one host:port (which isn't enough in my case).
Thousands of requests (instead of tens) were sent by ab (a fact confirmed with Wireshark under OS X; I cannot reproduce this under Ubuntu inside Parallels).
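A minimal sketch of lifting that limit for the outgoing calls made with the request module (the URL and the pool size here are only placeholders):

// give outgoing requests a larger connection pool per host:port
var options = {
  url: "https://remote.example/somepath",   // hypothetical target
  method: 'POST',
  json: true,
  pool: { maxSockets: 100 }                 // the default agent allows only 5 in older Node
};

request(options, function(err, httpResponse, body) {
  // handle the response as before
});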
You can have a look at the node-http-proxy module and how it handles connections. Make sure you don't buffer any data and that everything works by streaming. You should also try to see where the time is spent for those long requests: try instrumenting parts of your code with console.time and console.timeEnd and see what is taking the most time. If the time is mostly spent in JavaScript, you should try to profile it. Basically you can use the V8 profiler by adding the --prof option to your node command, which produces a v8.log that can be processed with a V8 tool found in node-source-dir/deps/v8/tools. This only works if you have installed the d8 shell via scons (scons d8). You can have a look at this article to help you get it working.
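For example, a quick sketch of that kind of instrumentation around the suspect steps (the labels and the decrypt step are placeholders based on the question's description):

console.time('decrypt');
var payload = decrypt(req.body);        // whatever step you want to measure
console.timeEnd('decrypt');             // prints something like "decrypt: 12.3ms"

console.time('remote-request');
request(options, function(err, httpResponse, body) {
  console.timeEnd('remote-request');    // time spent waiting on the remote server
  // ...
});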
You can also use node-webkit-agent which uses webkit developer tools to show the profiler result. You can also have a look at my fork with a bit of sugar.
If that doesn't work, you can try profiling with dtrace (this only works on illumos-based systems like SmartOS).
