Streaming large files causing server to hang - node.js

I have a feature in my web app that allows users to upload and download files. I serve up the app with Express, but the files are stored in a different server, so I proxy the requests to that server. Here's the proxy code using the request library:
module.exports = function(req, res) {
req.headers['x-private-id'] = getId();
var url = rewriteUrl(req.url);
var newRequest = request(url, function(error) {
if (error) console.log(error);
});
req.pipe(newRequest).on('response', function(res) {
delete res.headers['x-private-id'];
}).pipe(res);
};
This works fine for all of my requests, including downloading the file. However, I run into issues when 'streaming' the file. And by streaming, I mean I use fancybox to display the video using a video tag. The video displays fine the first few times.
But if I close fancybox and then reopen it enough times (5 specifically), it quits working after that; the video no longer shows up. The entire Express server seems to hang, unable to process any more requests. If I restart the server, everything is OK. To me it seems like the sockets from the proxy requests aren't being closed properly, but I can't figure out why. Is there something wrong with my proxy code?

You need to either increase the pool.maxSockets value passed in the request() config since it defaults to node's HTTP Agent's maxSockets which is 5, or opt out of connection pooling altogether with pool: false in the request() config.

Related

Expressjs: SocketIO connection closed when using res.download

I used expressjs to handle download requests of files in a chat application. Also I used socketio to handle chat messages. The problem is that when a download request come to server the socket connection disconnect until reconnection period.
The socket io connection config:
const io = new Server(server,{
cors: {
origin: "*",
methods: ["GET", "POST"]
},
transport : ['websocket']
});
and also the reequest handler of downloads
const file = './public/uploads/' + req.params.filename;
res.type("application/octet-stream");
res.attachment(file);
res.download(file, originalName, function (err) {
if (err) {
res.sendStatus(400);
}
});
I ran into something like this recently an was hoping to see some responses to your question.
I figured I would respond with what I have found and a solution that has worked for me.
In my situation there are download links which call a express app get handler directly that responds with the file (CSV in my case). Since this is a direct handler the path is not setup as a "static file" path. In my case I believe the main issue is that this path not set as a static directory.
Im not sure of how your application works and how the file download is initiated from your client.
For me it was as simple as putting target="_blank" in the file download anchor link. This allowed the file download to be completed in a new window without making the socket think I was redirecting.
Chrome handles it pretty well, new tab pops, file downloads and tab closes.

React app with Server-side rendering crashes with load

I'm using react-boilerplate (with react-router, sagas, express.js) for my React app and on top of it I've added SSR logic so that once it receives an HTTP request it renders react components to string based on URL and sends HTML string back to the client.
While react rendering is happening on the server side, it also makes fetch request through sagas to some APIs (up to 5 endpoints based on the URL) to get data for components before it actually renders the component to string.
Everything is working great if I make only several request to the Node server at the same time, but once I simulate load of 100+ concurrent requests and it starts processing it then at some point it crashes with no indication of any exception.
What I've noticed while I was trying to debug the app is that once 100+ incoming requests begin to be processed by the Node server it sends requests to APIs at the same time but receives no actual response until it stops stacking those requests.
The code that's used for rendering on the server side:
async function renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames }) {
// 1st render phase - triggers the sagas
renderAppToString(store, renderProps);
// send signal to sagas that we're done
store.dispatch(END);
// wait for all tasks to finish
await sagasDone();
// capture the state after the first render
const state = store.getState().toJS();
// prepare style sheet to collect generated css
const styleSheet = new ServerStyleSheet();
// 2nd render phase - the sagas triggered in the first phase are resolved by now
const appMarkup = renderAppToString(store, renderProps, styleSheet);
// capture the generated css
const css = styleSheet.getStyleElement();
const doc = renderToStaticMarkup(
<HtmlDocument
appMarkup={appMarkup}
lang={state.language.locale}
state={state}
head={Helmet.rewind()}
assets={assets}
css={css}
webpackDllNames={webpackDllNames}
/>
);
return `<!DOCTYPE html>\n${doc}`;
}
// The code that's executed by express.js for each request
function renderAppToStringAtLocation(url, { webpackDllNames = [], assets, lang }, callback) {
const memHistory = createMemoryHistory(url);
const store = createStore({}, memHistory);
syncHistoryWithStore(memHistory, store);
const routes = createRoutes(store);
const sagasDone = monitorSagas(store);
store.dispatch(changeLocale(lang));
match({ routes, location: url }, (error, redirectLocation, renderProps) => {
if (error) {
callback({ error });
} else if (renderProps) {
renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames })
.then((html) => {
callback({ html });
})
.catch((e) => callback({ error: e }));
} else {
callback({ error: new Error('Unknown error') });
}
});
}
So my assumption is that something is going wrong once it receives too many HTTP requests which in turn generates even more requests to API endpoints to render react components.
I've noticed that it blocks event loop for 300ms after renderAppToString() for every client request, so once there are 100 concurrent requests it blocks it for about 10 seconds. I'm not sure if that's a normal or bad thing though.
Is it worth trying to limit simultaneous requests to Node server?
I couldn't find much information on the topic of SSR + Node crashes. So I'd appreciate any suggestions as to where to look at to identify the problem or for possible solutions if anyone has experienced similar issue in the past.
In the above image, I am doing ReactDOM.hydrate(...) I can also load my initial and required state and send it down in hydrate.
I have written the middleware file and I am using this file to decide based on what URL i should send which file in response.
Above is my middleware file, I have created the HTML string of the whichever file was requested based on URL. Then I add this HTML string and return it using res.render of express.
Above image is where I compare the requested URL path with the dictionary of path-file associations. Once it is found (i.e. URL matches) I use ReactDOMserver render to string to convert it into HTML. This html can be used to send with handle bar file using res.render as discussed above.
This way I have managed to do SSR on my most web apps built using MERN.io stack.
Hope my answer helped you and Please write comment for discussions
1. Run express in a cluster
A single instance of Node.js runs in a single thread. To take
advantage of multi-core systems, the user will sometimes want to
launch a cluster of Node.js processes to handle the load.
As Node is single threaded the problem may also be in a file lower down the stack were you are initialising express.
There are a number of best practices when running a node app that are not generally mentioned in react threads.
A simple solution to improve performance on a server running multiple cores is to use the built in node cluster module
https://nodejs.org/api/cluster.html
This will start multiple instance of your app on each core of your server giving you a significant performance improvement (if you have a multicore server) for concurrent requests
See for more information on express performance
https://expressjs.com/en/advanced/best-practice-performance.html
You may also want to throttle you incoming connections as when the thread starts context switching response times drop rapidly this can be done by adding something like NGINX / HA Proxy in front of your application
2. Wait for the store to be hydrated before calling render to string
You don't want to have to render you layout until your store has finished updating as other comments note this is a blocks the thread while rendering.
Below is the example taken from the saga repo which shows how to run the sagas with out the need to render the template until they have all resolved
store.runSaga(rootSaga).done.then(() => {
console.log('sagas complete')
res.status(200).send(
layout(
renderToString(rootComp),
JSON.stringify(store.getState())
)
)
}).catch((e) => {
console.log(e.message)
res.status(500).send(e.message)
})
https://github.com/redux-saga/redux-saga/blob/master/examples/real-world/server.js
3. Make sure node environment is set correctly
Also ensure you are correctly using NODE_ENV=production when bundling / running your code as both express and react optimise for this
The calls to renderToString() are synchronous, so they are blocking the thread while they are running. So its no surprise that when you have 100+ concurrent requests that you have an extremely blocked up queue hanging for ~10 seconds.
Edit: It was pointed out that React v16 natively supports streaming, but you need to use the renderToNodeStream() method for streaming the HTML to the client. It should return the exact same string as renderToString() but streams it instead, so you don't have to wait for the full HTML to be rendered before you start sending data to the client.

respond to each request without having to wait until current stream is finished

I'm testing streaming by creating a basic node.js app code that basically streams a file to the response. Using code from here and here.
But If I make a request from http://127.0.0.1:8000/, then open another browser and request another file, the second file will not start to download until the first one is finished. In my example I created a 1GB file. dd if=/dev/zero of=file.dat bs=1G count=1
But if I request three more files while the first one is downloading, the three files will start downloading simultaneously once the first file has finished.
How can I change the code so that it will respond to each request as it's made and not have to wait for the current download to finish?
var http = require('http');
var fs = require('fs');
var i = 1;
http.createServer(function(req, res) {
console.log('starting #' + i++);
// This line opens the file as a readable stream
var readStream = fs.createReadStream('file.dat', { bufferSize: 64 * 1024 });
// This will wait until we know the readable stream is actually valid before piping
readStream.on('open', function () {
console.log('open');
// This just pipes the read stream to the response object (which goes to the client)
readStream.pipe(res);
});
// This catches any errors that happen while creating the readable stream (usually invalid names)
readStream.on('error', function(err) {
res.end(err);
});
}).listen(8000);
console.log('Server running at http://127.0.0.1:8000/');
Your code seems fine the way it is.
I checked it with node v0.10.3 by making a few requests in multiple term sessions:
$ wget http://127.0.0.1:8000
Two requests ran concurrently.
I get the same result when using two different browsers (i.e. Chrome & Safari).
Further, I can get concurrent downloads in Chrome by just changing the request url slightly, as in:
http://localhost:8000/foo
and
http://localhost:8000/bar
The behavior you describe seems to manifest when making multiple requests from the same browser for the same url.
This may be a browser limitation - it looks like the second request isn't even made until the first is completed or cancelled.
To answer your question, if you need multiple client downloads in a browser
Ensure that your server code is implemented such that file-to-url mapping is 1-to-many (i.e. using a wildcard).
Ensure your client code (i.e. javascript in the browser), uses a different url for each request.

How to inform a NodeJS server of something using PHP?

I'd like to add a live functionality to a PHP based forum - new posts would be automatically shown to users as soon as they are created.
What I find a bit confusing is the interaction between the PHP code and NodeJS+socket.io.
How would I go about informing the NodeJS server about new posts and have the server inform the clients that are watching the thread in which the post was posted?
Edit
Tried the following code, and it seems to work, my only question is whether this is considered a good solution, as it looks kind of messy to me.
I use socket.io to listen on port 81 to clients, and the server running om port 82 is only intended to be used by the forum - when a new post is created, a PHP script sends a POST request to localhost on port 82, along with the data.
Is this ok?
var io = require('socket.io').listen(81);
io.sockets.on('connection', function(socket) {
socket.on('init', function(threadid) {
socket.join(threadid);
});
});
var forumserver = require('http').createServer(function(req, res) {
if (res.socket.remoteAddress == '127.0.0.1' && req.method == 'POST') {
req.on('data', function(chunk) {
data = JSON.parse(chunk.toString());
io.sockets.in(data.threadid).emit('new-post', data.content);
});
}
res.end();
}).listen(82);
Your solution of a HTTP server running on a special port is exactly the solution I ended up with when faced with a similar problem. The PHP app simply uses curl to POST to the Node server, which then pushes a message out to socket.io.
However, your HTTP server implementation is broken. The data event is a Stream event; Streams do not emit messages, they emit chunks of data. In other words, the request entity data may be split up and emitted in two chunks.
If the data event emitted a partial chunk of data, JSON.parse would almost assuredly throw an exception, and your Node server would crash.
You either need to manually buffer data, or (my recommendation) use a more robust framework for your HTTP server like Express:
var express = require('express'), forumserver = express();
forumserver.use(express.bodyParser()); // handles buffering and parsing of the
// request entity for you
forumserver.post('/post/:threadid', function(req, res) {
io.sockets.in(req.params.threadid).emit('new-post', req.body.content);
res.send(204); // HTTP 204 No Content (empty response)
});
forumserver.listen(82);
PHP simply needs to post to http​://localhost:82/post/1234 with an entity body containing content. (JSON, URL-encoded, or multipart-encoded entities are acceptable.) Make sure your firewall blocks port 82 on your public interface.
Regarding the PHP code / forum's interaction with Node.JS, you probably need to create an API endpoint of sorts that can listen for changes made to the forum. Depending on your forum software, you would want to hook into the process of creating a new post and perform the API callback to Node.js at this time.
Socket.io out of the box is geared towards visitors of the site being connected on the frontend via Javascript. Upon the Node server receiving notification of a new post update, it would then notify connected clients of this new post and its details, at which point it would probably add new HTML to the DOM of the page the visitor is viewing.
You may want to arrange the Socket.io part of things so that users only subscribe to specific events being emitted by them being in a specific room such as "subforum123" so that they only receive notifications of applicable posts.

Sending an http response outside of the route function?

So, I have a route function like the following:
var http = require('http').createServer(start);
function start(req, res){
//routing stuff
}
and below that,I have a socket.io event listener:
io.sockets.on('connection', function(socket){
socket.on('event', function(data){
//perform an http response
}
}
When the socket event 'event' is called, I would like to perform an http response like the following:
res.writeHead(200, {'Content-disposition': 'attachment; filename=file.zip'})
res.writeHead(200, { 'Content-Type': 'application/zip' });
var filestream = fs.createReadStream('file.zip');
filestream.on('data', function(chunk) {
res.write(chunk);
});
filestream.on('end', function() {
res.end();
});
This last part, when performed within the routing function works just fine, but unfortunately when it is called from the socket event, it of course does not work, because it has no reference to the 'req' or 'res' objects. How would I go about doing this? Thanks.
Hmmm... interesting problem:
It's not impossible to do something like what you're trying to do, the flow would be something like this:
Receive http request, don't respond, keep res object saved somewhere.
Receive websocket request, do your auth/"link" it to the res object saved earlier.
Respond with file via res.
BUT it's not very pretty for a few reasons:
You need to keep res objects saved, if your server restarts a whole bunch of response objects are lost.
You need to figure out how to link websocket clients to http request clients. You could do something with cookies/localstorage to do this, I think.
Scaling to another server will become a lot harder / will you proxy clients to always be served by the same server somehow? Otherwise the linking will get harder.
I would propose a different solution for you: You want to do some client/server steps using websockets before you let someone download a file?
This question has a solution to do downloads via websocket: receive file via websocket and initiate download dialog
Sounds like it won't work on older browsers / IE, but a nice option.
Also mentions downloading via hidden iframe
Check here whether this solution is cross-browser enough for you: http://caniuse.com/#feat=datauri
Another option would be to generate a unique URL for the download, and only append it to the browser's window (either as a hidden iframe download, or as a simple download button) once you've done your logic via websocket. This option would be more available cross-browser and easier to code.

Resources