Node.js express correct use of bodyParser middleware - node.js

I am new to node.js and express and have been experimenting with them for a while. Now I am confused with the design of the express framework related to parsing the request body.
From the official guide of express:
app.use(express.bodyParser());
app.use(express.methodOverride());
app.use(app.router);
app.use(logErrors);
app.use(clientErrorHandler);
app.use(errorHandler);
After setting up all the middleware, then we add the route that we want to handle:
app.post('/test', function(req, res){
//do something with req.body
});
The problem with this approach is that all request body will be parsed first before the route validity is checked. It seems very inefficient to parse the body of invalid requests. And even more, if we enable the upload processing:
app.use(express.bodyParser({uploadDir: '/temp_dir'}));
Any client can bombard the server by uploading any files (by sending request to ANY route/path!!), all which will be processed and kept in the '/temp_dir'. I can't believe that this default method is being widely promoted!
We can of course use the bodyParser function when defining the route:
app.post('/test1', bodyParser, routeHandler1);
app.post('/test2', bodyParser, routeHandler2);
or even perhaps parse the body in each function that handle the route. However, this is tedious to do.
Is there any better way to use express.bodyParser for all valid (defined) routes only, and to use the file upload handling capability only on selected routes, without having a lot of code repetitions?

Your second method is fine. Remember you can also pass arrays of middleware functions to app.post, app.get and friends. So you can define an array called uploadMiddleware with your things that handle POST bodies, uploads, etc, and use that.
app.post('/test1', uploadMiddleware, routeHandler1);
The examples are for beginners. Beginner code to help you get the damn thing working on day 1 and production code that is efficient and secure are often very different. You make a certainly valid point about not accepting uploads to arbitrary paths. As to parsing all request bodies being 'very inefficient', that depends on the ratio of invalid/attack POST requests to legitimate requests that are sent to your application. The average background radiation of attack probe requests is probably not enough to worry about until your site starts to get popular.
Also here's a blog post with further details of the security considerations of bodyParser.

Related

ExpressJS Middleware bodyParser has very poor performance

Few days ago we added NewRelic, APM to our Rest API, which is written in NodeJS and uses EXPRESS JS as a development framework.
Now we see a lot of users experience poor response times, because of JSON parser middleware.
Here is one of those requests reported in NewRelic:
Drilled-down report:
As you can see the most of the time is consumed by middleware JSON Parser.
We were thinking maybe issue comes from big JSON payloads, which is sometimes sent out from API. But for this response, illustrated above, API returned data with contentLength=598, which shouldn't be too huge JSON.
We also use compression middleware as visible on drilled down request execution screenshot. Which should be reducing size of IO sent back and forth to clients.
At this moment we have a doubt for a parameter limit which is passed to middleware when initialized. {limit:50kb} But when testing locally it doesn't make any difference.
We were thinking to switch with protobuf, also think about the way to parse JSON payloads asynchronously. Because JSON.parse which is used by middleware is synchronous process and stops non-blocking IO.
But before staring those changes and experiments, please if anyone had same kind of problem to suggest any possible solution.
Benchmarking:
For benchmarking on local/stage environments we use JMeter and generate loads to check when such timeouts may happen, but we are not able to catch this when testing with JMeter.
Thank You.
Express comes with an embedded bodyparser, and you can try it if you want. It should perform better since it's integrated.
const express = require('express');
const app = express();
app.use(express.urlencoded()); // support for GET
// No limited usage,
app.use(express.json()); // support for POST
// If you want add request limit,
app.use(express.json({ limit: "1mb" }));

Using res.locals in node.js model file

I am overall clueless about how and why you set up a node.js app, and how any of the app.use functions work - the tutorials on it don't explain the why of anything.
Anyway, I have socket.io, res.locals and index.js set up like so in the app.js root file.
const sockets = require('./models/socket')(io)
app.use(function (req, res, next) {
res.locals.user_id = req.session.user_id;
next();
});
const routes = require('./routes/index');
app.use('/', routes);
I'd like to be able to access res.locals in the socket.js model, like I can in index.js found in the routes folder.
I can't guess how to go about doing this. If anybody is able to explain how and why I can or can't that would be a bonus. Thanks!
Welcome to Expressjs, there are a few fundamentals you should probably research before going any further, they'll help solve some of your confusion. I'll give a brief explanation of them but I suggest you do further research. I'll then answer your actual question at the end.
Middleware and app.use
Expressjs is built upon an idea that everything is just "middleware". Middleware is a function which runs as part of a request chain. A request chain is essentially a single client request, which then goes through a chain of a number of middleware functions until it either reaches the end of the chain, exits early by returning a response to the client, or errors.
Express middleware is a function which takes the following three arguments.
req (request) - Representing the request made by a client to your
server.
res (response) - Representing the response you will return to
the client.
next - A way of telling express that your current
middleware function is done, and it should now call the next piece of
middleware. This can either be called "empty" as next(); or with an
error next(new Error());. If it is called empty, it will trigger
the next piece of middleware, if it is called with an error then it
will call the first piece of error middleware. If next is not called at the
end of a piece of middleware, then the request is deemed finished and the
response object is sent to the user.
app.use is a way of setting middleware, this means it will run for every request (unless next() is either not called by the previous piece of middleware for some reason, or it's called with an error). This middleware will run for any HTTP request type (GET, POST, PUT, DELETE, etc).
app.use can take multiple arguments, the important ones for beginners to learn are: app.use(func) and app.use(path, func). The former sets "global" middleware which runs no matter what endpoint (url path) the client requests, the latter (with a specific path) is run only if that specific path is hit. I.e. app.use('/hello', (req, res, next) => { res.send('world'); }); will return "world" when the endpoint "/hello" is hit, but not if the client requests "/hi". Where as app.use((req, res, next) => { res.send('world'); }); would return "world" when you hit any endpoint.
There are more complex things you can do with this, but that's the basics of attaching middleware to your application. The order they are attached to the application, is the order in which they will run.
One more thing, this will blow your mind, an express application made with the standard const app = express() can also be used as middleware. This means you can create several express applications, and then mount them using app.use to a single express application. This is pretty advanced, but does allow you to do some really great things with Express.
Why can you not access res.locals in socket.io? (The real question)
Within your middleware handler, you are setting up a res.locals.use_id property. This only lives with that individual request, you can pass it around as long as the request is alive by passing it into other functions, but outside of that request it doesn't exist. res is literally the response object that tells Express how to respond to the clients request, you can set properties of it during the request but once that HTTP request has ended it's gone.
Socket.io is a way of handling web socket requests, not standard HTTP requests. Thus, in a standard express HTTP request you will not be able to hand off the connection to anything with socket.io, because the connection is a single short lived HTTP request. Likewise, you won't be able to do the same the other way.
If you wish to find the users id in a socket.io request, you'll have to do this within the socket.io request itself.
Right now, you're entering a piece of middleware for an Express.js request, you are then calling next() which runs the next piece of express middleware, at no point does it cross over into Socket.io realms. This is often confused by tutorials because Socket.io can handle requests across the same port as Express is listening on, but the two are not crossed over. So you will need to write separate middleware for both Express.js requests chains, and socket.io request chains. There are ways of writing this code once and then writing an adapter to use it across both platforms, but that's not what you've tried to do here.
I would suggest you look at doing just nodejs and express for a time before taking on socket.io as well, otherwise you're trying to learn a whole heap of technologies all at once is quite a lot to try and take on board all at once.

How can I inspect the set of Express.js middleware that is being used?

Backstory: I'm trying to debug an issue in one piece of middleware that I think is coming from other piece. But, I'm not sure. So anyway, I would like to be able to check what middleware is actually being called, because I'm not sure of the ordering.
Is it possible to inspect the set of middleware that is currently being used?
I have tried to find any piece of internal state where Express might be storing the middleware, but I was not able to.
You can't see what is specifically "middleware", but you can inspect all registered handlers.
In express 4:
// Routes registered on app object
app._router.stack
// Routes registered on a router object
router.stack
Fine for inspection/debugging, probably not a great idea to program against any variable prefaced with an underscore.
How middleware works:
The middlewares are generally used to transform a request or response object, before it reaches to other middlewares.
If you are concerned with the order in which the middlewares are called, express calls the middleware, in the order in which they are defined.
Example,
app.use(compression());
app.use(passport.initialize());
app.use(bodyParser());
app.use(cookieParser());
the order is
compression,
passport,
bodyParser,
cookieParser
(plus I think your bodyParser and cookieParser middlewares should be before the other middlewares like passport).
That is the reason why the error handling middlewares are kept at last, so that if it reaches them, they give an error response.
So basically, request drips down the middlewares until one of them says that it does not want it to go any further(get, post methods are such middlewares, that stop the request).
Now, the debugging part:
You may not be able to inspect the middleware properly internally, but you can check whether middleware has worked properly by inserting your own custom middleware in between and then put a breakpoint on it.
Let's say you want is to debug what happened to your request after the bodyParser middlewares does it tasks, you can do is put your custom middleware in between and check the request and response whether they are modified properly or not.
how you do this is by following example.
Example,
app.use(bodyParser());
//custom middleware to inspect which middleware is not working properly
app.use(function(req,res,next){
console.log("check something!"); //do something here/put a breakpoint
next();
/*this function is the third parameter
and need to be called by the middleware
to proceed the request to the next middleware,
if your don't write this line, your reqest will just stop here.
app.get/app.post like methods does not call next*/
});
app.use(cookieParser());
This is one way in which you move this custom debugger in between the middlewares, until you figure out which one is giving faulty outputs.
Also, if you want to check the middlewares functionality, you can look at the documentation of those middlewares. They are quite good.
Check by playing with your custom middleware.

node.js body on http request object vs body on express request object

I'm trying to build an http module that suppose to work with an express server.
while reading the http module api, I see that it doesn't save the body inside the request object.
So my questions are:
If I want to build an express server which works with the official http module, how should I get the body?
I consider to implement the http module in the following way: listening to the socket, and if I get content-length header, listetning to the rest of the socket stream till I get all the body, save it as a memeber of the http request, and only then send the request object to the express server handler.
What are the pros and cons of my suggestion above vs letting the express server to "listen" to the body of the request via request.on('data',callback(data))
I mean , why shouldn't I keep the body inside the 'request' object the same way I keep the headers?
It's hard to answer your question without knowing exactly what you want to do. But I can give you some detail about how the request body is handled by Node/Express, and hopefully you can take things from there.
When handling a request (either directly via Node's request handler, or through Express's request handlers), the body of the request won't automatically be received: you have to open an HTTP stream to receive it.
The type of the body content should be determined by the Content-Type request header. The two most common body types are application/x-www-form-urlencoded and multipart/form-data. It's possible, however, to use any content type you want, which is usually more common for APIs (for example, using application/json is becoming more common for REST APIs).
application/x-www-form-urlencoded is pretty straightforward; name=value pairs are URL encoded (using JavaScript's built-in encodeURIComponent, for example), then combined with an ampersand (&). They're usually UTF-8 encoded, but that can also be specified in Content-Type.
multipart/form-data is more complicated, and can also typically be quite large, as vkurchatkin's answer points out (meaning you may not want to bring it into memory).
Express makes available some middleware to automatically handle the various types of body parsing. Usually, people simply use bodyParser, though you have to be careful with that middleware. It's really just a convenience middleware that combines json, urlencoded, and multipart. However, multipart has been deprecated. Express is still bundling Connect 2.12, which still includes multipart. When Express updates its dependency, though, the situation will change.
As I write this, bodyParser, json, urlencoded, and multipart have all been removed from Connect. Everything but multipart has been moved into the module body-parser (https://github.com/expressjs/body-parser). If you need multipart support, I recommend Busboy (https://npmjs.org/package/busboy), which is very robust. At some point, Express will update it's dependency on Connect, and will most likely add a dependency to body-parser since it has been removed from Connect.
So, since bodyParser bundles deprecated code (multipart), I recommend explicitly linking in only json and urlencoded (and you could even omit json if you're not accepting any JSON-encoded bodies):
app.use(express.json());
app.use(express.urlencoded());
If you're writing middleware, you probably don't want to automatically link in json and urlencoded (much less Busboy); that would break the modular nature of Express. However, you should specify in your documentation that your middleware requires the req.body object to be available (and fail gracefully if it isn't): you can go on to say that json, urlencoded, and Busboy all provide the req.body object, depending on what kind of content types you need to accept.
If you dig into the source code for urlencoded or json, you will find that they rely on another Node module, raw-body, which simply opens the request stream and retrieves the body content. If you really need to know the details of retrieving the body from a request, you will find everything you need in the source code for that module (https://github.com/stream-utils/raw-body/blob/master/index.js).
I know that's a lot of detail, but they're important details!
You can do that, it fairly simple. bodyParser middleware does that, for example (https://github.com/expressjs/body-parser/blob/master/index.js#L27). The thing is, request body can be really large (file upload, for example), so you generally don't want to put that in memory. Rather you can stream it to disk or s3 or whatnot.

Express app.use

I have been reading documents/urls and really not understand about app.use and its usage.
I understand that it is part of connect but I am really not getting that.
Example:
// ignore GET /favicon.ico
app.use(express.favicon());
// add req.session cookie support
app.use(express.cookieSession());
// do something with the session
app.use(count);
can you please explain me all these 3 . what they mean?
does this mean based on (1) that
app.use is noting but => app.get?
app.use(count) what and when is this count be executed (or) called/
Looks Basic but did not get the answers
// ignore GET /favicon.ico
app.use(express.favicon());
// pass a secret to cookieParser() for signed cookies
app.use(express.cookieParser('manny is cool'));
// add req.session cookie support
app.use(express.cookieSession());
// do something with the session
app.use(count);
// custom middleware
function count(req, res) {
When you call app.use(), you pass in a function to handle requests. As requests come in, Express goes through all of the functions in order until the request is handled.
express.favicon is a simple function that returns favicon.ico when it is requested. It's actually a great example for how to get started with this pattern. You can view the source code by looking at its source: node_modules/express/node_modules/connect/lib/middleware/favicon.js
express.cookieSession is some more middleware for supporting session data, keyed from the client by a cookie.
I don't know what count does... is that your own code? In any case, let me know if this is not clear.

Resources