I have a basic misunderstanding regarding asynchronous calls using Express and middleware, and I would really appreciate some help understanding it.
Suppose we have this code:
var express = require('express')
var cookieParser = require('cookie-parser')
var app = express()
var router = express.Router()
app.use(function timeLog (req, res, next) {
req.requestTime = Date.now()
next()
})
app.use(express.json());
app.use(express.urlencoded());
app.use(cookieParser())
router.post('/hello', function (req, res) {
//write async to file
res.send('bye')
})
Now, when the client calls this endpoint "hello":
Are the middlewares defined at the app level called asynchronously? I understood that they are (because they are called "callbacks" in the documentation), so basically, before reaching the router, parsing cookies, parsing JSON into req.body and adding req.requestTime will all run asynchronously, and then the request will be routed to the '/hello' endpoint.
After routing, will the callback run asynchronously? If yes, then how is the request not left hanging in this case? I see that the response is being terminated inside the body of a callback... how does this make any sense? :(
Would somebody please explain this flow to me?
I will try to explain how I understood "async calls" using the code above. Let's say a lot of users are trying to reach this endpoint. All these calls are added to the call stack; then, because these callbacks are async, they are moved to the event queue/table and will be handled after the call stack is "empty". If this is the case, how will the first user ever get a response? Adding requestTime is done async, parsing the JSON is done async, and when reaching the router, the callback is done async... so when will the first user ever get a response if all these async calls sit in the event queue/table and are handled only after the call stack is empty? What am I missing here?
Thanks.
The middleware doesn't appear to be asynchronous on its own. In other words, as you said in your comment on another answer, it is not forcing each layer in the expressjs "stack" of middleware/handlers into a separate frame in the JavaScript event queue.
If you trace the next() function in a .use(), there are a couple of setImmediates fairly early on to handle "exit router" or "no more layers," but then you get into a while loop on a stack of handlers. This is happening at around this point in the code.
So if all your middleware was similar to this section, all middleware etc would happen in the same frame within the event queue:
app.use(function(req, res, next){
console.log('synchronous layer');
next();
});
Whereas a step like this next one would put the next into a separate frame in the event queue, and potentially allow the process to handle other frames that may be queued up.
app.use(function(req, res, next){
setImmediate(()=> {
console.log('setImmediate puts this next() into a separate frame in the event queue');
next();
});
});
I can't imagine this would normally be a problem. Most things that would happen in middleware that might take some time (a database call etc) are very likely all going to be happening asynchronously (in a way that puts the next into a new frame in the event queue). But it is something worth considering when you're adding middleware...
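To make the difference between the two kinds of layer concrete, here is a minimal sketch in plain Node.js. The runChain() dispatcher is a hypothetical stand-in written for this example, not Express's actual code; it only mimics the "call the next layer when next() is invoked" behaviour so the ordering can be observed without installing anything.

```javascript
// Hypothetical mini dispatcher for illustration; NOT Express's implementation.
function runChain(layers, req, res, done) {
  let index = 0;
  function next(err) {
    if (err || index >= layers.length) return done(err);
    const layer = layers[index++];
    layer(req, res, next);
  }
  next();
}

const log = [];

// Calls next() synchronously, so the following layer runs in the same frame.
const syncLayer = (req, res, next) => {
  log.push('sync layer');
  next();
};

// Defers next() with setImmediate, so the following layer runs in a later frame.
const deferredLayer = (req, res, next) => {
  setImmediate(() => {
    log.push('deferred layer');
    next();
  });
};

// A chain of purely synchronous layers finishes before runChain() returns.
runChain([syncLayer, syncLayer], {}, {}, () => log.push('sync chain done'));
log.push('after sync dispatch');

// A chain with a deferred layer returns early; the rest runs on a later tick.
const finished = new Promise((resolve) => {
  runChain([syncLayer, deferredLayer], {}, {}, () => {
    log.push('deferred chain done');
    resolve(log);
  });
});
log.push('after deferred dispatch');

finished.then((entries) => console.log(entries.join('\n')));
```

In the first chain, "sync chain done" is logged before "after sync dispatch". In the second, "after deferred dispatch" is logged before "deferred layer", which is the gap in which the process could pick up other queued work.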
All those middlewares use continuation-passing style, so they COULD run asynchronously, but they don't have to. It depends on whether the middleware does some I/O. You could look into the code to check exactly how the functions behave, but in the end this does not matter. Just keep in mind that they COULD run asynchronously.
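As a rough illustration of why a request is not left hanging when the response is sent from inside a callback, here is a sketch of the '/hello' handler from the question run against a fake res object. makeFakeRes() and the temp file name are assumptions made up for this example; the only real API used is Node's fs.writeFile.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const events = [];

// Hypothetical stand-in for Express's res object, just enough for this demo.
function makeFakeRes(onSend) {
  return {
    send(body) {
      events.push('response sent: ' + body);
      onSend(body);
    },
  };
}

// Same shape as the '/hello' handler: start an async file write, reply when it completes.
function helloHandler(req, res) {
  const file = path.join(os.tmpdir(), 'hello-demo-' + process.pid + '.log');
  fs.writeFile(file, 'requested at ' + req.requestTime + '\n', (err) => {
    events.push('file write finished');
    res.send(err ? 'error' : 'bye');
    fs.unlink(file, () => {}); // clean up the temp file, ignoring errors
  });
  // The handler returns immediately; the request stays open until res.send() runs.
  events.push('handler returned');
}

const responded = new Promise((resolve) => {
  helloHandler({ requestTime: Date.now() }, makeFakeRes(resolve));
});

responded.then(() => console.log(events.join('\n')));
```

The handler returns before the file write completes, and the response goes out later from the callback. Between those two moments the call stack is empty, so other requests can be handled.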
Related
I'm reading the GitHub repository https://github.com/goldbergyoni/nodebestpractices and trying to apply the tips to my project. Currently I'm working on the "1.2 Layer your components, keep Express within its boundaries" tip, but I have a question.
I'm using routes/controllers, and using this tip (1.2), a route with multiple middlewares will look like this.
router.post("/do-multiple-stuff",
(req, res, next) => {
stuffController.getStuffDone(req.body.stuff);
next();
},
(req, res, next) => {
stuffController.getOtherStuffDone(req.body.otherStuff);
return res.send("stuff done");
});
Is this correct? Or is there a better way to do this?
Thanks! <3
The point of that 1.2 section is to create your business logic as a separate, testable component that is passed data only, not passed req and res. This allows it to be independently and separately tested without the Express environment around it.
Your calls to:
stuffController.getStuffDone(req.body.stuff);
and
stuffController.getOtherStuffDone(req.body.otherStuff);
Are indeed making that proper separation between the web and the business logic because you aren't passing req or res to your controller. That looks like it meets the point of the 1.2 training step.
The one thing I see missing here is that there isn't any output from either of these function calls. They don't return anything and since you don't pass req or res to them, they can't be modifying the req object (like some middleware does) and can't be sending a response or error by themselves. So, it appears that these need a mechanism for communicating some type of result back, either a direct return value (if the functions are synchronous) or returning a promise (if the functions are asynchronous). Then, the calling code could get their result and do something with that result.
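A minimal sketch of what that could look like, assuming the controller functions are asynchronous and return promises. The bodies of getStuffDone() and getOtherStuffDone() are invented for illustration, and the req/res/next objects at the bottom are fakes so the handler can be exercised without Express.

```javascript
// Business logic: receives plain data, returns a result, knows nothing about req/res.
// The function bodies are hypothetical placeholders.
const stuffController = {
  async getStuffDone(stuff) {
    if (!stuff) throw new Error('stuff is required');
    return { stuff, done: true };
  },
  async getOtherStuffDone(otherStuff) {
    return { otherStuff, done: true };
  },
};

// Web layer: pulls data off req, calls the controller, turns results into a response.
async function doMultipleStuff(req, res, next) {
  try {
    const first = await stuffController.getStuffDone(req.body.stuff);
    const second = await stuffController.getOtherStuffDone(req.body.otherStuff);
    res.send({ first, second });
  } catch (err) {
    next(err); // hand the failure to the error-handling middleware
  }
}

// Exercise the handler with fake req/res/next objects.
const sent = [];
const errors = [];
const fakeRes = { send: (body) => sent.push(body) };
const fakeNext = (err) => errors.push(err.message);

const runs = Promise.all([
  doMultipleStuff({ body: { stuff: 'a', otherStuff: 'b' } }, fakeRes, fakeNext),
  doMultipleStuff({ body: {} }, fakeRes, fakeNext),
]);

runs.then(() => console.log({ sent, errors }));
```

With this shape the controller can be unit tested by calling it with plain values, and the route handler stays a thin adapter that could be registered with router.post("/do-multiple-stuff", doMultipleStuff).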
I am trying to make Node render faster, so I want to run my routes in parallel.
How do I put the routes into a parallel function?
before
var app = express();
var index = require('./routes/index')();
var auth = require('./routes/auth')();
app.use('/',index);
app.use('/auth/',auth);
after ( I am trying this)
var app = express();
var index = require('./routes/index')();
var auth = require('./routes/auth')();
function parallel(middlewares){
return function (req, res, next){
async.each(middlewares,function(mw,cb){
mw(req,res,cb);
},next);
};
};
app.use(parallel([
['/',index],
['/auth/',auth],
// ...other [path, router] pairs here
]));
I found a way to do this. It comes with a few caveats which are mostly due to the fact that Express is designed around sequential middleware, but by following a set of guidelines you can make it work just fine.
The Problem Statement
We want to pass in a group of middleware and have them run in parallel (or as parallel as their async operations will allow). If you have multiple independent async things to do in middleware, this should be able to get to an end result quicker (which is pretty much the whole point of doing this).
We want to be able to pass in typical routing paths (with all wildcards and special characters) as in app.use('/product/:id', fn) and then execute only the routes that match the current request "in parallel" with each other
We want Express itself to do all the route matching so we don't have to reimplement or copy any of that and so that everything Express normally supports for route matching is supported.
We want to support route parameters like req.params, even though those may be different for each middleware (not quite so common to use this in middleware, but still part of the Express design).
The Design Scheme
We create our own Router object. To that router object, we add a "start" marker middleware at the beginning (so we can see when routes are starting on this router), then we add a place holder middleware with the proper path for each of our parallel middleware handlers and then we add another "end" marker middleware at the end (so we can see when routes are done on this router). The "start" and "end" routes match all routes so they are always called. The other routes have the path that was passed in for them so they may or may not get called for any given request depending upon whether they match the current path or not.
This router gets added to the routing stack with app.use(router). In this way, the regular Express engine will do all the routing for this router and decide which routes match the current request path. But, rather than execute the regular middleware functions when it finds a matching route path, it will just execute our placeholder middleware. When it executes the placeholder middleware, we will get to see the "start" middleware, any other middleware that matches the route which we will capture in a list and then the "end" middleware. When we get the "end" middleware, we will have captured the list of middlewares that match the current route and we can then go execute just those actual middlewares in parallel. When all those middlewares are done, we then call next() for our router allowing the rest of routing to continue.
So, in summary, we insert dummy route handlers with the actual route paths and let Express call our dummy route handlers if the path matches as a means of telling us which routes match the current path. Then, we take that list of matching routes and set them up for parallel execution. In this way, Express does all the work of telling us which routes match the current request path.
Implementation
So, to implement this, I define a new app.useParallel() and we add a fourth parameter to the middleware function for the req.params that belongs to that specific middleware route definition.
// pass an array of [path, fn] pairs
// [['/', func1], ['/somePath', func2]]
app.useParallel = function(array) {
// create a router that will be inserted only for route matching
let router = express.Router();
// insert route at beginning to mark the start of routes getting called
router.use(function(req, res, next) {
req.routeList = [];
next();
});
// let the router have dummy route handlers with all the right paths
// so we can use it to see which paths it will match
for (let r of array) {
router.use(r[0], function(req, res, next) {
// for each route that actually gets called (and thus must have matched the path),
// save the corresponding callback function and a copy of the req.params
req.routeList.push({fn: r[1], params: Object.assign({}, req.params)});
next();
});
}
// now insert route at end of router that matches all routes to know when we're done
router.use(function(req, res, next) {
let routeList = req.routeList;
if (routeList && routeList.length) {
// now we are ready here to execute the route handlers in req.routeList in parallel
let len = routeList.length;
let doneCnt = 0;
let nextCalled = false;
for (let middleware of routeList) {
middleware.fn(req, res, function(err) {
++doneCnt;
if (err) {
// make sure we only call next() once
if (!nextCalled) {
nextCalled = true;
next(err);
}
} else {
if (doneCnt === len && !nextCalled) {
nextCalled = true;
next();
}
}
}, middleware.params);
}
} else {
next();
}
});
// insert this router in the chain
app.use(router);
}
And, then this is used like this:
function test1(req, res, next, params) {
// some async operation that calls next() when done
next();
}
// similar definitions for test2(), test3() and test4()
app.useParallel([['/', test1], ['/', test2], ['/:id', test3], ['/test', test4]]);
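The completion-counting part of this can be tried on its own, without Express. The sketch below is a simplified, hypothetical version of that logic (runInParallel() and delayed() are names made up for this example), using timers in place of real async work.

```javascript
// Start every handler right away; call next() exactly once, either after all
// handlers have finished or as soon as the first one reports an error.
function runInParallel(handlers, req, res, next) {
  let remaining = handlers.length;
  let nextCalled = false;
  if (remaining === 0) return next();
  for (const handler of handlers) {
    handler(req, res, (err) => {
      if (nextCalled) return; // later completions and errors are ignored
      remaining -= 1;
      if (err || remaining === 0) {
        nextCalled = true;
        next(err);
      }
    });
  }
}

// Hypothetical middleware factory: finishes after `ms` milliseconds.
const delayed = (name, ms, err) => (req, res, next) => {
  setTimeout(() => {
    req.finished.push(name);
    next(err);
  }, ms);
};

// Case 1: both succeed; next() runs after the slowest one.
const okReq = { finished: [] };
const okRun = new Promise((resolve) => {
  runInParallel([delayed('slow', 30), delayed('fast', 5)], okReq, {}, resolve);
});

// Case 2: both fail; only the first error reaches next().
const errReq = { finished: [] };
let nextCalls = 0;
const errRun = new Promise((resolve) => {
  runInParallel(
    [delayed('bad', 5, new Error('boom')), delayed('late', 20, new Error('ignored'))],
    errReq,
    {},
    (err) => {
      nextCalls += 1;
      setTimeout(() => resolve(err), 40); // wait so the late handler also fires
    }
  );
});

okRun.then(() => console.log('completion order:', okReq.finished));
errRun.then((err) => console.log('error passed on:', err.message, 'next calls:', nextCalls));
```

The handlers are started in array order but complete in timer order, and the second error never reaches next().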
Restrictions
Running multiple middlewares potentially in parallel leads to some restrictions on the middleware - all of which seem somewhat expected if you're setting up for parallel operation. Here are some of the restrictions:
You will get interleaved execution of these handlers if any handler uses asynchronous calls and then completes sometime later. Since node.js is still single threaded, this will NOT do parallel execution of purely synchronous middleware handlers. If they are synchronous, they will still be executed synchronously.
The initial synchronous part of each parallel middleware handler (before it returns while waiting for async responses) is still called in proper sequence.
If any middleware calls next(err), the first one to do it will be the only one that gets processed - others will be ignored.
The req object is shared among all the parallel middleware functions. As such, you have to be aware of any race conditions in using it if you have async operations in your middleware writing to the req object. It can certainly be used as a place to store independent properties (different for each middleware), but two parallel middlewares cannot be expecting sequential access to the same property (one sets it and the other reads what was set) because the execution order is unpredictable. So, you are safest if each parallel middleware only reads standard properties and only writes its own properties.
Because the req object is shared, each middleware can't have its own value for req.params like normal middleware would. As such, do not use req.params at all. Instead, each parallel middleware is passed a fourth argument that is the params object. This allows each parallel middleware to have its own params object.
If any middleware actually sends a response (as opposed to just setting up req variables for later route handlers), then you need to know that it's racy. In general, I would not think you'd use parallel middleware to actually send a response, but I could imagine a few rare cases where you just want the first middleware that finds an answer to send the response. If more than one attempts to send a response, you will get a warning about multiple responses (Express will catch it for you). It is not blocked here.
It should go without saying that any async code in these parallel handlers finishes in an arbitrary order. Do not execute handlers in parallel that require any specific ordering relative to each other.
Use of the req.route property in the parallel middleware is not supported.
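As a sketch of a parallel-safe handler under these restrictions: each one reads its route parameters from the fourth argument and writes only its own property on the shared req. The handler names, the params values and the small driver loop are all hypothetical; in the real setup the router described above would supply them.

```javascript
// Each handler uses its own params object (fourth argument) and writes only
// to a property that no other parallel handler touches.
function loadProduct(req, res, next, params) {
  setTimeout(() => {
    req.product = { id: params.id };
    next();
  }, 15);
}

function loadSection(req, res, next, params) {
  setTimeout(() => {
    req.section = { name: params.section };
    next();
  }, 5);
}

// Stand-in for the list of matched routes, each with its own copy of the params.
const req = {};
const matched = [
  { fn: loadProduct, params: { id: '42' } },
  { fn: loadSection, params: { section: 'test' } },
];

const allDone = new Promise((resolve) => {
  let remaining = matched.length;
  for (const { fn, params } of matched) {
    fn(req, {}, () => {
      remaining -= 1;
      if (remaining === 0) resolve(req);
    }, params);
  }
});

allDone.then((finalReq) => console.log(finalReq));
```

Because neither handler reads what the other writes, the result is the same regardless of which one finishes first.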
Minimal Testing So far
I have not exhaustively tested this, but I do have it running in a sample app that just uses random timers to call next() in each of four parallel middlewares. The route matching works. The params feature works. The middlewares do appear to run in parallel and complete in random order (per their random timers).
All parallel route handlers finish before subsequent routing continues.
The only state used during the parallel processing is stored either on the req object (which should be unique to each request) or in closures so it should be safe from race conditions of multiple parallel requests to the server that are in flight at the same time (though I haven't pounded on a server with lots of parallel requests to confirm that).
I am familiar with Express but new to Restify. Restify's document has many examples calling next() after res.send() as below:
server.get('/echo/:name', function (req, res, next) {
res.send(req.params);
return next();
});
This looks like a recommended pattern by some Restify experts as well:
The consequences of not calling next() in restify
What's the real use case of doing this? After you call res.send(), is there anything a handler next in the chain can do?
After I did some more research, I think I found an answer.
The only practical use case for calling next() after res.send() is when you want to install "post-render" type middleware that catches all transactions after the route handlers finish their job.
In other cases, it simply adds unnecessary overhead to each and every request, because it scans the rest of the routes to find the next match.
Most middlewares are of the "pre-render" type, which don't need a next() call after the response anyway. Even for post-render middleware, depending on voluntary calls to next() is simply too risky. You'd rather use a more error-proof method like https://github.com/jshttp/on-finished instead.
Calling next() after res.send() is a bad practice that causes unnecessary overhead in most cases. It should be used only when it is absolutely needed and you know what you are doing.
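To show the idea without installing anything, the sketch below uses Node's built-in 'finish' event on the response object rather than the on-finished package itself, so treat it as a stand-in for that approach. The logWhenFinished() helper and the '/echo/mark' path are made up for this example, and a plain http server replaces Restify.

```javascript
const http = require('http');

const finishedRequests = [];

// "Post-render" hook registered up front. It fires when the response has been
// handed off, whether or not the handler ever calls next().
function logWhenFinished(req, res) {
  res.on('finish', () => {
    finishedRequests.push(req.method + ' ' + req.url + ' ' + res.statusCode);
  });
}

const server = http.createServer((req, res) => {
  logWhenFinished(req, res);
  res.end('bye'); // the handler just responds; nothing is called afterwards
});

const demo = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    http
      .get({ host: '127.0.0.1', port, path: '/echo/mark', agent: false }, (response) => {
        response.resume(); // drain the body so 'end' fires
        response.on('end', () => server.close(() => resolve(finishedRequests)));
      })
      .on('error', reject);
  });
});

demo.then((entries) => console.log(entries));
```

The logging happens even though the handler does nothing after sending, which is the property that makes this approach less fragile than relying on every handler to call next().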
In express and connect, is it bad to use "next" in middleware if I do not need it? Are there any possible negative outcomes? Assume there is no middleware which will be called after this middleware, and therefore the next will not call anything. I know it is bad for modularity, as if you want to add a callback for another middleware it may be accidentally triggered by the next in this middleware. However, in this case next is bad for modularity anyway, as middleware often interact in unexpected ways.
As an example of an unneeded next, consider the sample MEAN.JS stack, constructed by the guys who originally came up with the stack's name. It seems to have some next callbacks which do not ever get called. Many are in the users controller, including the signin function:
exports.signin = function(req, res, next) {
passport.authenticate('local', function(err, user, info) {
if (err || !user) {
res.status(400).send(info);
} else {
// Remove sensitive data before login
user.password = undefined;
user.salt = undefined;
req.login(user, function(err) {
if (err) {
res.status(400).send(err);
} else {
res.json(user);
}
});
}
})(req, res, next);
};
This function has a next callback defined. This next callback is then used by the passport.authenticate() custom middleware function as a parameter. However, this parameter is never used in the function itself. I have tried taking out the next definition from the function definition, as well as the custom passport middleware, and the route seems to still work. However, perhaps passport uses it in its authenticate() function, and leaving it out did not cause any trouble here but it may cause trouble in some cases.
I was recently looking at passport's tutorials on http://passportjs.org, and I came across a function in the section on custom callbacks on the authenticate page that looks almost exactly like the signin function in MEAN.JS. One difference was that it actually had some next callbacks (for error handling), so the next parameter was actually useful. Is it possible that the MEAN.JS app took a lot of code from passportjs.org's guide and changed it over time, but left in some vestigial remnants that do not do anything but were causing no harm? Or does the next parameter actually do something in passport.authenticate() that is not immediately obvious? Regardless of why this happened, does an extra next parameter in connect middleware cause any bad side effects if it is not used?
When writing middleware, the next parameter is optional. Its purpose is to let the next middleware in the chain be called. If you want the current middleware to be the last one called for a given request, not calling next will accomplish that. This is fine for code that you write for yourself, but it's typically better to always call next in middleware that may be used elsewhere, because you don't know what other middleware its users could be adding.
For example, maybe you wanted to add some kind of logging that happens after a request is completed. If your middleware that runs before the logging middleware doesn't execute next, it won't be logged.
http://expressjs.com/api.html#middleware
Not executing next will simply not start the next middleware. There are no other side effects of not executing it other than those caused by not moving to the next middleware (for example, if the response hasn't ended yet, not calling next will result in a timeout.)
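Here is a small sketch of that behaviour using a hypothetical mini dispatcher (dispatch() is written for this example and is not how Express or Connect is implemented). It shows that a logging layer placed after a handler only runs if the handler calls next().

```javascript
// Hypothetical mini dispatcher for illustration; not Express's implementation.
function dispatch(layers, req, res) {
  let index = 0;
  function next() {
    const layer = layers[index++];
    if (layer) layer(req, res, next);
  }
  next();
}

// Minimal fake response object.
function makeRes() {
  return {
    sent: null,
    send(body) {
      this.sent = body;
    },
  };
}

const logged = [];
const logger = (req, res, next) => {
  logged.push(req.url);
  next();
};

// Sends a response and ends the chain by not calling next().
const respondAndStop = (req, res, next) => {
  res.send('done');
};

// Sends a response and still passes control on.
const respondAndContinue = (req, res, next) => {
  res.send('done');
  next();
};

dispatch([respondAndStop, logger], { url: '/stops' }, makeRes());
dispatch([respondAndContinue, logger], { url: '/continues' }, makeRes());

// Only the request whose handler called next() reached the logger.
console.log(logged);
```

Both requests get a response; the only difference is whether the later layer is reached.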
I am fairly new to the express framework. I couldn't find the documentation for application.post() method in the express API reference. Can someone provide a few examples of all the possible parameters I can put in the function? I've read a couple sites with the following example, what does the first parameter mean?
I know the second parameter is the callback function, but what exactly do we put in the first parameter?
app.post('/', function(req, res){
Also, let's say we want the users to post(send data to our server) ID numbers with a certain format([{id:134123, url:www.qwer.com},{id:131211,url:www.asdf.com}]). We then want to extract the ID's and retrieves the data with those ID's from somewhere in our server. How would we write the app.post method that allows us to manipulate the input of an array of objects, so that we only use those object's ID(key) to retrieve the necessary info regardless of other keys in the objects. Given the description of the task, do we have to use app.get() method? If so, how would we write the app.get() function?
Thanks a lot for your inputs.
1. app.get('/', function(req, res){
This is telling express to listen for requests to / and run the function when it sees one.
The first argument is a pattern to match. It can be a literal URL fragment like '/' or '/privacy'; you can also do substitutions as shown below. You can also match regexes if necessary as described here.
All the internal parts of Express follow the function(req, res, next) pattern. An incoming request starts at the top of the middleware chain (e.g. bodyParser) and gets passed along until something sends a response, or express gets to the end of the chain and 404's.
You usually put your app.router at the bottom of the chain. Once Express gets there it starts matching the request against all the app.get('path'..., app.post('path'... etc, in the order which they were set up.
Variable substitution:
// this would match:
// /questions/18087696/express-framework-app-post-and-app-get
app.get('/questions/:id/:slug', function(req, res, next){
db.fetch(req.params.id, function(err, question){
console.log('Fetched question: ' + req.params.slug);
res.locals.question = question;
res.render('question-view');
});
});
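To illustrate how the ':id' and ':slug' placeholders end up in req.params, here is a deliberately simplified matcher. matchPath() is a hypothetical helper written for this example; it only handles literal segments and ':name' segments, whereas Express's real route matching supports much more.

```javascript
// Simplified, hypothetical matcher: literal segments and ":name" segments only.
function matchPath(pattern, url) {
  const patternParts = pattern.split('/').filter(Boolean);
  const urlParts = url.split('?')[0].split('/').filter(Boolean);
  if (patternParts.length !== urlParts.length) return null;

  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = decodeURIComponent(urlParts[i]);
    if (expected.startsWith(':')) {
      params[expected.slice(1)] = actual; // capture the segment under the placeholder name
    } else if (expected !== actual) {
      return null; // a literal segment did not match
    }
  }
  return params;
}

const params = matchPath(
  '/questions/:id/:slug',
  '/questions/18087696/express-framework-app-post-and-app-get'
);
console.log(params);
console.log(matchPath('/questions/:id/:slug', '/privacy'));
```

The first call yields an object with id and slug properties, which is the role req.params plays in the handler above; the second call returns null because the path does not match.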
next():
If you defined your handling functions as function(req, res, next){} you can call next() to yield, passing the request back into the middleware chain. You might do this for e.g. a catchall route:
app.all('*', function(req, res, next){
if(req.secure !== true) {
res.redirect('https://'+req.host+req.originalUrl);
} else {
next();
};
});
Again, order matters, you'll have to put this above the other routing functions if you want it to run before those.
I haven't POSTed JSON before but @PeterLyon's solution looks fine to me for that.
TJ annoyingly documents this as app.VERB(path, [callback...], callback) in the express docs, so search the express docs for that. I'm not going to copy/paste them here. It's his unfriendly way of saying that app.get, app.post, app.put, etc. all have the same function signature, and there is one of these methods for each supported HTTP method.
To get your posted JSON data, use the bodyParser middleware (in Express 4.16 and later, the built-in express.json() takes the place of express.bodyParser()):
app.post('/yourPath', express.bodyParser(), function (req, res) {
//req.body is your array of objects now:
// [{id:134123, url:'www.qwer.com'},{id:131211,url:'www.asdf.com'}]
});
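For the second part of the question, pulling only the IDs out of the posted array, a sketch could look like the following. The records map, extractIds() and lookupHandler() are hypothetical names for this example, and req.body is assumed to have been parsed into an array already by the body-parsing middleware.

```javascript
// Hypothetical in-memory data standing in for "somewhere in our server".
const records = new Map([
  [134123, { id: 134123, title: 'first record' }],
  [131211, { id: 131211, title: 'second record' }],
]);

// Keep only the id of each posted object, ignoring any other keys.
function extractIds(body) {
  if (!Array.isArray(body)) return [];
  return body
    .filter((item) => item !== null && typeof item === 'object' && Number.isInteger(item.id))
    .map((item) => item.id);
}

// Handler in the usual (req, res) shape; it could be passed to app.post().
function lookupHandler(req, res) {
  const found = extractIds(req.body)
    .filter((id) => records.has(id))
    .map((id) => records.get(id));
  res.json(found);
}

// Exercise the handler with fake req/res objects.
let responseBody;
lookupHandler(
  {
    body: [
      { id: 134123, url: 'www.qwer.com' },
      { id: 131211, url: 'www.asdf.com' },
      { url: 'entry-without-id' },
    ],
  },
  { json: (payload) => { responseBody = payload; } }
);
console.log(responseBody);
```

Because the client is sending data in the request body, app.post() is the natural fit here; entries without a numeric id are simply skipped.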