Node: How to use routes in a parallel function

I am trying to make Node render faster, so I want to run my routes in parallel.
How do I put routes into a parallel function?
before
var app = express();
var index = require('./routes/index')();
var auth = require('./routes/auth')();
app.use('/',index);
app.use('/auth/',auth);
after ( I am trying this)
var app = express();
var index = require('./routes/index')();
var auth = require('./routes/auth')();
function parallel(middlewares){
return function (req, res, next){
async.each(middlewares,function(mw,cb){
mw(req,res,cb);
},next);
};
};
app.use(parallel([
['/',index],
['/auth/',auth],
[others here]
]));

I found a way to do this. It comes with a few caveats which are mostly due to the fact that Express is designed around sequential middleware, but by following a set of guidelines you can make it work just fine.
The Problem Statement
We want to pass in a group of middleware and have them run in parallel (or as parallel as their async operations will allow). If you have multiple independent async things to do in middleware, this should be able to get to an end result quicker (which is pretty much the whole point of doing this).
We want to be able to pass in typical routing paths (with all wildcards and special characters) as in app.use('/product/:id', fn) and then execute only the routes that match the current request "in parallel" with each other.
We want Express itself to do all the route matching so we don't have to reimplement or copy any of that and so that everything Express normally supports for route matching is supported.
We want to support route parameters like req.params, even though those may be different for each middleware (not quite so common to use this in middleware, but still part of the Express design).
The Design Scheme
We create our own Router object. To that router object, we add a "start" marker middleware at the beginning (so we can see when routes are starting on this router), then we add a place holder middleware with the proper path for each of our parallel middleware handlers and then we add another "end" marker middleware at the end (so we can see when routes are done on this router). The "start" and "end" routes match all routes so they are always called. The other routes have the path that was passed in for them so they may or may not get called for any given request depending upon whether they match the current path or not.
This router gets added to the routing stack with app.use(router). In this way, the regular Express engine will do all the routing for this router and decide which routes match the current request path. But, rather than execute the regular middleware functions when it finds a matching route path, it will just execute our placeholder middleware. As Express walks the router, we will see the "start" middleware, then the placeholder for every middleware whose path matches (which we capture in a list), and then the "end" middleware. When we reach the "end" middleware, we will have captured the list of middlewares that match the current route and we can then go execute just those actual middlewares in parallel. When all those middlewares are done, we then call next() for our router, allowing the rest of routing to continue.
So, in summary, we insert dummy route handlers with the actual route paths and let Express call our dummy route handlers if the path matches as a means of telling us which routes match the current path. Then, we take that list of matching routes and set them up for parallel execution. In this way, Express does all the work of telling us which routes match the current request path.
Implementation
So, to implement this, I define a new app.useParallel() and add a fourth parameter to each middleware function that carries the req.params belonging to that specific middleware route definition.
// pass an array of [path, fn] pairs
// [['/', func1], ['/somePath', func2]]
app.useParallel = function(array) {
    // create a router that will be inserted only for route matching
    let router = express.Router();
    // insert route at beginning to mark the start of routes getting called
    router.use(function(req, res, next) {
        req.routeList = [];
        next();
    });
    // let the router have dummy route handlers with all the right paths
    // so we can use it to see which paths it will match
    for (let r of array) {
        router.use(r[0], function(req, res, next) {
            // for each route that actually gets called (and thus must have matched the path),
            // save the corresponding callback function and a copy of the req.params
            req.routeList.push({fn: r[1], params: Object.assign({}, req.params)});
            next();
        });
    }
    // now insert route at end of router that matches all routes to know when we're done
    router.use(function(req, res, next) {
        let routeList = req.routeList;
        if (routeList && routeList.length) {
            // now we are ready here to execute the route handlers in req.routeList in parallel
            let len = routeList.length;
            let doneCnt = 0;
            let nextCalled = false;
            for (let middleware of routeList) {
                middleware.fn(req, res, function(err) {
                    ++doneCnt;
                    if (err) {
                        // make sure we only call next() once
                        if (!nextCalled) {
                            nextCalled = true;
                            next(err);
                        }
                    } else {
                        if (doneCnt === len && !nextCalled) {
                            // all handlers finished without error
                            nextCalled = true;
                            next();
                        }
                    }
                }, middleware.params);
            }
        } else {
            next();
        }
    });
    // insert this router in the chain
    app.use(router);
};
And, then this is used like this:
function test1(req, res, next, params) {
// some async operation that calls next() when done
next();
}
// similar definitions for test2(), test3() and test4()
app.useParallel([['/', test1], ['/', test2], ['/:id', test3], ['/test', test4]]);
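The completion-counting step at the heart of useParallel() can be tried without Express. The following is a minimal sketch, not the code above: runParallel is a hypothetical helper name, and plain objects stand in for the real req and res.

```javascript
// Hypothetical helper: run every middleware against the same req/res and
// call `done` exactly once, either on the first error or after all finish.
function runParallel(middlewares, req, res, done) {
  let remaining = middlewares.length;
  let finished = false;

  if (remaining === 0) {
    return done();
  }

  for (const mw of middlewares) {
    let called = false; // guard against a middleware calling next() twice
    mw(req, res, function next(err) {
      if (called || finished) return;
      called = true;
      remaining -= 1;
      if (err || remaining === 0) {
        finished = true;
        done(err);
      }
    });
  }
}

// Usage with stub objects standing in for Express's req and res.
const req = {};
const calls = [];
runParallel(
  [
    (req, res, next) => { req.a = 1; next(); },
    (req, res, next) => { req.b = 2; next(); },
  ],
  req,
  {},
  (err) => calls.push(err || 'ok')
);
console.log(req, calls);
```

The per-middleware `called` flag is an extra guard that the implementation above does not have; it keeps a misbehaving handler that calls next() twice from being counted twice.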
Restrictions
Running multiple middlewares potentially in parallel leads to some restrictions on the middleware - all of which seem somewhat expected if you're setting up for parallel operation. Here are some of the restrictions:
You will get interleaved execution of these handlers if any handler uses asynchronous calls and then completes sometime later. Since node.js is still single threaded, this will NOT do parallel execution of purely synchronous middleware handlers. If they are synchronous, they will still be executed synchronously.
The initial synchronous part of each parallel middleware handler (before it returns while waiting for async responses) is still called in proper sequence.
If any middleware calls next(err), the first one to do it will be the only one that gets processed - others will be ignored.
The req object is shared among all the parallel middleware functions. As such, you have to be aware of any race conditions in using it if you have async operations in your middleware writing to the req object. It can certainly be used as a place to store independent properties (different for each middleware), but two parallel middlewares cannot be expecting sequential access to the same property (one sets it and the other reads what was set) because the execution order is unpredictable. So, you are safest if each parallel middleware only reads standard properties and only writes its own properties.
Because the req object is shared, each middleware can't have its own value for req.params like normal middleware would. As such, do not use req.params at all. Instead, each parallel middleware is passed a fourth argument that is the params object. This allows each parallel middleware to have its own params object.
If any middleware actually sends a response (as opposed to just setting up req variables for later route handlers), then you need to know that it's racy. In general, I would not think you'd use parallel middleware to actually send a response, but I could imagine a few rare cases where you just want the first middleware that finds an answer to send the response. If more than one attempts to send a response, you will get a warning about multiple responses (Express will catch it for you). It is not blocked here.
It should go without saying that any async code in these parallel handlers finishes in an arbitrary order. Do not execute handlers in parallel that require any specific ordering relative to each other.
Use of the req.route property in the parallel middleware is not supported.
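The ordering restrictions above can be seen in a small, dependency-free sketch. Here makeMiddleware is a hypothetical factory, and the "async work" is completed by hand in reverse order as a stand-in for timers or I/O finishing at unpredictable times.

```javascript
// Records the order in which things happen.
const log = [];
// Holds each middleware's completion step until we decide to run it.
const pending = [];

// Hypothetical factory: builds a middleware whose synchronous part logs a
// start event and whose async completion is triggered manually.
function makeMiddleware(name) {
  return function (req, res, next) {
    log.push('start ' + name);          // synchronous part, runs in call order
    pending.push(function complete() {  // stands in for a timer or I/O callback
      log.push('done ' + name);
      next();
    });
  };
}

const middlewares = [makeMiddleware('A'), makeMiddleware('B'), makeMiddleware('C')];
let finished = 0;
for (const mw of middlewares) {
  mw({}, {}, () => { finished += 1; });
}

// Complete the async work in reverse order to mimic unpredictable timing.
pending.reverse().forEach((complete) => complete());

// The start events are in definition order; the done events are not.
console.log(log, finished);
```

The synchronous parts run as A, B, C, while the completions arrive as C, B, A, which is why no handler should depend on another handler's result.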
Minimal Testing So far
I have not exhaustively tested this, but I do have it running in a sample app that just uses random timers to call next() in each of four parallel middlewares. The route matching works. The params feature works. The middlewares do appear to run in parallel and complete in random order (per their random timers).
All parallel route handlers finish before subsequent routing continues.
The only state used during the parallel processing is stored either on the req object (which should be unique to each request) or in closures so it should be safe from race conditions of multiple parallel requests to the server that are in flight at the same time (though I haven't pounded on a server with lots of parallel requests to confirm that).

Related

Express router order of request executions: `/state/:params` vs `/state/absolute-path`

If I have two REST endpoints:
app.get('/something/:id', ...handlers);
app.get('/something/else', ...handlers);
And I send a request to http://host:port/something/else
Is there a way to make Express router execute the endpoint with absolute path first (/something/else) before executing the one that matches the query params (/something/:id)?
I understand that I can reverse the order of invocation and specify the endpoint with query params last. But logically speaking, absolute path should take priority over query params and I believe that's the default behaviour for Koa.js
Just put the more specific route first and the wildcard route second. Routes are matched in order, and the first one that matches handles the request; the others are not processed unless that handler calls next(). So, put the more specific route for /something/else before the /something/:id and you will see the /something/else route work properly when that's the URL.
// put wildcard route last and more specific route definitions first
// routes are matched in the order they are defined
app.get('/something/else', ...handlers);
app.get('/something/:id', ...handlers);
This does raise the question why you have designed this potential conflict into your URL scheme in the first place. You've essentially overloaded the id namespace and have reserved at least one id value for your own use. This can be managed by careful ordering of the route definitions, but it would generally be better if you didn't have this conflict in your URL design in the first place.
"Is there a way to make Express router execute the endpoint with absolute path first ('/something/else') before executing the one that matches the query params ('/something/:id')?"
Yes, define the more specific route first.
"I understand that I can reverse the order of invocation and specify the endpoint with query params last. But logically speaking, absolute path should take priority over query params and I believe that's the default behaviour for Koa.js"
You asked about Express. It matches routes in the order you've defined them. It doesn't try to guess which route it "thinks" you want to match first. It lets you define that exactly via the order of your route definitions.
I don't know Koa.js well, but there is this in the doc for Koa2: Middleware is now always run in the order declared by .use() (or .get(), etc.), which matches Express 4 API.
Express.js has no special rules for ranking routes. It tries each registered route against the incoming request path in order and calls the route handlers for all matched paths, as long as each handler calls next(). Thus the following code will work.
app.get('/something/:id', (req, res, next) => {
console.log(`Calling with param ${req.params.id}`);
next(); // if you remove next from here it will not call the rest of the handlers
});
app.get('/something/else', (req, res, next) => {
console.log(`Calling with else`);
next();
});
Output for a request to /something/else:
Calling with param else
Calling with else
Thus, the only way to make sure the specific route is matched first is to define the routes in that order.
app.get('/something/else', ...handlers);
app.get('/something/:id', ...handlers);
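To see why definition order decides the outcome, here is a minimal first-match sketch. It is a simplified model with hypothetical helper names (matchPath, firstMatch) that handles only literal segments and ':name' parameters; it is not Express's actual matcher.

```javascript
// Simplified stand-in for a route matcher: supports literal segments and
// ':name' parameters only. This is not Express's real implementation.
function matchPath(pattern, path) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = path.split('/').filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    if (patternParts[i].startsWith(':')) {
      params[patternParts[i].slice(1)] = pathParts[i];
    } else if (patternParts[i] !== pathParts[i]) {
      return null;
    }
  }
  return params;
}

// Return the first route, in definition order, whose pattern matches.
function firstMatch(routes, path) {
  for (const pattern of routes) {
    const params = matchPath(pattern, path);
    if (params) return { pattern, params };
  }
  return null;
}

// Parameter route defined first: it captures "else" as the id.
console.log(firstMatch(['/something/:id', '/something/else'], '/something/else'));
// Specific route defined first: it wins for /something/else.
console.log(firstMatch(['/something/else', '/something/:id'], '/something/else'));
// Other ids still reach the parameter route.
console.log(firstMatch(['/something/else', '/something/:id'], '/something/42'));
```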

Asynchronous calls in node.js

I have a basic misunderstanding regarding asynchronous calls using express and middlewares, and I would really appreciate some help understanding it.
Suppose we have this code:
var express = require('express')
var cookieParser = require('cookie-parser')
var app = express()
var router = express.Router()
app.use(function timeLog (req, res, next) {
  req.requestTime = Date.now()
  next()
})
app.use(express.json())
app.use(express.urlencoded({ extended: false }))
app.use(cookieParser())
router.post('/hello', function (req, res) {
  // write async to file
  res.send('bye')
})
// mount the router so that /hello is reachable
app.use(router)
Now, when the client calls this endpoint "hello":
Are the middlewares defined at the app level called asynchronously? I understood that they are (because they are called "callbacks" in the documentation), so basically, before reaching the router, parsing cookies, parsing JSON into req.body and adding req.requestTime will run asynchronously, and then the request will be routed to the '/hello' endpoint.
After routing, will the callback run asynchronously? If yes, then how is the request not left hanging in this case? I see that the response is terminated inside the body of a callback. How does this make any sense? :(
Would somebody please explain this flow to me?
I will try to explain how I understood "async calls" through the code above. Let's say a lot of users are trying to reach this endpoint. All these calls are added to the call stack; then, because these callbacks are async, they are moved to the event queue/table and will be handled after the call stack is "empty". If this is the case, how will the first user ever get a response? The requestTime is set async, the JSON parsing is done async, and when reaching the router, the callback is done async. So when will the first user ever get a response if all these async calls sit in the event queue/table and are handled only after the call stack is empty? What am I missing here?
Thanks.
The middleware doesn't appear to be asynchronous on its own. In other words, as you said in your comment on another answer, it is not forcing each layer in the expressjs "stack" of middleware/handlers into a separate frame in the JavaScript event queue.
If you trace the next() function in a .use(), there are a couple of setImmediates fairly early on to handle "exit router" or "no more layers," but then you get into a while loop on a stack of handlers. This is happening at around this point in the code.
So if all your middleware was similar to this section, all middleware etc would happen in the same frame within the event queue:
app.use(function(req, res, next){
console.log('synchronous layer');
next();
});
Whereas a step like this next one would put the next into a separate frame in the event queue, and potentially allow the process to handle other frames that may be queued up.
app.use(function(req, res, next){
setImmediate(()=> {
console.log('setImmediate puts this next() into a separate frame in the event queue');
next();
});
});
I can't imagine this would normally be a problem. Most things that would happen in middleware that might take some time (a database call etc) are very likely all going to be happening asynchronously (in a way that puts the next into a new frame in the event queue). But it is something worth considering when you're adding middleware...
All those middleware use continuation-passing style, so they COULD run asynchronously, but they don't have to. It depends on whether a given middleware does some I/O. You could look into the code to check exactly how each function behaves, but in the end it does not matter. Just keep in mind that they COULD run asynchronously.
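The difference between a layer that calls next() directly and one that defers it can be shown with a stripped-down dispatcher. This is only a sketch: dispatch is a hypothetical function that models the "loop over a stack of handlers" idea, not Express's router code.

```javascript
// Hypothetical, stripped-down dispatcher: calls each layer in turn and hands
// it a next() that invokes the following layer directly (no deferral).
function dispatch(layers, req, res) {
  let index = 0;
  function next() {
    const layer = layers[index++];
    if (layer) layer(req, res, next);
  }
  next();
}

const log = [];

// Every layer calls next() synchronously, so the whole chain finishes
// before dispatch() returns.
dispatch([
  (req, res, next) => { log.push('layer 1'); next(); },
  (req, res, next) => { log.push('layer 2'); next(); },
  (req, res) => { log.push('handler'); },
], {}, {});
log.push('dispatch returned');

// A layer that defers next() yields to the event loop; the rest of the chain
// runs in a later turn, after the code that follows dispatch().
const deferredLog = [];
dispatch([
  (req, res, next) => { deferredLog.push('layer 1'); setImmediate(next); },
  (req, res) => { deferredLog.push('handler'); },
], {}, {});
deferredLog.push('dispatch returned');

// Print both logs once the deferred layer has had a chance to run.
setImmediate(() => console.log(log, deferredLog));
```

In the first chain 'handler' is logged before 'dispatch returned'; in the second it is logged after, because setImmediate pushed the rest of the chain into a later turn of the event loop.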

Koa-router getting parsed params before hitting route

I'm using koa2 and koa-router together with sequelize on top. I want to be able to control user access based on their roles in the database, and it's been working somewhat so far. I made my own RBAC implementation, but I'm having some trouble.
I need to quit execution BEFORE any endpoint is hit if the user doesn't have access, considering endpoints can do any action (like inserting a new item etc.). This makes perfect sense, I realize I could potentially use transactions with Sequelize, but I find that would add more overhead and deadline is closing in.
My implementation so far looks somewhat like the following:
// initialize.js
initializeRoutes()
initializeServerMiddleware()
Server middleware is registered after routes.
// function initializeRoutes
app.router = require('koa-router')
app.router.use('*', access_control(app))
require('./routes_init')
routes_init just runs a function which recursively parses a folder and imports all middleware definitions.
// function initializeServerMiddleware
// blah blah bunch of middleware
app.server.use(app.router.routes()).use(app.router.allowedMethods())
This is just regular koa-router.
However, the issue arises in access_control.
I have one file (access_control_definitions.js) where I specify named routes, their respective sequelize model name, and what rules exists for the route. (e.g. what role, if the owner is able to access their own resource...) I calculate whether the requester owns a resource by a route param (e.g. resource ID is ctx.params.id). However, in this implementation, params don't seem to be parsed. I don't think it's right that I have to manually parse the params before koa-router does it. Is anyone able to identify a better way based on this that would solve ctx.params not being filled with the actual named parameter?
edit: I also created a GitHub issue for this, considering it seems to me like there's some funny business going on.
So if you look at router.js
layerChain = matchedLayers.reduce(function(memo, layer) {
memo.push(function(ctx, next) {
ctx.captures = layer.captures(path, ctx.captures);
ctx.params = layer.params(path, ctx.captures, ctx.params);
ctx.routerName = layer.name;
return next();
});
return memo.concat(layer.stack);
}, []);
return compose(layerChain)(ctx, next);
What it does is that, for every route function you have, it adds its own capturing layer to generate the params.
Now this actually does make sense, because you can have two middleware for the same URL with different parameters:
router.use('/abc/:did', (ctx, next) => {
// ctx.router available
console.log('my request came here too', ctx.params.did)
if (next)
next();
});
router.get('/abc/:id', (ctx, next) => {
console.log('my request came here', ctx.params.id)
});
Now for the first handler a parameter id makes no sense, and for the second one a parameter did makes no sense. This means the parameters are specific to a handler and only make sense inside that handler. That is why it makes sense not to have the params that you expect to be there. I don't think it is a bug.
And since you already found the workaround
const fromRouteId = pathToRegexp(ctx._matchedRoute).exec(ctx.captures[0])
You should use the same. Or a better one might be
var lastMatch = ctx.matched[ctx.matched.length-1];
params = lastMatch.params(ctx.originalUrl, lastMatch.captures(ctx.originalUrl), {})
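The idea that params belong to a layer rather than to the request can be sketched without Koa. The layerParams function below is a hypothetical, simplified stand-in for a layer's params step (literal segments and ':name' parameters only); the real library relies on path-to-regexp and supports far more.

```javascript
// Simplified stand-in for a router layer's params step: supports literal
// segments and ':name' parameters only.
function layerParams(pattern, path) {
  const names = [];
  const source = pattern
    .split('/')
    .map((segment) => {
      if (segment.startsWith(':')) {
        names.push(segment.slice(1));
        return '([^/]+)';
      }
      // Escape regex metacharacters in literal segments.
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('/');
  const match = new RegExp('^' + source + '$').exec(path);
  if (!match) return null;

  const params = {};
  names.forEach((name, i) => { params[name] = match[i + 1]; });
  return params;
}

// The same request path yields different params depending on which
// layer's pattern is used to read it.
const requestPath = '/abc/7';
console.log(layerParams('/abc/:did', requestPath));
console.log(layerParams('/abc/:id', requestPath));
```

Under this model, a middleware only sees the names declared in its own pattern, which is consistent with the explanation above of why the access-control middleware does not see the endpoint's named parameters.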

difference between app.all('*') VS app.use(function)?

app.all('*', function(req, res, next) {
vs
app.use(function (req, res, next) {
What's the difference? Don't both take in each request to the server?
For the wildcard * path, there's really not much of a meaningful difference at all. It appears to me like the internal implementation may be slightly more efficient for app.use(fn) than for app.all('*', fn). And, if you intend for it to run on all routes, then app.use() makes more logical sense to me since what you're really doing is middleware and that's what app.use() is specially designed for.
Summary for app.all('*', fn) vs. app.use(fn):
No difference in order of execution.
app.use() fires regardless of methods, app.all() only fires for parser supported methods (probably not relevant since the node.js http parser supports all expected methods).
Summary for app.all('/test', fn) vs. app.use('/test', fn):
No difference in order of execution
app.use() fires regardless of methods, app.all() only fires for parser supported methods (probably not relevant since the node.js http parser supports all expected methods).
app.use() fires for all paths that start with /test, including /test/1/ or /test/otherpath/more/1. app.all() only fires if it's an exact match to the requested URL.
Details
All route handlers or middleware that match a given route are executed in the order they were defined so app.all('*', fn) and app.use(fn) do not have any different ordering when placed in identical places in the code.
In looking at the Express code for app.all() it appears that the way it works is that it just goes through all the HTTP methods that the locally installed HTTP parser supports and registers a handler for them. So, for example, if you did:
app.all('*', fn);
The Express code would run these:
app.get('*', fn);
app.put('*', fn);
app.post('*', fn);
app.delete('*', fn);
// ...etc...
Whereas app.use() is method independent. There would be only one handler in the app router's stack that is called no matter what the method is. So, even if an unsupported http verb was issued and the parser let the request get this far, the app.use() handler would still apply whereas the app.all() handler would not.
If you use a path with both app.all() and app.use() that is not just a simple wildcard like '*', then there is a meaningful difference between the two.
app.all(path, fn) only triggers when the requested path matches the path here in its entirety.
app.use(path, fn) triggers when the start of the requested path matches the path here.
So, if you have:
app.all('/test', fn1); // route 1
app.use('/test', fn2); // route 2
And, you issue a request to:
http://yourhost.com/test // both route1 and route2 will match
http://yourhost.com/test/1 // only route2 will match
This is because only middleware added with app.use() fires for partial matches, where the requested URL contains more path segments beyond what is specified here.
So, if you intend to insert some middleware that runs for all routes or runs for all routes that are descended from some path, then use app.use(). Personally, I would only use app.all(path, fn) if I wanted a handler to be run only for a specific path no matter what the method was and I did not want it to also run for paths that contain this path at the start. I see no practical reason to ever use app.all('*', fn) over app.use(fn).
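The two matching rules for a plain path can be written as a pair of predicates. This is a simplified model with hypothetical function names (matchesAll, matchesUse); real Express also handles patterns, parameters, case sensitivity and other routing options.

```javascript
// Treat '/test/' and '/test' as the same path (non-strict routing).
function stripTrailingSlash(p) {
  return p.length > 1 && p.endsWith('/') ? p.slice(0, -1) : p;
}

// Model of app.all(path, fn): the whole request path must match.
function matchesAll(routePath, requestPath) {
  return stripTrailingSlash(requestPath) === stripTrailingSlash(routePath);
}

// Model of app.use(path, fn): the request path must start with the mount
// path, and the match must end on a path-segment boundary.
function matchesUse(mountPath, requestPath) {
  const mount = stripTrailingSlash(mountPath);
  const request = stripTrailingSlash(requestPath);
  if (mount === '/') return true;
  return request === mount || request.startsWith(mount + '/');
}

// Print, for each request path, whether the all-style and use-style rules match.
for (const requestPath of ['/test', '/test/1', '/testing']) {
  console.log(requestPath, matchesAll('/test', requestPath), matchesUse('/test', requestPath));
}
```

Note that '/testing' matches neither rule: the prefix match used by app.use() works on whole path segments, not on raw string prefixes.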

Express Framework app.post and app.get

I am fairly new to the express framework. I couldn't find the documentation for application.post() method in the express API reference. Can someone provide a few examples of all the possible parameters I can put in the function? I've read a couple sites with the following example, what does the first parameter mean?
I know the second parameter is the callback function, but what exactly do we put in the first parameter?
app.post('/', function(req, res){
Also, let's say we want the users to post(send data to our server) ID numbers with a certain format([{id:134123, url:www.qwer.com},{id:131211,url:www.asdf.com}]). We then want to extract the ID's and retrieves the data with those ID's from somewhere in our server. How would we write the app.post method that allows us to manipulate the input of an array of objects, so that we only use those object's ID(key) to retrieve the necessary info regardless of other keys in the objects. Given the description of the task, do we have to use app.get() method? If so, how would we write the app.get() function?
Thanks a lot for your inputs.
1. app.get('/', function(req, res){
This is telling express to listen for requests to / and run the function when it sees one.
The first argument is a pattern to match. Sometimes it is a literal URL fragment like '/' or '/privacy'; you can also do substitutions as shown below. You can also match regexes if necessary as described here.
All the internal parts of Express follow the function(req, res, next) pattern. An incoming request starts at the top of the middleware chain (e.g. bodyParser) and gets passed along until something sends a response, or express gets to the end of the chain and 404's.
You usually put your app.router at the bottom of the chain. Once Express gets there it starts matching the request against all the app.get('path'..., app.post('path'... etc, in the order which they were set up.
Variable substitution:
// this would match:
// /questions/18087696/express-framework-app-post-and-app-get
app.get('/questions/:id/:slug', function(req, res, next){
db.fetch(req.params.id, function(err, question){
console.log('Fetched question: ' + req.params.slug);
res.locals.question = question;
res.render('question-view');
});
});
next():
If you defined your handling functions as function(req, res, next){} you can call next() to yield, passing the request back into the middleware chain. You might do this for e.g. a catchall route:
app.all('*', function(req, res, next){
if(req.secure !== true) {
res.redirect('https://'+req.host+req.originalUrl);
} else {
next();
};
});
Again, order matters, you'll have to put this above the other routing functions if you want it to run before those.
I haven't POSTed JSON before, but @PeterLyon's solution looks fine to me for that.
TJ annoyingly documents this as app.VERB(path, [callback...], callback) in the express docs, so search the express docs for that. I'm not going to copy/paste them here. It's his unfriendly way of saying that app.get, app.post, app.put, etc. all have the same function signature, and there is one of these methods for each supported HTTP method.
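To get your posted JSON data, use body-parsing middleware. In Express 3 that was express.bodyParser(); it was removed in Express 4, where express.json() (built in since 4.16) does the same job for JSON bodies:
app.post('/yourPath', express.json(), function (req, res) {
    // req.body is your array of objects now:
    // [{id:134123, url:'www.qwer.com'},{id:131211,url:'www.asdf.com'}]
});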
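Once the body is parsed, using only the IDs is ordinary array handling. The sketch below assumes the posted shape from the question; extractIds and the in-memory store are hypothetical names standing in for whatever lookup the server really does.

```javascript
// Hypothetical helper: pull the id out of each posted object, ignoring any
// other keys, and reject bodies that are not in the expected shape.
function extractIds(body) {
  if (!Array.isArray(body)) {
    throw new TypeError('expected an array of objects');
  }
  return body.map((item, index) => {
    if (item === null || typeof item !== 'object' || !('id' in item)) {
      throw new TypeError('item ' + index + ' has no id');
    }
    return item.id;
  });
}

// Stand-in for whatever storage the server uses (an assumption for this demo).
const store = new Map([
  [134123, { title: 'first' }],
  [131211, { title: 'second' }],
]);

// Same shape as the parsed request body in the question.
const posted = [
  { id: 134123, url: 'www.qwer.com' },
  { id: 131211, url: 'www.asdf.com' },
];
const ids = extractIds(posted);
const records = ids.map((id) => store.get(id));
console.log(ids, records);
```

Inside the app.post handler this would be called as extractIds(req.body), with a 400 response if it throws. Because the IDs travel in the request body, app.post is the natural fit; app.get would only be needed if the IDs were passed in the URL instead.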
