node express routing decision based on user-agent - node.js

I am trying to figure out a way of supplying better data (Open Graph data) to social media. Basically, when Facebook, Twitter or Pinterest asks for information about a link on my page, I want to provide OG information that depends on the link instead of sending the empty page (OK, it does send JavaScript, but they don't run it).
I tried using prerender and similar, but can't get that to run properly. I also realised that I would rather have the Express router identify the scraper and serve a static page based on the request.
As a first step, I need to get the user agent information.
So I thought I would add express-useragent. That seems to work on my test site, but Facebook's scraper never seems to get past it. I can see it try to fetch a picture, but it never updates the OG data or the index. (The code below should work as an example.)
var express = require('express');
var router = express.Router();
var useragent = require('express-useragent');
//Set up log
var cfgBunyan = require('../config/bunyan')
var log = cfgBunyan.dbLogger('ROUTE')
router.use(useragent.express());
/* GET home page. */
router.get('/', function(req, res, next) {
console.log(req.useragent);
res.render('index');
});
router.get('/share/:service', function(req, res, next) {
res.render('index');
});
router.get('/pages/:name', function (req,res, next){
log.info('/pages/'+req.params.name)
res.render('pages/'+req.params.name);
});
router.get('/modals/:name', function (req,res, next){
res.render('modals/'+req.params.name);
});
router.get('/page/:name', function (req,res, next){
res.render('index');
});
module.exports = router;
I can also run the Google test scraper, which gives me the following source:
source: 'Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)' }
So has anyone figured out an easy way to direct Facebook and Twitter to another route? Or is sitting and checking the different sources the right way?

OK, so I managed to figure out a potential solution.
Basically, I created a function called isBot, which I call in the same way authentication middleware is usually wired in: the request is sent to isBot, which checks whether
1. ?_escaped_fragment_= is present in the URL (Google and some others use that)
2. the user agent is a known bot (thanks prerender.io, I borrowed the list from the .htaccess for your service)
The setup is simple enough.
Originally I added express-useragent to the router to read the header, but you don't have to (Rob was right); req.get('User-Agent') is enough.
//var useragent = require('express-useragent'); //Not needed nor used
//router.use(useragent.express()); // Thought this was required, it is not
Then in any route you want to check for bots add isBot:
router.get('/', isBot ,function(req, res, next) {
Then add the function below. It does a lot of logging using bunyan, as I want to have statistics. You can remove any line that starts with log.info and it should still work, or add bunyan, or just change those lines to console.log. It's just output.
If the code decides the request isn't from a bot, it just calls next() and the route renders as normal.
function isBot (req, res, next){
var isBotTest = false;
var botReq = "";
var botID= ""; //Just so we know why we think it is a bot
var knownBots = ["baiduspider", "facebookexternalhit", "twitterbot", "rogerbot", "linkedinbot","embedly|quora\ link\ preview","howyoubot","outbrain","pinterest","slackbot","vkShare","W3C_Validator"];
log.info({http_user_agent: req.get('User-Agent')});
//log.info({user_source: req.useragent.source}); //For debug, whats the HTTP_USER_AGENT, think this is the same
log.info({request_url: req.url}); //For debug, we want to know if there are any options
/* Let's start with ?_escaped_fragment_=, which seems to be a standard. If this is part of the request,
it should be either a search engine or a social media site asking for open graph rich sharing info
*/
var urlRequest=req.url
var pos= urlRequest.search("\\?_escaped_fragment_=")
if (pos != -1) {
botID="ESCAPED_FRAGMENT_REQ";
isBotTest = true; //It says it's a bot, so we believe it, let's figure out if it has a request before or after
var reqBits = urlRequest.split("?_escaped_fragment_=")
console.log(reqBits[1].length)
if(reqBits[1].length == 0){ //If 0 length, any request is infront
botReq = reqBits[0];
} else {
botReq = reqBits[1];
}
} else { //OK, so it did not tell us it was a bot request, but maybe it is anyway
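var userAgent = req.get('User-Agent') || ''; //The header can be missing, so fall back to an empty string
for (var i = 0; i < knownBots.length; i++){
//Match case-insensitively: real agents are capitalised, e.g. "Twitterbot/1.0"
if (new RegExp(knownBots[i], 'i').test(userAgent)){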
isBotTest = true;
botReq=urlRequest;
botID=knownBots[i];
}
}
}
if (isBotTest == true) {
log.info({botID: botID, botReq: botReq});
//send something to bots
} else {
log.info("We don't think this is one of those bots any more")
return next();
}
}
Oh, and currently it does not respond to the bot requests, so a request identified as a bot is left hanging until it times out. If you want to respond, just add a res.render or res.send at the line that says //send something to bots
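One way to fill in the "//send something to bots" line is to send a small static page that carries only the Open Graph tags. The following is a minimal sketch, not part of the original answer: escapeHtml and renderOgPage are hypothetical helper names, and the tag values are placeholders.

```javascript
// Hypothetical helper: escape text before placing it inside an HTML attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build a minimal page with one og:* meta tag per key.
function renderOgPage(meta) {
  var tags = Object.keys(meta).map(function (key) {
    return '<meta property="og:' + escapeHtml(key) +
      '" content="' + escapeHtml(meta[key]) + '">';
  });
  return '<!DOCTYPE html><html><head>' + tags.join('') +
    '</head><body></body></html>';
}

// Inside isBot, at the "//send something to bots" line, one could then write
// (the title and domain here are placeholders):
// res.send(renderOgPage({ title: 'My page', url: 'https://example.com' + botReq }));
```

Keeping the page builder separate from the middleware means the per-link lookup (title, image, description) can be swapped in later without touching the bot detection.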

Related

Node.js REST API - URI Sanitizing?

I would like to require pages in my Node.js server based on the requested URI.
However, I am concerned that this could be a severe security issue, since a user can inject malicious characters into the URL, something like ../../, reach my server root and reveal all of the code.
So, a bit like throwing a bottle of water on a big fire, I have eliminated the option to send . in the request.
This is probably not a silver bullet :)
Is there some standard, best practice, guide or list of key points for URI sanitizing in a REST API based on Node.js?
Edit - here is the code that uses require
// app.js
app.use(require('./services/router')(app));
// router.js middleware
function router(app) {
return function(req, res, next) {
try {
// checking for . in the url
if (req.url.indexOf(".")!=-1) cast.badRequest();
// req.url.split('/')[2] should be customers, users or anything else
require('../../resources/' + req.url.split('/')[2] + '/' + req.url.split('/')[2] + '-router')(app);
next();
} catch(err) { cast.notFound(); }
}
}
module.exports = router;
// rides-router.js (this could be users-router.js or customers-router.js)
module.exports = function(app) {
// GET ride - select a ride
app.get("/v1/rides/:id", dep.verifyToken(), require('./api/v1-get-ride'));
// POST ride - insert a new ride
app.post("/v1/rides", dep.verifyToken(), require('./api/v1-set-ride'));
app.use((req, res, next) => {
cast.notFound();
});
}
You asked how to do it safer. My recommendation is that you put all the resources in an array and run all the app.use() statements with one loop that pulls the resource names from the array at server startup.
I don't like running synchronous require() during a request and I don't like loading code based on user specified characters. Both are avoided with my recommendation.
// add routes for all resources
const resourceList = ['rides', 'products', ...];
for (let r of resourceList) {
app.use(`/${r}`, require(`./resources/${r}/${r}-router`));
}
This seems like less code and 100% safe and no running of synchronous require() during a request.
Advantages:
Fully whitelisted.
No user input involved in selecting code to run.
No synchronous require() during request processing.
All routes installed at server initialization time.
Any errors in route loading (like a missing route file) occur at server startup, not during a user request.
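The startup-time loop can be wrapped in a small function to make the last advantage visible. This is only a sketch: mountResources is a hypothetical name, and the loadRouter parameter is injected here so the example runs without real router files. In the application it would be the require() call shown in the loop above.

```javascript
// loadRouter is injected so the sketch runs on its own; in the application it
// would be (name) => require(`./resources/${name}/${name}-router`).
function mountResources(app, resourceList, loadRouter) {
  for (const name of resourceList) {
    // A missing or broken router file throws here, at server startup,
    // not while a user request is being processed.
    app.use(`/${name}`, loadRouter(name));
  }
}
```

Because the names come only from the hard-coded list, nothing from req.url ever reaches require().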

Express.js unique var per request outside routing

In my Express application I have a module called helpers that is required in almost all my routes and modules. This module has a logger method that logs to fluentd (but that's unimportant). While building the data to log, I'd like to add a unique identifier of the request, so that all the logs written for the same request have the same unique ID. Using a global var set in app.use at the app entry point doesn't work, because this var would be overwritten every time a new request hits, so the global uuid would change mid-request under high load or with long-running tasks. res.locals is not available outside routing, so I can't use it for this. Is there a way to create a var that is unique per request and available in every module, or a way to access the res.locals data outside routing? Thank you
EDIT
Maybe an example will help understand better the question.
Suppose I have a module called helpers.js like this:
let helpers = {};
helpers.log = (logData, logName) => {
fluentLogger.emit('', {
name: logName,
//uuid: the needed uuid,
message: logData
});
}
module.exports = helpers;
Now obviously I can do this in my app.js entry point:
app.use(function (req, res, next) {
res.locals.uuid = uuid.v4();
next();
});
and then in every loaded middleware module that requires helpers (adding a new param to the helpers.log method):
const helpers = require('helpers');
router.post('/', (req, res, next) => {
helpers.log('my log message', 'myLogName', res.locals.uuid);
next();
});
and this will normally work. But suppose a medium or large project with hundreds of custom modules and models (not middlewares), where a module may require other modules that require other modules that finally require the helpers module. In this case I would have to pass res.locals.uuid as a parameter to every method of every module just so that it is available in the logger method. Not a very good idea. Suppose I have a new module called dbmodel.js that is required in a middleware function:
const helpers = require('helpers');
let dbmodel = {};
dbmodel.getSomeData = (someParam) => {
//some logic
helpers.log('my log message', 'myLogName');
}
module.exports = dbmodel;
The dbmodel has no idea about the res.locals data if I don't pass it from the middleware, so the helpers.log method will also have no idea about this.
In PHP one would normally write a GLOBAL var in the application's entry point so a hypothetical logger function would have access to this global on every method request from whichever class of the application.
Hope this explanation will help :) Thank you
EDIT 2
The solution for this kind of problem is CLS (continuation-local storage). Thanks to @robertklep for the hint. A good slideshare explaining exactly the same problem (logger with unique ID) and explaining the CLS solutions can be found here: https://www.slideshare.net/isharabash/cls-and-asynclistener
I answered a very similar question here which will solve this problem.
I solved the problem using the libraries node-uuid and continuation-local-storage. Take a look at the answer to this question and see if it helps:
NodeJS Express - Global Unique Request Id
And if you want a fuller explanation, take a look here:
Express.js: Logging info with global unique request ID – Node.js
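The same idea can be sketched without extra libraries. Note that this swaps in a different mechanism from the one named above: it uses Node's built-in AsyncLocalStorage (from async_hooks) and crypto.randomUUID instead of continuation-local-storage and node-uuid, so it assumes a reasonably recent Node version. The names requestContext, requestIdMiddleware and log are illustrative only.

```javascript
const { AsyncLocalStorage } = require('async_hooks');
const { randomUUID } = require('crypto');

// One storage instance shared by the middleware and the logger.
const requestContext = new AsyncLocalStorage();

// Express-style middleware: everything reached through next(), including
// later async callbacks, runs with this request's store attached.
function requestIdMiddleware(req, res, next) {
  requestContext.run({ uuid: randomUUID() }, next);
}

// A stand-in for helpers.log: it reads the id from the store, so callers
// such as dbmodel.getSomeData do not have to pass it along.
function log(logData, logName) {
  const store = requestContext.getStore();
  return {
    name: logName,
    uuid: store ? store.uuid : undefined, // undefined outside a request
    message: logData
  };
}
```

With app.use(requestIdMiddleware) registered before the routes, the real helpers.log would pass the returned object to fluentLogger.emit instead of returning it.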
Yes, there is one way to do this.
Every request that reaches your routes can first be passed through a middleware.
Suppose you have a route such as
app.get('/', function(req, res) {
res.sendFile(path.join(public + "index.html"));
});
Place a middleware in front of it and set a field on the incoming req; that way you get a separate value for each request.
Check this out:
https://expressjs.com/en/guide/writing-middleware.html
Like this
var requestTime = function (req, res, next) {
req.requestTime = Date.now()
next()
}
app.use(requestTime)
app.get('/', function (req, res) {
var responseText = 'Hello World!<br>'
responseText += '<small>Requested at: ' + req.requestTime + '</small>'
res.send(responseText)
})
Here req.requestTime is set separately for each request, because every request gets its own req object.

Node Express auth status

I have multiple routes, split into different files (my app consists of different "modules", which I maintain in separate folders. For each folder, there is an index.js file in which I manage the routes per module, and I require these in the app.js file).
For every route, I need to check the auth and pass the loggedIn status to the header of every page:
//Default variables for the ejs template
var options = {
loggedIn: true
};
res.render("home/home", options);
If the logged in status is true, then the user's name will be displayed. If not, the login / signup labels are displayed.
What is the best way to centralise this, so that I don't need to require the auth script in every one of these index.js (route) files?
I need to be able to pass the auth status to the view via the options object (see example).
In your auth module, use a middleware function. That function can check the auth status and store it in res.locals.loggedIn, which will be available to any view that is eventually rendered. Just make sure the app.use call executes prior to your other routes and it will work properly.
app.use(function auth(req, res, next) {
res.locals.loggedIn = true; // compute proper value here
next();
});
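As a sketch of what "compute proper value here" might look like, the middleware below derives the flag from the session. It assumes a session middleware is in use and that login stores the user under req.session.user; both are assumptions for illustration, as are the names authStatus and userName.

```javascript
// Assumption: login code sets req.session.user = { name: ... }.
function authStatus(req, res, next) {
  // res.locals is merged into the data of every view rendered for this request.
  res.locals.loggedIn = Boolean(req.session && req.session.user);
  res.locals.userName = res.locals.loggedIn ? req.session.user.name : null;
  next();
}

// Registered once, before the module routes:
// app.use(authStatus);
```

The route files can then call res.render("home/home") without building the options object themselves.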
From what I understand, you need to do this for every request. A common approach is to add it as middleware so that every request passes through it.
For example:
var http = require('http');
var connect = require('connect');
var app = connect();
app.use(function(req, res) {
res.end('Hello!');
});
http.createServer(app).listen(3000)
Now every request is answered with Hello!. You could extract this as a module and reuse it across projects. Check here for more details

Dispatch Express.js route based on first parameter?

I'm creating a CMS in node.js and Express. I allow users to create their own subsections in the site. A subsection can be a blog, a page or a forum. These sub-sections can be installed one level deep in the site url path, so for instance:
domain.com/custom-path-blog/
I would have to support the following url structure with express routes:
domain.com/custom-path-blog/ -> blog index
domain.com/custom-path-blog/page/5 -> list posts on page 5
domain.com/custom-path-blog/guides/ -> list posts that belong to guides category
domain.com/custom-path-blog/guides/this-is-a-post -> shows a post
I would also have to support other sub-sections with different url structures. I have to make a call to a database to check out what the first level in the url actually is before I can dispatch it to the appropriate route.
Since this is a SaaS website, I don't want to dynamically register the routes on my node process, as I could end up having thousands of users with possibly millions of routes. This is not doable. I have to go to the database for that first chunk of information.
Once I know a sub-section is a blog, a forum or an e-commerce store, how do I send the part of the URL after "custom-path-blog" to be processed by the appropriate Express routing mechanism?
I'm starting to think this might be too complicated to do with express routes and I will have to do it by hand.
Thanks!
If you already have 3 separate apps (page, blog, forum) and you want to launch them in 1 node process, you can do this:
app.use('/page', pageApp);
app.use('/blog', blogApp);
app.use('/forum', forumApp);
Express will strip out the first component of the URL for you.
In your case, the first component is customized by the user, so you need to write a middleware for it:
function appSelector(req, res, next) {
var firstComponent = getFirstComponent(req.path); // e.g. "custom-path-blog"
var userID = req.user.id;
// detectAppForCurrentUser looks the component up in the database and reports its type
detectAppForCurrentUser(firstComponent, userID, function (type) {
if(type === 'page') {
removeFirstComponent(req); // strip the first component from req.url
return pageApp(req, res, next);
}
if(type === 'blog') {
removeFirstComponent(req);
return blogApp(req, res, next);
}
next(); // if not found continue with other routes
}); // closes the detectAppForCurrentUser call
}
app.use(appSelector);
// TODO other routes here
There are many ways to solve this problem, but one rule is important: app.use and app.get are called during the initialization phase only.
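The helpers used by appSelector are left undefined in the answer. A possible sketch of the URL handling is shown below; splitFirstComponent is a hypothetical name, and the rule it applies (everything up to the second slash is the first component) is an assumption based on the URL structure listed in the question.

```javascript
// Hypothetical helper: split "/custom-path-blog/page/5" into its first
// component and the remainder that the sub-app should see.
function splitFirstComponent(url) {
  var match = /^\/([^\/?#]+)(.*)$/.exec(url);
  if (!match) {
    return { first: null, rest: url }; // e.g. "/" has no first component
  }
  var rest = match[2];
  // The sub-app expects a path starting with "/", also when only a
  // query string (or nothing) follows the first component.
  if (rest === '' || rest.charAt(0) !== '/') {
    rest = '/' + rest;
  }
  return { first: match[1], rest: rest };
}

// In appSelector this could back both helpers:
// var parts = splitFirstComponent(req.url);
// req.url = parts.rest; // what removeFirstComponent(req) would do
```

Rewriting req.url before calling the sub-app mirrors what Express does itself when an app is mounted with app.use('/blog', blogApp).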

NodeJS + Express: How to secure a URL

I am using latest versions of NodeJS and ExpressJS (for MVC).
I usually configure my rest paths like this, for example:
app.get('/archive', routes.archive);
Now I want my /admin/* set of URLs to be secured. I just need simple authentication; it's only a draft.
When a user tries to access, for example, /admin/posts, before sending him the corresponding view and data, I check for a req.session.authenticated. If it's not defined, I redirect to the login page.
Login page has a simple validation form, and a sign-in controller method: if user does send "right user" and "right password" I set the session variable and he's authenticated.
What I find difficult, or I don't understand, is how to actually make the "filter" code, I mean, the auth check, before every /admin/* path call.
Does this have something to do with "middleware" express functions?
Thank you
Yep, middleware is exactly what you want. A middleware function works just like any other Express route handler, except it gets run before your actual route handler. You could, for example, do something like this:
function requireLogin(req, res, next) {
if (req.session.loggedIn) {
next(); // allow the next route to run
} else {
// require the user to log in
res.redirect("/login"); // or render a form, etc.
}
}
// Automatically apply the `requireLogin` middleware to all
// routes starting with `/admin`
app.all("/admin/*", requireLogin, function(req, res, next) {
next(); // if the middleware allowed us to get here,
// just move on to the next route handler
});
app.get("/admin/posts", function(req, res) {
// if we got here, the `app.all` call above has already
// ensured that the user is logged in
});
You could specify requireLogin as a middleware to each of the routes you want to be protected, instead of using the app.all call with /admin/*, but doing it the way I show here ensures that you can't accidentally forget to add it to any page that starts with /admin.
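An even simpler approach would be to add the following code to the app.js file.
var auth = function(req, res, next) {
// isAdmin is a placeholder for your own check, e.g. a flag stored on the session
if(isAdmin) {
return next();
} else {
// actually send the response; res.status() alone leaves the request hanging
return res.status(403).end();
}
};
app.use('/admin', auth, apiDecrement);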
As you can see the middleware is being attached to the route. Before ExpressJS goes forward, it executes the function that you passed as the second parameter.
With this solution you can make different checks before displaying the site to the end user.
Best.
Like Brandon's answer, but you can also go the connect route (Express 3.x only; app.router was removed in Express 4):
app.use('/admin', requireLogin)
app.use(app.router)
app.get('/admin/posts', /* middleware */)

Resources