Is there any risk to building URLs manually? - node.js

In node.js, it's possible to create paths to files using the standard path lib, e.g.:
const path = require('path');
const myPath = path.join('directory', 'file');
Is there any risk to building paths manually instead? For instance:
const myPath = 'directory/file';
I'm guessing it should be fine because:
There's nothing specifically in the docs about this.
I've never used an OS (even Windows) that doesn't accept / as a path separator.

When your entire path is static, as in your myPath = 'directory/file' example, there is no distinct advantage that I know of.
However, if you're constructing a path dynamically, path.join and the like are essential.
One important reason is that, unlike string concatenation, path.join et al will prevent unexpected behavior when one of the arguments is not a string.
In JavaScript, bugs related to values being undefined are common. Suppose you have some code in your app that redirects users to their profile after login:
if (success) {
res.redirect(`/users/${params.username}`);
}
Looks great, right? Not so fast. It turns out that the username property is actually params.userName, so you're redirecting every user to /users/undefined. No error is thrown, and your unit tests don't check the redirect URL, so this bug gets deployed.
If you had done this instead:
res.redirect(path.join('/users', params.username));
...it would have thrown an ArgumentError error immediately, your unit tests would have failed, and the bug would never have been deployed.
Is this a naïve example? Yes. Would it be easy to check for undefined? Yes. But when you make a habit of using path.join you don't have to worry about that. You know, at the very least, that you'll never accidentally end up with an undefined or null or 123 in your path.
It's even more important on the server side. Suppose you have an app that handles file uploads and you want a user's uploads to go into a directory with their username for its name:
fs.mkdir(`images/${params.username}`, err => {
if (err) { throw new Error('Oops!'); }
fs.writeFile(`images/${params.username}/${params.filename}`, params.filedata, /* ... */);
});
Now you have every user's uploads being written to images/undefined. It could be a long time (in business terms) before someone notices this bug.

Not in my point of view, I'm actually using this a lot once I'm heavy user of babel to use import.

Related

What is the best way to separate your code in a Node.js application?

I'm working on a MEAN (Mongo Express.js Angular Node.js) CRUD application. I have it working but everything is in one .js file. The single source code file is quite large. I want to refactor the code so CRUD functionality is in different source code files. Reading through other posts, I've got a working model but am not sure it is the right way in Node using Mongo to get it done.
Here's the code so far:
<pre>
var express = require('express');
var bodyParser = require('body-parser');
var app = express();
var path = require('path');
var db;
var connect = 'mongodb://<<mddbconnect string>>;
const MongoClient = require('mongodb').MongoClient;
var ObjectID = require("mongodb").ObjectID;
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
app.use(express.static(__dirname + '/'));
// viewed at http://localhost:<<port referecnes in app.listen>>
app.get('/', (req, res) => {
res.sendFile(path.join(__dirname + '/index.html'));
});
MongoClient.connect(connect, (err, database) => {
if (err) return console.log(err)
db = database
app.listen(3000, () => {
console.log('listening on 3000' + Date() );
// Here's the require for the search function in another source code file.
var searchroute = require('./serverSearch')(app, db);
})
})
//Handlers
The rest of the CRUD application functions with app.post, app.get. These are other functions I want to move into different source code files, like serverSearch.js.
</pre>
The code I separated right now is the search functionality which is inside of the MongoClient.connection function. This function has to successfully execute to make sure the variable 'db' is valid before passing both variables 'app' and 'db' to the the search function built out in the source code file serverSearch.js.
I could now build out my other CRUD functions in separate files in put them in the same area as 'var searchroute = require('./serverSearch)(app,db);
Is this the best way to separate code in a MEAN application where the main app and db vars need to be instantiated then passed to functions in other source code files?
What you are basically describing is modular coding heading towards "services" perhaps even micro-services. There are a few factors to keep in mind for your system. (I have no doubt that there are many other approaches to this btw). Basically in most NodeJS systems I have worked on (not all) I try to apply the following architecture in development and then bring over as much as possible I to production.
Create a directory under the main one. I usually use some type of name that points to the term functions. In this directory I maintain function and /or class files divided into categories. Function wrappers for DB would be held in DB functions. This file would only contain functions for the DB. Security functions in another file. Helper functions in another. Time manipulation in another. I am sure you get the idea. These are all wrapped in module exports
Now in any file in my project where say I would need DB and helpers I will start it by:
let nhelpers = require("helpfuncs");
let ndb = require("dbfuncs");
Obviously names are different.
And btw I divide all the NPM packages in the same way under an environment directory.
Maintaining that kind of structure allows you to maintain sane order over the code, logical chaining in any decent IDE, and having relevant methods show up in your IDE without having to remember every function name and all the methods within.
It also allows you to write an orderly system of micro-services making sure each part dies exactly what you want and allows for sane debugging.
It took me awhile to settle on this method and refine it.
It paid off for me. Hope this helps.
Edit to clarify for the OP:
When it comes to the process.env variables I became a great fan of dotenv https://www.npmjs.com/package/dotenv
This little package has saved me an incredible amount of headaches. Of course you will have to decide if you include it in production or not. I have seen arguments for both, but i think in a well set up AWS, Google, Azure environment (or in Docker of course) I am of the opinion it can safely be used.
A couple of caveats.
Do not leave your dotenv file in the root. Move it somewhere else in your directory structure. It is simple and I actually put it in the same directory as all my environment files and helper files.
Remember it is simply a text file. So an IDE will not pick up your specific env variables in chaining. (Unless someone knows of a trick which I would love to hear about)
If you put env variables like access info to your DB system or other sensitive stuff, HASH THEM FIRST, put the hash in your env and have a function in your code which specifically just does the hash to the string. Do not under any conditions leave sensitive information in your environment file without hashing it first.
The final Gatcha. These are not PHP magic globals which cannot be overwritten. If you lose track and overwrite one of those process.env variables in your code it will take the new value until you restart your node app and it reads from the dotenv file again. (But then again that is the rule with all environment variables not only user defined ones)
Any typos above excuse me. Done from my cell in the train.

How to statically analyse that a file is fit for importing?

I have CLI program that can be executed with a list of files that describe instructions, e.g.
node ./my-program.js ./instruction-1.js ./instruction-2.js ./instruction-3.js
This is how I am importing and validating that the target file is an instruction file:
const requireInstruction = (instructionFilePath) => {
const instruction = require(instructionFilePath)
if (!instruction.getInstruction) {
throw new Error('Not instruction file.');
}
return instruction;
};
The problem with this approach is that it will execute the file executes regardless of whether it matches the expected signature, i.e. if file contains a side action such as connecting to the database:
const mysql = require('mysql');
mysql.createConnection(..);
module.exports = mysql;
Not instruction file. will fire, I will ignore the file, but the side-action will remain in the background.
How to safely validate target file signature?
Worst case scenario, is there a conventional way to completely sandbox the require logic and kill the process if file is determined to be unsafe?
Worst case scenario, is there a conventional way to completely sandbox the require logic and kill the process if file is determined to be unsafe?
Move the check logic into a specific js file. Make it process.exit(0) when everything is fine, process.exit(1) when it s wrong.
In your current program, instead of loading the file via require, use child_process.exec to invoke your new file, giving it the required parameter to know which file to test.
In your updated program, bind the close event to know if the return code was 0 or 1.
If you need more information than 0 or 1, into the new js file which will load the instruction, print some JSON.stringified data to stdout (console.log), and retrieve then JSON.parse it in the callback of call to child_process.exec.
Alternatively, have you looked into AST processing ?
http://jointjs.com/demos/javascript-ast
It could help you to identify piece of code which are not embedded within an exported function.
(Note: I discussed this question with the author on IRC. There may be some context in my answer that isn't in the original question.)
Given that your scenario is purely about preventing against accidental inclusion of non-instruction files, rather than about preventing malicious behaviour, static analysis using something like Esprima will probably be sufficient.
One approach would be to require that every instruction file exports some kind of object with a name property, containing the name of the instruction file. As there's not really anything to put in there besides a string literal, you can be fairly certain that if you can't locate a name property through static analysis, the file is not an instruction file - even in a language like JavaScript that isn't fully statically analyzable.
For any readers of this thread that are trying to protect from malicious actors, rather than accidents - for example, when accepting untrusted code from users: you cannot sandbox or 'validate' JavaScript with Node.js alone (not with the vm module either), and the above solution will not work for you. You will need system-level containerization or virtualization to run this kind of code safely. There are no other options.

Where should I put custom errors in sails.js?

I was wondering what's the best practice and if I should create:
a directory in which declare statically all the errors my application uses, like api/errors/custom1Error
declare them directly inside the files
or put the files directly inside the dir that needs that error, like api/controller/error/formInvalidError
other options!?
A neat way of going about this would be to simply add the errors as custom responses under api/responses. This way even the invocation becomes pretty neat. Although the doc says you should add them directly in the responses directory, I'm sure there must be a way to nest them under, say, responses/errors. I'll try that out and post an update in a bit.
Alright, off a quick search, I couldn't find any way to nest the responses, but you can use a small workaround that's not quite as neat:
Create the responses/errors directory with all the custom error response handlers. Create a custom response and name it something like custom.js. Then specify the response name while calling res.custom().
I'm adding a short snippet just for illustration:
api/responses/custom.js:
var customErrors = {
customError1: require('./errors/customError1'),
customError2: require('./errors/customError2')
};
module.exports = function custom (errorName, data) {
var req = this.req;
var res = this.res;
if (customErrors[errorName]) return customErrors[errorName](req, res, data);
else return res.negotiate();
}
From the controller:
res.custom('authError', data);
If you don't need logical processing for different errors, you can do away with the whole errors/ directory and directly invoke the respective views from custom.js:
module.exports = function custom (viewName, data) {
var req = this.req;
var res = this.res;
return res.view('errors/' + viewName, data);//assuming you have error views in views/errors
}
(You should first check if the view exists. Find out how on the linked page.)
Although I'm using something like this for certain purposes (dividing routes and so on), there definitely should be a way to include response handlers defined in different directories. (Perhaps by reconfiguring some grunt task?) I'll try to find that out and update if I find any success.
Good luck!
Update
Okay, so I found that the responses hook adds all files to res without checking if they are directories. So adding a directory under responses results in a TypeError from lodash. I may be reading this wrong but I guess it's reasonable to conclude that currently it's not possible to add a directory there, so I guess you'll have to stick to one of the above solutions.

Best way to reuse a large translation file within Node / Express

I'm new to Node but I figured I'd jump right in and start converting a PHP app into Node/Express. It's a bilingual app that uses gettext with PO/MO files. I found a Node module called node-gettext. I'd rather not convert the PO files into another format right now, so it seems this library is my only option.
So my concern is that right now, before every page render, I'm doing something like this:
exports.home_index = function(req, res)
{
var gettext = require('node-gettext'),
gt = new gettext();
var fs = require('fs');
gt.textdomain('de');
var fileContents = fs.readFileSync('./locale/de.mo');
gt.addTextdomain('de', fileContents);
res.render(
'home/index.ejs',
{ gt: gt }
);
};
I'll also be using the translations in classes, so with how it's set up now I'd have to load the entire translation file again every time I want to translate something in another place.
The translation file is about 50k and I really don't like having to do file operations like this on every page load. In Node/Express, what would be the most efficient way to handle this (aside from a database)? Usually a user won't even be changing their language after the first time (if they're changing it from English).
EDIT:
Ok, I have no idea if this is a good approach, but it at least lets me reuse the translation file in other parts of the app without reloading it everywhere I need to get translated text.
In app.js:
var express = require('express'),
app = express(),
...
gettext = require('node-gettext'),
gt = new gettext();
Then, also in app.js, I create the variable app.locals.gt to contain the gettext/translation object, and I include my middleware function:
app.locals.gt = gt;
app.use(locale());
In my middleware file I have this:
mod
module.exports = function locale()
{
return function(req, res, next)
{
// do stuff here to populate lang variable
var fs = require('fs');
req.app.locals.gt.textdomain(lang);
var fileContents = fs.readFileSync('./locales/' + lang + '.mo');
req.app.locals.gt.addTextdomain(lang, fileContents);
next();
};
};
It doesn't seem like a good idea to assign the loaded translation file to app, since depending on the current request that file will be one of two languages. If I assigned the loaded translation file to app instead of a request variable, can that mix up users' languages?
Anyway, I know there's got to be a better way of doing this.
The simplest option would be to do the following:
Add this in app.js:
var languageDomains = {};
Then modify your Middleware:
module.exports = function locale()
{
return function(req, res, next)
{
// do stuff here to populate lang variable
if ( !req.app.locals.languageDomains[lang] ) {
var fs = require('fs');
var fileContents = fs.readFileSync('./locales/' + lang + '.mo');
req.app.locals.languageDomains[lang] = true;
req.app.locals.gt.addTextdomain(lang, fileContents);
}
req.textdomain = req.app.locals.gt.textdomain(lang);
next();
};
};
By checking if the file has already been loaded you are preventing the action from happening multiple times, and the domain data will stay resident in the server's memory. The downside to the simplicity of this solution is that if you ever change the contents of your .mo files whilst the server is running, the changes wont be taken into account. However, this code could be extended to keep an eye on the mtime of the files, and reload accordingly, or make use of fs.watchFile — if required:
if ( !req.app.locals.languageDomains[lang] ) {
var fs = require('fs'), filename = './locales/' + lang + '.mo';
var fileContents = fs.readFileSync(filename);
fs.watchFile(filename, function (curr, prev) {
req.app.locals.gt.addTextdomain(lang, fs.readFileSync(filename));
});
req.app.locals.languageDomains[lang] = true;
req.app.locals.gt.addTextdomain(lang, fileContents);
}
Warning: It should also be noted that using sync versions of functions outside of server initialisation is not a good idea because it can freeze the thread. You'd be better off changing your sync loading to the async equivalent.
After the above changes, rather than passing gt to your template, you should be able to use req.textdomain instead. It seems that the gettext library supports a number of requests directly on each domain object, which means you hopefully don't need to refer to the global gt object on a per request basis (which will be changing it's default domain on each request):
Each domain supports:
getTranslation
getComment
setComment
setTranslation
deleteTranslation
compilePO
compileMO
Taken from here:
https://github.com/andris9/node-gettext/blob/e193c67fdee439ab9710441ffd9dd96d027317b9/lib/domain.js
update
A little bit of further clarity.
Once the server has loaded the file into memory the first time, it should remain there for all subsequent connections it receives (for any visitor/request) because it is stored globally and wont be garbage collected — unless you remove all references to the data, which would mean gettext would need to have some kind of unload/forget domain method.
Node is different to PHP in that its environment is shared and wraps its own HTTP server (if you are using something like Express), which means it is very easy to remember data globally as it has a constant environment that all the code is executed within. PHP is always executed after the HTTP server has received and dealt with the request (e.g. Apache). Each PHP response is then executed in its own separate run-time, which means you have to rely on databases, sessions and cache stores to share even simple information and most resources.
further optimisations
Obviously with the above you are constantly running translations on each page load. Which means the gettext library will still be using the translation data resident in memory, which will take up processing time. To get around this, it would be best to make sure your URLs have something that makes them unique for each different language i.e. my-page/en/ or my.page.fr or even jp.domain.co.uk/my-page and then enable some kind of full page caching using something like memcached or express-view-cache. However, once you start caching pages you need to make certain there aren't any regions that are user specific, if so, you need to start implement more complicated systems that are sensitive to these areas.
Remember: The golden rule of optimisation, don't do so before you need to... basically meaning I wouldn't worry about page caching until you know it's going to be an issue, but it is always worth bearing in mind what your options are, as it should shape your code design.
update 2
Just to illustrate a bit further on the behaviour of a server running in JavaScript, and how the global behaviour is not just a property of req.app, but in fact any object that is further up the scope chain.
So, as an example, instead of adding var languageDomains = {}; to your app.js, you could instantiate it further up the scope of wherever your middleware is placed. It's best to keep your global entities in one place however, so app.js is the better place, but this is just for illustration.
var languageDomains = {};
module.exports = function locale()
{
/// you can still access languageDomains here, and it will behave
/// globally for the entire server.
languageDomains[lang]
}
So basically, where-as with PHP, the entire code-base is re-executed on each request — so the languageDomains would be instantiated a-new each time — in Node the only part of the code to be re-executed is the code within locale() (because it is triggered as part of a new request). This function will still have a reference to the already existing and defined languageDomains via the scope chain. Because languageDomains is never reset (on a per request basis) it will behave globally.
Concurrent users
Node.js is single threaded. This means that in order for it to be concurrent i.e. handle multiple requests at the "same" time, you have to code your app in such a way that each little part can be executed very quickly and then slip into a waiting state, whilst another part of another request is dealt with.
This is the reason for the asynchronous and callback nature of Node, and the reason to avoid Sync calls whilst your app is running. Any one Sync request could halt or freeze execution of the thread and delay handling for all other requests. The reason why I state this is to give you a better idea of how multiple users might interact with your code (and global objects).
Basically once a request is being dealt with by your server, it is it's only focus, until that particular execution cycle ends i.e. your request handler stops calling other code that needs to run synchronously. Once that happens the next queued item is dealt with (a callback or something), this could be part of another request, or it could be the next part in the current request.

Is this method of static file serving safe in node.js? (potential security hole?)

I want to create the simplest node.js server to serve static files.
Here's what I came up with:
fs = require('fs');
server = require('http').createServer(function(req, res) {
res.end(fs.readFileSync(__dirname + '/public/' + req.url));
});
server.listen(8080);
Clearly this would map http://localhost:8080/index.html to project_dir/public/index.html, and similarly so for all other files.
My one concern is that someone could abuse this to access files outside of project_dir/public. Something like this, for example:
http://localhost:8080/../../sensitive_file.txt
I tried this a little bit, and it wasn't working. But, it seems like my browser was removing the ".." itself. Which leads me to believe that someone could abuse my poor little node.js server.
I know there are npm packages that do static file serving. But I'm actually curious to write my own here. So my questions are:
Is this safe?
If so, why? If not, why not?
And, if further, if not, what is the "right" way to do this? My one constraint is I don't want to have to have an if clause for each possible file, I want the server to serve whatever files I throw in a directory.
No, it's not safe.
For the reason you mention.
Use fs.realpath() or fs.realpathSync() and check that the normalized path points to where you want it to.
It is not safe so use something like that:
if (req.url.indexOf('..') !== -1) {
res.writeHead(404);
res.end();
}
PS: by the way, use readFile instead of readFileSync and watch for errors... your implementation is too bad even for an example -_-

Resources