Best way to reuse a large translation file within Node / Express - node.js

I'm new to Node but I figured I'd jump right in and start converting a PHP app into Node/Express. It's a bilingual app that uses gettext with PO/MO files. I found a Node module called node-gettext. I'd rather not convert the PO files into another format right now, so it seems this library is my only option.
So my concern is that right now, before every page render, I'm doing something like this:
exports.home_index = function(req, res)
{
var gettext = require('node-gettext'),
gt = new gettext();
var fs = require('fs');
gt.textdomain('de');
var fileContents = fs.readFileSync('./locale/de.mo');
gt.addTextdomain('de', fileContents);
res.render(
'home/index.ejs',
{ gt: gt }
);
};
I'll also be using the translations in classes, so with how it's set up now I'd have to load the entire translation file again every time I want to translate something in another place.
The translation file is about 50k and I really don't like having to do file operations like this on every page load. In Node/Express, what would be the most efficient way to handle this (aside from a database)? Usually a user won't even be changing their language after the first time (if they're changing it from English).
EDIT:
Ok, I have no idea if this is a good approach, but it at least lets me reuse the translation file in other parts of the app without reloading it everywhere I need to get translated text.
In app.js:
var express = require('express'),
app = express(),
...
gettext = require('node-gettext'),
gt = new gettext();
Then, also in app.js, I create the variable app.locals.gt to contain the gettext/translation object, and I include my middleware function:
app.locals.gt = gt;
app.use(locale());
In my middleware file I have this:
mod
module.exports = function locale()
{
return function(req, res, next)
{
// do stuff here to populate lang variable
var fs = require('fs');
req.app.locals.gt.textdomain(lang);
var fileContents = fs.readFileSync('./locales/' + lang + '.mo');
req.app.locals.gt.addTextdomain(lang, fileContents);
next();
};
};
It doesn't seem like a good idea to assign the loaded translation file to app, since depending on the current request that file will be one of two languages. If I assigned the loaded translation file to app instead of a request variable, can that mix up users' languages?
Anyway, I know there's got to be a better way of doing this.

The simplest option would be to do the following:
Add this in app.js:
var languageDomains = {};
Then modify your Middleware:
module.exports = function locale()
{
return function(req, res, next)
{
// do stuff here to populate lang variable
if ( !req.app.locals.languageDomains[lang] ) {
var fs = require('fs');
var fileContents = fs.readFileSync('./locales/' + lang + '.mo');
req.app.locals.languageDomains[lang] = true;
req.app.locals.gt.addTextdomain(lang, fileContents);
}
req.textdomain = req.app.locals.gt.textdomain(lang);
next();
};
};
By checking if the file has already been loaded you are preventing the action from happening multiple times, and the domain data will stay resident in the server's memory. The downside to the simplicity of this solution is that if you ever change the contents of your .mo files whilst the server is running, the changes wont be taken into account. However, this code could be extended to keep an eye on the mtime of the files, and reload accordingly, or make use of fs.watchFile — if required:
if ( !req.app.locals.languageDomains[lang] ) {
var fs = require('fs'), filename = './locales/' + lang + '.mo';
var fileContents = fs.readFileSync(filename);
fs.watchFile(filename, function (curr, prev) {
req.app.locals.gt.addTextdomain(lang, fs.readFileSync(filename));
});
req.app.locals.languageDomains[lang] = true;
req.app.locals.gt.addTextdomain(lang, fileContents);
}
Warning: It should also be noted that using sync versions of functions outside of server initialisation is not a good idea because it can freeze the thread. You'd be better off changing your sync loading to the async equivalent.
After the above changes, rather than passing gt to your template, you should be able to use req.textdomain instead. It seems that the gettext library supports a number of requests directly on each domain object, which means you hopefully don't need to refer to the global gt object on a per request basis (which will be changing it's default domain on each request):
Each domain supports:
getTranslation
getComment
setComment
setTranslation
deleteTranslation
compilePO
compileMO
Taken from here:
https://github.com/andris9/node-gettext/blob/e193c67fdee439ab9710441ffd9dd96d027317b9/lib/domain.js
update
A little bit of further clarity.
Once the server has loaded the file into memory the first time, it should remain there for all subsequent connections it receives (for any visitor/request) because it is stored globally and wont be garbage collected — unless you remove all references to the data, which would mean gettext would need to have some kind of unload/forget domain method.
Node is different to PHP in that its environment is shared and wraps its own HTTP server (if you are using something like Express), which means it is very easy to remember data globally as it has a constant environment that all the code is executed within. PHP is always executed after the HTTP server has received and dealt with the request (e.g. Apache). Each PHP response is then executed in its own separate run-time, which means you have to rely on databases, sessions and cache stores to share even simple information and most resources.
further optimisations
Obviously with the above you are constantly running translations on each page load. Which means the gettext library will still be using the translation data resident in memory, which will take up processing time. To get around this, it would be best to make sure your URLs have something that makes them unique for each different language i.e. my-page/en/ or my.page.fr or even jp.domain.co.uk/my-page and then enable some kind of full page caching using something like memcached or express-view-cache. However, once you start caching pages you need to make certain there aren't any regions that are user specific, if so, you need to start implement more complicated systems that are sensitive to these areas.
Remember: The golden rule of optimisation, don't do so before you need to... basically meaning I wouldn't worry about page caching until you know it's going to be an issue, but it is always worth bearing in mind what your options are, as it should shape your code design.
update 2
Just to illustrate a bit further on the behaviour of a server running in JavaScript, and how the global behaviour is not just a property of req.app, but in fact any object that is further up the scope chain.
So, as an example, instead of adding var languageDomains = {}; to your app.js, you could instantiate it further up the scope of wherever your middleware is placed. It's best to keep your global entities in one place however, so app.js is the better place, but this is just for illustration.
var languageDomains = {};
module.exports = function locale()
{
/// you can still access languageDomains here, and it will behave
/// globally for the entire server.
languageDomains[lang]
}
So basically, where-as with PHP, the entire code-base is re-executed on each request — so the languageDomains would be instantiated a-new each time — in Node the only part of the code to be re-executed is the code within locale() (because it is triggered as part of a new request). This function will still have a reference to the already existing and defined languageDomains via the scope chain. Because languageDomains is never reset (on a per request basis) it will behave globally.
Concurrent users
Node.js is single threaded. This means that in order for it to be concurrent i.e. handle multiple requests at the "same" time, you have to code your app in such a way that each little part can be executed very quickly and then slip into a waiting state, whilst another part of another request is dealt with.
This is the reason for the asynchronous and callback nature of Node, and the reason to avoid Sync calls whilst your app is running. Any one Sync request could halt or freeze execution of the thread and delay handling for all other requests. The reason why I state this is to give you a better idea of how multiple users might interact with your code (and global objects).
Basically once a request is being dealt with by your server, it is it's only focus, until that particular execution cycle ends i.e. your request handler stops calling other code that needs to run synchronously. Once that happens the next queued item is dealt with (a callback or something), this could be part of another request, or it could be the next part in the current request.

Related

Would the values inside request be mixed up in callback?

I am new to Node.js, and I have been reading questions and answers related with this issue, but still not very sure if I fully understand the concept in my case.
Suggested Code
router.post('/test123', function(req, res) {
someAsyncFunction1(parameter1, function(result1) {
someAsyncFunction2(parameter2, function(result2) {
someAsyncFunction3(parameter3, function(result3) {
var theVariable1 = req.body.something1;
var theVariable2 = req.body.something2;
)}
)}
});
Question
I assume there will be multiple (can be 10+, 100+, or whatever) requests to one certain place (for example, ajax request to /test123, as shown above) at the same time with some variables (something1 and something2). According to this, it would be impossible that one user's theVariable1 and theVariable2 are mixed up with (i.e, overwritten by) the other user's req.body.something1 and req.body.something2. I am wondering if this is true when there are multiple callbacks (three like the above, or ten, just in case).
And, I also consider using res.locals to save some data from callbacks (instead of using theVariable1 and theVariable2, but is it good idea to do so given that the data will not be overwritten due to multiple simultaneous requests from clients?
Each request an Node.js/Express server gets generated a new req object.
So in the line router.post('/test123', function(req, res), the req object that's being passed in as an argument is unique to that HTTP connection.
You don't have to worry about multiple functions or callbacks. In a traditional application, if I have two objects cat and dog that I can pass to the listen function, I would get back meow and bark. Even though there's only one listen function. That's sort of how you can view an Express app. Even though you have all these get and post functions, every user's request is passed to them as a unique entity.

What is the best way to separate your code in a Node.js application?

I'm working on a MEAN (Mongo Express.js Angular Node.js) CRUD application. I have it working but everything is in one .js file. The single source code file is quite large. I want to refactor the code so CRUD functionality is in different source code files. Reading through other posts, I've got a working model but am not sure it is the right way in Node using Mongo to get it done.
Here's the code so far:
<pre>
var express = require('express');
var bodyParser = require('body-parser');
var app = express();
var path = require('path');
var db;
var connect = 'mongodb://<<mddbconnect string>>;
const MongoClient = require('mongodb').MongoClient;
var ObjectID = require("mongodb").ObjectID;
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
app.use(express.static(__dirname + '/'));
// viewed at http://localhost:<<port referecnes in app.listen>>
app.get('/', (req, res) => {
res.sendFile(path.join(__dirname + '/index.html'));
});
MongoClient.connect(connect, (err, database) => {
if (err) return console.log(err)
db = database
app.listen(3000, () => {
console.log('listening on 3000' + Date() );
// Here's the require for the search function in another source code file.
var searchroute = require('./serverSearch')(app, db);
})
})
//Handlers
The rest of the CRUD application functions with app.post, app.get. These are other functions I want to move into different source code files, like serverSearch.js.
</pre>
The code I separated right now is the search functionality which is inside of the MongoClient.connection function. This function has to successfully execute to make sure the variable 'db' is valid before passing both variables 'app' and 'db' to the the search function built out in the source code file serverSearch.js.
I could now build out my other CRUD functions in separate files in put them in the same area as 'var searchroute = require('./serverSearch)(app,db);
Is this the best way to separate code in a MEAN application where the main app and db vars need to be instantiated then passed to functions in other source code files?
What you are basically describing is modular coding heading towards "services" perhaps even micro-services. There are a few factors to keep in mind for your system. (I have no doubt that there are many other approaches to this btw). Basically in most NodeJS systems I have worked on (not all) I try to apply the following architecture in development and then bring over as much as possible I to production.
Create a directory under the main one. I usually use some type of name that points to the term functions. In this directory I maintain function and /or class files divided into categories. Function wrappers for DB would be held in DB functions. This file would only contain functions for the DB. Security functions in another file. Helper functions in another. Time manipulation in another. I am sure you get the idea. These are all wrapped in module exports
Now in any file in my project where say I would need DB and helpers I will start it by:
let nhelpers = require("helpfuncs");
let ndb = require("dbfuncs");
Obviously names are different.
And btw I divide all the NPM packages in the same way under an environment directory.
Maintaining that kind of structure allows you to maintain sane order over the code, logical chaining in any decent IDE, and having relevant methods show up in your IDE without having to remember every function name and all the methods within.
It also allows you to write an orderly system of micro-services making sure each part dies exactly what you want and allows for sane debugging.
It took me awhile to settle on this method and refine it.
It paid off for me. Hope this helps.
Edit to clarify for the OP:
When it comes to the process.env variables I became a great fan of dotenv https://www.npmjs.com/package/dotenv
This little package has saved me an incredible amount of headaches. Of course you will have to decide if you include it in production or not. I have seen arguments for both, but i think in a well set up AWS, Google, Azure environment (or in Docker of course) I am of the opinion it can safely be used.
A couple of caveats.
Do not leave your dotenv file in the root. Move it somewhere else in your directory structure. It is simple and I actually put it in the same directory as all my environment files and helper files.
Remember it is simply a text file. So an IDE will not pick up your specific env variables in chaining. (Unless someone knows of a trick which I would love to hear about)
If you put env variables like access info to your DB system or other sensitive stuff, HASH THEM FIRST, put the hash in your env and have a function in your code which specifically just does the hash to the string. Do not under any conditions leave sensitive information in your environment file without hashing it first.
The final Gatcha. These are not PHP magic globals which cannot be overwritten. If you lose track and overwrite one of those process.env variables in your code it will take the new value until you restart your node app and it reads from the dotenv file again. (But then again that is the rule with all environment variables not only user defined ones)
Any typos above excuse me. Done from my cell in the train.

sails.js Use session param in model

This is an extension of this question.
In my models, every one requires a companyId to be set on creation and every one requires models to be filtered by the same session held companyid.
With sails.js, I have read and understand that session is not available in the model unless I inject it using the controller, however this would require me to code all my controller/actions with something very, very repetitive. Unfortunate.
I like sails.js and want to make the switch, but can anyone describe to me a better way? I'm hoping I have just missed something.
So, if I understand you correctly, you want to avoid lots of code like this in your controllers:
SomeModel.create({companyId: req.session.companyId, ...})
SomeModel.find({companyId: req.session.companyId, ...})
Fair enough. Maybe you're concerned that companyId will be renamed in the future, or need to be further processed. The simplest solution if you're using custom controller actions would be to make class methods for your models that accept the request as an argument:
SomeModel.doCreate(req, ...);
SomeModel.doFind(req, ...);
On the other hand, if you're on v0.10.x and you can use blueprints for some CRUD actions, you will benefit from the ability to override the blueprints with your own code, so that all of your creates and finds automatically use the companyId from the session.
If you're coming from a non-Node background, this might all induce some head-scratching. "Why can't you just make the session available everywhere?" you might ask. "LIKE THEY DO IN PHP!"
The reason is that PHP is stateless--every request that comes in gets essentially a fresh copy of the app, with nothing in memory being shared between requests. This means that any global variables will be valid for the life of a single request only. That wonderful $_SESSION hash is yours and yours alone, and once the request is processed, it disappears.
Contrast this with Node apps, which essentially run in a single process. Any global variables you set would be shared between every request that comes in, and since requests are handled asynchronously, there's no guarantee that one request will finish before another starts. So a scenario like this could easily occur:
Request A comes in.
Sails acquires the session for Request A and stores it in the global $_SESSION object.
Request A calls SomeModel.find(), which calls out to a database asynchronously
While the database does its magic, Request A surrenders its control of the Node thread
Request B comes in.
Sails acquires the session for Request B and stores it in the global $_SESSION object.
Request B surrenders its control of the thread to do some other asynchronous call.
Request A comes back with the result of its database call, and reads something from the $_SESSION object.
You can see the issue here--Request A now has the wrong session data. This is the reason why the session object lives inside the request object, and why it needs to be passed around to any code that wants to use it. Trying too hard to circumvent this will inevitably lead to trouble.
Best option I can think of is to take advantage of JS, and make some globally accessible functions.
But its gonna have a code smell :(
I prefer to make a policy that add the companyId inside the body.param like this:
// Needs to be Logged
module.exports = function(req, res, next) {
sails.log.verbose('[Policy.insertCompanyId() called] ' + __filename);
if (req.session) {
req.body.user = req.session.companyId;
//or something like AuthService.getCompanyId(req.session);
return next();
}
var err = 'Missing companyId';
//log ...
return res.redirect(307, '/');
};

How to make my nodejs app serve multiple users?

I am implementing a very basic website using nodejs, and expressjs framework. The idea is that the user enters the website, click on a button that will trigger a cpu-intensive task in the server, and get a message on the browser upon the completion of the task.
The confusing part for me is how to accomplish this task for each user without blocking the event-loop, thus, no user has to wait for another to finish. Also, how to use a new instance of the used resources(objects, variables..) for each user.
After playing and reading around, I have come across child-process. Basically, I thought of forking a new child_process for each user so that whatever the time the sorting takes, it won't block my event-loop. However, I am still not sure if this is the best thing to do for such a scenario.
Now, I have done what I wanted to but with only single user, however when trying to start another user, things become messy and variables are shared. I know that I should not use global variables declared in the module, but what could be another way to make variables shared among functions within a single module yet they are different for each user!?
I know that the question may sound very basic, but I kinda miss the idea of how does node js serve different users with new variables, objects that are associated with each individual user.
In short, my questions are:
1- how does node serve multiple users simultaneously?
2- when and how should I resort to forking or executing a new child-process under the hood, and is it for each user or based on my # cores in cpu
3- how to separate resources in my application for each user such that each user has his own counters, emails and other objects and variables.
4- when do I need or I have to kill my child process.
Thanks in advance.
CODE:
var cp = require('child_process');
var child= cp.fork('./routes/mysort-module');
exports.user=function(req,res){
// child = new cp.fork('./routes/mysort-module'); // Should I make new child for each user?
child.on('message',function(m){ res.send('DONE');}
child.send('START PROCESS'); // Forking a cpu-intensive task in mysort-module.js
}
IN MY SORTING MODULE:
var variables = require(...);
//GLOBAL VARIABLES to be shared among different functions in this module
process.on('message',function(m){
//sort cpu-intensive task
process.send('DONE');
});
// OTHER FUNCTIONS in THE MODULE that make use of global variables.
You should try to split up your question. However, I hope this answers it.
Question 1: A global variable is not limited to request scope. That's a part of Node's definition for a global and it doesn't make sense to enforce this somehow. You shouldn't use globals at all.
The request scope is given by the HTTP module:
http.createServer(function(req, res) {
var a = 1;
// req, res, and a are in request scope for your user-associated response
});
Eliminating globals shouldn't be that hard: If module A and B share a global G and module C calls A.doThis() and B.doThat(), change C to call A.doThis(G) and B.doThat(G) instead. Do this for all occurrences of G and reduce its scope to local or request.
Additionally, have a look for "Sessions", if you need a scope coverig multiple requests from one client.
Question 2: Start the child process inside the request handler:
exports.user = function(req,res){
child = new cp.fork('./routes/mysort-module');
Question 3: See question 1?
Question 4: After the process returned the calculated results.
process.on('DONE', function(result) {
process.exit();
// go on and send result to the client somehow...
});

Organizing routes without pre loading the modules in nodejs express

Currently the routing in Express framework requires the module to be loaded before. But that would not be efficient in real life scenarios where there are hundreds of module. I would like to load only the module that is needed. Is there a way where I can define a route to a target module without pre loading the module.
Something like this
app.get('user/showall', 'user.list');
So I would like user module to be loaded only when that particular request needs it to be loaded.
I would rather have a slow startup with fast request handling than fast startup with slow request handling because modules have to be loaded at runtime.
But if you really want, you could create a middleware to implement such behaviour (totally untested):
var lazyload = function(route) {
var s = route.split('.');
var mod = s[0];
var exp = s[1];
return function(req, res, next) {
require(mod)[exp](req, res);
};
};
...
app.get('user/showall', lazyload('user.list'));
(this assumes that routes are always named MODULENAME.EXPORTEDNAME).
I second what #robertklep said "I would rather have a slow startup with fast request handling than fast startup with slow request handling".
But I strongly recommend against doing a require to handle a request because the first call is synchronous and will block the server, which not only affects the current request, but prevents any other request from being processed. With enough such requests, and your server will stop answering requests.
Basically: preload all the code you will need, but lazy load data (in an asynchronous fashion).
(This is not what you are asking for, but this would be considered bad practice).

Resources