I use a third-party module (a spell-checker) that has a long initialization process, and calls a callback after the initialization is complete.
I need to use this module in several different files (sub-modules) of my application.
It looks like a waste of time and space to initialize a different spell-checker in each sub-module, so I am looking for a way to initialize a single spell-checker and use it in all modules.
One option I thought of is to put a spell-checker instance in a sub-module, initialize it in the sub-module, and require that sub-module from my other sub-modules. But, I don't know how to deal with the initialization callback - how can I make sure that my other sub-modules won't use the spell-checker instance before it is initialized?
Another option I thought of is to create a separate application that hosts the spell-checker, and contact it from my application via TCP/IP or another mechanism. But this also looks wasteful - too much communication overhead.
Is there a better way?
This is analogous to using a database driver library and waiting for it to connect successfully to the database before issuing queries. The most prevalent pattern seems to be for the asynchronous library to emit an event such as 'connected' and for the calling code not to start interacting with the library until that event fires. The other option would be to follow the example of something like mongoose and queue pending calls until the spell checker is initialized, and then begin submitting them for processing.
So in short I would wrap the spell checking in a small library that exports the spell checker directly, but also emits a 'ready' event when the underlying spellchecker library invokes the initialization callback. It should be possible to share this same instance of the wrapper module throughout your application.
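A minimal sketch of such a wrapper, under stated assumptions: `slowSpellChecker` is a hypothetical stub standing in for the real third-party library (its `initialize(callback)` and `check(word)` methods are invented for illustration), and `whenReady` is a helper name introduced here so that late subscribers do not wait for an event that has already fired.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the third-party spell checker (an assumption,
// not a real library): initialize() completes asynchronously.
const slowSpellChecker = {
  words: null,
  initialize(done) {
    setTimeout(() => {
      this.words = new Set(['hello', 'world']);
      done();
    }, 10);
  },
  check(word) {
    return this.words.has(word);
  },
};

// The wrapper: one shared emitter that exposes the checker and emits
// 'ready' once the underlying initialization callback has run.
const wrapper = new EventEmitter();
wrapper.checker = slowSpellChecker;
wrapper.ready = false;

slowSpellChecker.initialize(() => {
  wrapper.ready = true;
  wrapper.emit('ready');
});

// Run fn once the checker is usable, whether or not 'ready' already fired.
wrapper.whenReady = function (fn) {
  if (wrapper.ready) {
    process.nextTick(fn);
  } else {
    wrapper.once('ready', fn);
  }
};

// In a real project this object would be assigned to module.exports,
// so every file that requires the wrapper shares the same instance.
```

Callers would then write something like `wrapper.whenReady(function () { wrapper.checker.check('hello'); })` instead of touching the checker directly.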
Create your own module, checker.js:
var spellChecker = require('wordsworth').getInstance();
var initialized = false;
module.exports = function (callback) {
    if (!initialized) {
        // First call: run the slow initialization, then hand over the instance.
        return spellChecker.initialize(/* data */, function () {
            initialized = true;
            callback(spellChecker);
        });
    }
    // Already initialized: hand over the shared instance immediately.
    callback(spellChecker);
};
client.js
var checker = require('./checker');
checker(function (spellChecker) {
// use it..
});
So the spell-checker is initialized on the first call, and the rest of the clients use the already-initialized instance.
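One caveat with checker.js as written: if a second client calls it while the first initialization is still running, `initialize` is started again. A hedged sketch of one way to avoid that is to queue callbacks until initialization completes. `fakeChecker`, `getChecker` and `initCalls` are hypothetical names used only for this illustration.

```javascript
// Hypothetical stub standing in for the real spell checker (an assumption).
let initCalls = 0;
const fakeChecker = {
  initialize(done) {
    initCalls += 1;
    setTimeout(done, 10);
  },
};

let state = 'idle';   // 'idle' -> 'loading' -> 'ready'
const pending = [];   // callbacks waiting for initialization to finish

function getChecker(callback) {
  if (state === 'ready') {
    // Stay asynchronous so callers see consistent ordering in every case.
    return process.nextTick(callback, fakeChecker);
  }
  pending.push(callback);
  if (state === 'idle') {
    state = 'loading';
    fakeChecker.initialize(function () {
      state = 'ready';
      // Flush everyone who asked while initialization was in progress.
      pending.splice(0).forEach(function (cb) {
        cb(fakeChecker);
      });
    });
  }
}
```

With this shape, any number of clients can call `getChecker` at startup and the underlying initialization still runs only once.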
When using WebAssembly, there is a callback for onRuntimeInitialized(). You basically can't do anything until it happens.
So if you have a library that is implemented in it, you have to say:
var mylib = require('mylib')
mylib.onRuntimeInitialized = function() {
...
// Anything that wants to use *anything* from mylib
// (doesn't matter if it's synchronous or asynchronous)
...
}
On the plus side, you're not making Node wait to do any initialization that might not rely on mylib...so other modules can be doing fetches or whatever they need. On the negative side, it's pretty bad ergonomics--especially if everything you're doing depends on this library.
One possibility might seem to be to fold the initialization waiting into a promise, and then wait on it:
var mylib = require('mylib')
await mylib.Startup()
But people apparently write about how much they don't like the idea of top-level await. And my opinion on it is fairly irrelevant either way, as it's not allowed. :-/
So is there really no way to hold up code at the top level besides wrapping the whole app in a callback?
One thing to note with Node is that requires will return the same object, no matter what file that require is called in. Order does matter, but it will be the same object in all files.
So in your main index.js you could do something like
var myLib = require('mylib')
myLib.libby = myLib.initialize()
and then in another file, doesStuff.js, you can do:
const libby = require('mylib').libby
module.exports = function doStuff() {
/* do stuff with initialized libby object */
}
Typically this works because doStuff is not actually called until everything is initialized, for example when a web route is handled. By then your server is already running, so libby is initialized and ready to use when it is called.
If you have something that absolutely cannot fail, like the server will not run if DB connection is not successful or something, then waiting is appropriate, so yes, you'd need to wrap everything (at least the core of your actions) in a callback so your server knows when it's safe to start.
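A rough sketch of the "wait before starting" case, assuming the library's startup can be expressed as a promise. `startLibrary` and `startServer` are hypothetical placeholders, not part of any real library. Wrapping the entry point in an async function gives roughly the effect of awaiting at the top level.

```javascript
// Hypothetical startup routine: resolves once the library is usable.
function startLibrary() {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ greet: function (name) { return 'Hi ' + name; } });
    }, 10);
  });
}

// Hypothetical server start; only ever called with an initialized library.
function startServer(lib) {
  return lib.greet('node');
}

async function main() {
  // Nothing below this line runs until the library is ready.
  const lib = await startLibrary();
  return startServer(lib);
}

const started = main();
started.catch(function (err) {
  // If startup fails, report it and make the process exit with an error.
  console.error('startup failed:', err);
  process.exitCode = 1;
});
```

Other modules that do not depend on the library can still be required and do their own setup before `main()` is called.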
I would like to store a variable that is shared between all stack frames (top down) in a call chain. Much like ThreadLocal in Java or C#.
I have found https://github.com/othiym23/node-continuation-local-storage but it keeps losing context for all my use cases, and it seems that you have to patch the libraries you are using to make them local-storage-aware, which is more or less impossible for our code base.
Are there really no other options available in Node? Could domains, stack traces or something like that be used to get a handle (id) to the current call chain? If this is possible I can write my own thread-local implementation.
Yes, it is possible. Thomas Watson spoke about it at NodeConf Oslo 2016 in his talk Instrumenting Node.js in Production.
It uses Node.js tracing - AsyncWrap (which should eventually become a well-established part of the public Node API). You can see an example in the open-source Opbeat Node agent or, perhaps even better, check out the talk slides and example code.
Now that more than a year has passed since I originally asked this question, it finally looks like we have a working solution in the form of Async Hooks in Node.js 8.
https://nodejs.org/api/async_hooks.html
The API is still experimental, but even then it looks like there is already a fork of Continuation-Local-Storage that uses this new API internally.
https://www.npmjs.com/package/cls-hooked
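As a hedged illustration of where this went: later Node.js releases expose an `AsyncLocalStorage` class from the same `async_hooks` module, which provides call-chain-local storage directly. The sketch below assumes a Node.js version that includes it; `handleRequest` and `currentRequestId` are hypothetical names.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const storage = new AsyncLocalStorage();

// Reads the value belonging to whichever call chain is currently executing.
function currentRequestId() {
  const store = storage.getStore();
  return store ? store.requestId : undefined;
}

// Hypothetical request handler: everything started inside run() sees the
// same store, even after crossing timers and promises.
function handleRequest(requestId, delayMs) {
  return storage.run({ requestId: requestId }, function () {
    return new Promise(function (resolve) {
      setTimeout(function () {
        resolve(currentRequestId());
      }, delayMs);
    });
  });
}

// Two overlapping "requests"; each should read back its own id.
const results = Promise.all([
  handleRequest('a', 20),
  handleRequest('b', 5),
]);
```

No function in the chain has to accept or forward a context argument, which is the property the question asks for.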
TLS is used in some places where ordinary, single-threaded programs would use global variables but where this would be inappropriate in multithreaded cases.
Since JavaScript does not expose threads, a global variable is the simplest answer to your question, but using one is bad practice.
You should instead use a closure: just wrap all your asynchronous calls into a function and define your variable there.
Functions and callbacks created within closure
(function () {
    var visibleToAll = 0;
    functionWithCallback(params, function (err, result) {
        visibleToAll++;
        // ...
        anotherFunctionWithCallback(params, function (err, result) {
            visibleToAll++;
            // ...
        });
    });
    functionReturningPromise(params).then(function (result) {
        visibleToAll++;
        // ...
    }).then(function (result) {
        visibleToAll++;
        // ...
    });
})();
Functions created outside of closure
Should you require your variable to be visible inside functions not defined within request scope, you can create a context object instead and pass it to functions:
(function c() {
    var ctx = { visibleToAll: 0 };
    functionWithCallback(params, ctx, function (err, result) {
        ctx.visibleToAll++;
        // ...
        anotherFunctionWithCallback(params, ctx, function (err, result) {
            ctx.visibleToAll++;
            // ...
        });
    });
    functionReturningPromise(params, ctx).then(function (result) {
        ctx.visibleToAll++;
        // ...
    }).then(function (result) {
        ctx.visibleToAll++;
        // ...
    });
})();
With the approach above, all of the functions called inside c() get a reference to the same ctx object, but different calls to c() have their own contexts. In a typical use case, c() would be your request handler.
Binding context to this
You could bind your context object to this in called functions by invoking them via Function.prototype.call:
functionWithCallback.call(ctx, ...)
...creating new function instance with Function.prototype.bind:
var boundFunctionWithCallback = functionWithCallback.bind(ctx)
...or using promise utility function like bluebird's .bind
Promise.bind(ctx, functionReturningPromise(data) ).then( ... )
Any of these would make ctx available inside your function as this:
this.visibleToAll ++;
...however, this has no real advantage over passing the context around: your function still has to be aware of the context passed via this, and you could accidentally pollute the global object should you ever call the function without a context.
I have a node toplevel myapp variable that contains some key application state - loggers, db handlers and some other data. The modules downstream in directory hierarchy need access to these data. How can I set up a key/value system in node to do that?
A highly upvoted and accepted answer in Express: How to pass app-instance to routes from a different file? suggests using, in a lower level module
//in routes/index.js
var app = require("../app");
But this injects hard-coded knowledge of the directory structure and file names, which should be a bigger no-no IMHO. Is there some other method, like something native in JavaScript? Nor do I relish the idea of declaring variables without var.
What is the node way of making a value available to objects created in lower scopes? (I am very much new to node and all-things-node aren't yet obvious to me)
Thanks a lot.
Since using node global (docs here) seems to be the solution that OP used, thought I'd add it as an official answer to collect my valuable points.
I strongly suggest that you namespace your variables, so something like
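// Create the namespace first, otherwise the assignments below throw.
global.myApp = global.myApp || {};
global.myApp.logger = { /* info here */ };
global.myApp.db = {
    url: 'mongodb://localhost:27017/test',
    connectOptions: {}
};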
If you are in app.js and just want to allow access to it
global.myApp = this;
As always, use globals with care...
This is not really related to node but rather general software architecture decisions.
When you have a client and a server module/package/class (call them whichever way you like), one way is to define routines on the server module that take as arguments whichever state data your client keeps in the 'global' scope, complete their tasks and report back to the client with results.
This way, it is perfectly decoupled and you have a strict control of what data goes where.
Hope this helps :)
One way to do this is in an anonymous function - i.e. instead of returning an object with module.exports, return a function that returns an appropriate value.
So, let's say we want to pass var1 down to our two modules, ./module1.js and ./module2.js. This is how the module code would look:
module.exports = function(var1) {
return {
doSomething: function() { return var1; }
};
}
Then, we can call it like so:
var downstream = require('./module1')('This is var1');
Giving you exactly what you want.
I just created an empty module and installed it under node_modules as appglobals.js
// index.js
module.exports = {};
// package.json too is barebones
{ "name": "appglobals" }
And then I can use it anywhere without worrying about future refactoring:
var g = require("appglobals");
g.foo = "bar";
I wish it came built in as setter/getter, but the flexibility has to be admired.
(Now I only need to figure out how to package it for production)
I am new to node.js programming. I need to change the behaviour of one function in an existing node.js application (Haraka SMTP server).
What is the best practice for doing this? Should I use a plugin, or is there another way to override one particular JS function in a node.js app? Is this even possible?
Node's require caches loaded objects. Therefore you can override an object's function, do something, call the original function, and do something afterwards.
var fs = require('fs');
var origRenameSync = fs.renameSync;
fs.renameSync = function(oldPath, newPath) {
newPath += ".renamed";
origRenameSync.call(this, oldPath, newPath);
// do more here
};
This is a poor example, you should never change core libraries this way. You cannot foresee all side effects.
However, if you know what you are doing, you can adapt existing libraries without changing them internally. It is a very flexible way to decorate functions.
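A small sketch of the same decoration technique applied to an ordinary object rather than a core module. The `mailer` object and the `wrapMethod` helper are hypothetical and exist only for this illustration; the helper also returns a function that undoes the patch.

```javascript
// Hypothetical object whose behaviour we want to extend.
const mailer = {
  prefix: '[app] ',
  send(subject) {
    return this.prefix + subject;
  },
};

// Replace obj[name] with a wrapper that runs `before` on the arguments,
// calls the original with the same `this`, then passes the result to `after`.
function wrapMethod(obj, name, before, after) {
  const original = obj[name];
  obj[name] = function (...args) {
    const newArgs = before ? before(args) : args;
    const result = original.apply(this, newArgs);
    return after ? after(result) : result;
  };
  // Calling the returned function restores the original method.
  return function restore() {
    obj[name] = original;
  };
}

const restore = wrapMethod(
  mailer,
  'send',
  (args) => [args[0].toUpperCase()],
  (result) => result + '!'
);

const patched = mailer.send('hello'); // '[app] HELLO!'
restore();
const plain = mailer.send('hello');   // '[app] hello'
```

Keeping a way to restore the original makes the side effects easier to contain, for example in tests.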
In node, I see that variables declared globally inside modules are getting mixed up across requests [changes made by one request affect the others].
For example:
a.js
var a;
function printName(req, res) {
//get param `name` from url;
a = name;
res.end('Hi '+a);
}
module.exports.printName = printName;
index.js
//Assume all createServer stuffs are done and following function as a CB to createServer
function requestListener(req, res) {
var a = require('a');
a.printName(req, res);
}
My assumption was that the printName function exported from module 'a' is executed every time a new request hits node, and that it gets a different scope object each time.
So having something global inside a module shouldn't affect other requests.
But I see that isn't the case. Can anyone explain how node handles the exports of a module (specifically, how it handles the scope of the cached module exports object), and how to avoid sharing global variables across requests within a module?
Edit [We do an async task per request]:
We get rapid requests in our live system, which basically queries redis and responds to the request. We see the wrong response mapped to the wrong request (the reply of a redis lookup, stored in a global var in the module, gets mapped to a different request). We also have some default values as global vars which can be overridden based on request params, and those are getting mixed up as well.
The first step to understanding what is happening is understanding what's happening behind the scenes. From a language standpoint, there's nothing special about node modules. The 'magic' comes from how node loads files from disk when you require.
When you call require, node either synchronously reads from disk or returns the module's cached exports object. When reading files, it follows a set of somewhat complex rules to determine exactly which file is read, but once it has a path:
1. Check if require.cache[moduleName] exists. If it does, return that and STOP.
2. code = fs.readFileSync(path).
3. Wrap (concatenate) code with the string (function (exports, require, module, __filename, __dirname) { ... });
4. eval your wrapped code and invoke the anonymous wrapper function.
   var module = { exports: {} };
   eval(code)(module.exports, require, module, path, pathMinusFilename);
5. Save module.exports as require.cache[moduleName].
The next time you require the same module, node simply returns the cached exports object. (This is a very good thing, because the initial loading process is slow and synchronous.)
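The steps above can be sketched as a toy loader. This is a simplified illustration, not node's actual implementation: `miniRequire`, `files` and `cache` are hypothetical names, the "files" live in memory instead of on disk, and `new Function` stands in for the wrap-and-eval step.

```javascript
// In-memory "files" so the sketch does not touch the disk (an assumption
// made for illustration; node reads real files from disk).
const files = {
  '/counter.js':
    'var count = 0;' +
    'module.exports = { next: function () { return ++count; } };',
};

const cache = {};

function miniRequire(path) {
  // 1. Return the cached exports if this module was loaded before.
  if (cache[path]) return cache[path];
  // 2. "Read" the source code.
  const code = files[path];
  // 3. Wrap it in a function, which gives the module its own local scope.
  const wrapper = new Function('exports', 'require', 'module', '__filename', code);
  // 4. Invoke the wrapper with a fresh module object.
  const mod = { exports: {} };
  wrapper(mod.exports, miniRequire, mod, path);
  // 5. Save the exports so later calls skip steps 2 to 4.
  cache[path] = mod.exports;
  return mod.exports;
}

const first = miniRequire('/counter.js');
const second = miniRequire('/counter.js');
const sameObject = first === second;          // true: second call hit the cache
const calls = [first.next(), second.next()];  // [1, 2]: `count` is shared
```

Because the module body ran only once, `count` was never reset between the two calls, which is the same sharing that affects the variable a in the question.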
So now you should be able to see:
Top-level code in a module is only executed once.
Since it is actually executed in an anonymous function:
'Global' variables aren't actually global (unless you explicitly assign to global or don't scope your variables with var)
This is how a module gets a local scope.
In your example, you require module a for each request, but you're actually sharing the same module scope across all requests because of the module caching mechanism outlined above. Every call to printName shares the same a in its scope chain (even though printName itself gets a new scope on each invocation).
Now in the literal code you have in your question, this doesn't matter: you set a and then use it on the very next line. Control never leaves printName, so the fact that a is shared is irrelevant. My guess is your real code looks more like:
var a;
function printName(req, res) {
//get param `name` from url;
a = name;
getSomethingFromRedis(function(result) {
res.end('Hi '+a);
});
}
module.exports.printName = printName;
Here we have a problem because control does leave printName. The callback eventually fires, but another request changed a in the meantime.
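A self-contained sketch that reproduces this interleaving, with assumptions labeled: `getSomethingFromRedis` here is a hypothetical stand-in built on setTimeout rather than a real Redis call, and `respond` replaces res.end so the output can be collected.

```javascript
// Module-level variable shared by every call, as in the example above.
let a;

// Hypothetical stand-in for an asynchronous Redis lookup.
function getSomethingFromRedis(delayMs, callback) {
  setTimeout(callback, delayMs);
}

function printName(name, delayMs, respond) {
  a = name;
  getSomethingFromRedis(delayMs, function () {
    // By now another request may have overwritten `a`.
    respond('Hi ' + a);
  });
}

const responses = [];
const done = new Promise(function (resolve) {
  function collect(text) {
    responses.push(text);
    if (responses.length === 2) resolve(responses);
  }
  printName('alice', 20, collect); // slow lookup, started first
  printName('bob', 5, collect);    // fast lookup, overwrites `a` right away
});

// Both callbacks run after `a` was set to 'bob', so the request for
// 'alice' is answered with 'Hi bob' as well.
```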
You probably want something more like this:
a.js
module.exports = function A() {
var a;
function printName(req, res) {
//get param `name` from url;
a = name;
res.end('Hi '+a);
}
return {
printName: printName
};
}
index.js
var A = require('a');
function requestListener(req, res) {
var a = A();
a.printName(req, res);
}
This way, you get a fresh and independent scope inside of A for each request.
It really depends on when in the process you assign to name.
If there is an async method between assigning the name and calling requestListener, then you will have "race conditions" (i.e. two threads changing the same object at the same time), even though node.js is single-threaded.
This is because node.js will start processing a new request while the async method is running in the background.
For example, look at the following sequence:
request1 starts processing, sets name to 1
request1 calls an async function
node.js frees the process, and handles the next request in queue.
request2 starts processing, sets name to 2
request2 calls an async function
node.js frees the process, the async function for request 1 is done, so it calls the callback for this function.
request1 calls requestListener, however at this point name is already set to 2 and not 1.
Dealing with async functions in Node.js is very similar to multi-threaded programming: you must take care to encapsulate your data. In general you should try to avoid using global objects, and if you do use them, they should be either immutable or self-contained.
Global objects shouldn't be used to pass state between functions (which is what you are doing).
The solution to your problem should be to put the name inside an object instead of a global. The suggested places are the request object, which is passed to almost all functions in the request processing pipeline (this is what connect.js, express.js and all the middleware do), or a session (see the connect.js session middleware), which would allow you to persist data between different requests from the same user.
Modules are designed to run once and then be cached. Combined with node's asynchronous nature, that means a callback such as res.end('Hi '+a) can run after another request has already executed a = name, so it sees the other request's value.
Ultimately it boils down to one simple fact of JavaScript: global vars are evil. I would not use a global unless it never gets overridden by requests.