Is it possible to have "thread" local variables in Node? - node.js

I would like to store a variable that is shared between all stack frames (top down) in a call chain. Much like ThreadLocal in Java or C#.
I have found https://github.com/othiym23/node-continuation-local-storage but it keeps losing context for all my use cases, and it seems you have to patch the libraries you are using to make them local-storage-aware, which is more or less impossible for our code base.
Are there really not any other options available in Node? Could domains, stack traces or something like that be used to get a handle (id) to the current call chain? If this is possible, I can write my own thread-local implementation.

Yes, it is possible. Thomas Watson has spoken about it at NodeConf Oslo 2016 in his talk Instrumenting Node.js in Production (alt. link).
It uses Node.js tracing - AsyncWrap (which should eventually become a well-established part of the public Node API). You can see an example in the open-source Opbeat Node agent or, perhaps even better, check out the talk slides and example code.

Now that more than a year has passed since I originally asked this question, it finally looks like we have a working solution in the form of Async Hooks in Node.js 8.
https://nodejs.org/api/async_hooks.html
The API is still experimental, but even then it looks like there is already a fork of Continuation-Local-Storage that uses this new API internally.
https://www.npmjs.com/package/cls-hooked
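For example, cls-hooked exposes the same namespace API as the original continuation-local-storage module. A minimal sketch (someAsyncOperation stands in for any async call in your request chain):

var cls = require('cls-hooked');
var ns = cls.createNamespace('my-app');

ns.run(function () {
  ns.set('requestId', 42);
  someAsyncOperation(function () {
    console.log(ns.get('requestId')); // still 42, across the async boundary
  });
});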

TLS is used in some places where ordinary, single-threaded programs would use global variables but where this would be inappropriate in multithreaded cases.
Since JavaScript does not expose threads, a global variable is the simplest answer to your question, but using one is bad practice.
You should instead use a closure: just wrap all your asynchronous calls in a function and define your variable there.
Functions and callbacks created within closure
(function() {
  var visibleToAll = 0;
  functionWithCallback(params, function(err, result) {
    visibleToAll++;
    // ...
    anotherFunctionWithCallback(params, function(err, result) {
      visibleToAll++;
      // ...
    });
  });
  functionReturningPromise(params).then(function(result) {
    visibleToAll++;
    // ...
  }).then(function(result) {
    visibleToAll++;
    // ...
  });
})();
Functions created outside of closure
Should you require your variable to be visible inside functions not defined within request scope, you can create a context object instead and pass it to functions:
(function c() {
  var ctx = { visibleToAll: 0 };
  functionWithCallback(params, ctx, function(err, result) {
    ctx.visibleToAll++;
    // ...
    anotherFunctionWithCallback(params, ctx, function(err, result) {
      ctx.visibleToAll++;
      // ...
    });
  });
  functionReturningPromise(params, ctx).then(function(result) {
    ctx.visibleToAll++;
    // ...
  }).then(function(result) {
    ctx.visibleToAll++;
    // ...
  });
})();
Using the approach above, all of the functions called inside c() get a reference to the same ctx object, but different calls to c() each have their own context. In a typical use case, c() would be your request handler.
Binding context to this
You could bind your context object to this in called functions by invoking them via Function.prototype.call:
functionWithCallback.call(ctx, ...)
...by creating a new function instance with Function.prototype.bind:
var boundFunctionWithCallback = functionWithCallback.bind(ctx)
...or by using a promise utility function like bluebird's .bind:
Promise.bind(ctx, functionReturningPromise(data) ).then( ... )
Any of these would make ctx available inside your function as this:
this.visibleToAll ++;
...however, this has no real advantage over passing the context around - your function still has to be aware of the context passed via this, and you could accidentally pollute the global object should you ever call the function without a context.
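For example (a minimal sketch): in non-strict mode, calling a function without a bound context makes this refer to the global object, so the counter silently leaks into global scope:

function increment() {
  this.visibleToAll = (this.visibleToAll || 0) + 1;
}

var ctx = { visibleToAll: 0 };
increment.call(ctx); // ok: ctx.visibleToAll === 1
increment();         // oops: creates global.visibleToAll instead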

Related

Why does node prefer error-first callback?

Node programmers conventionally use a paradigm like this:
let callback = function(err, data) {
  if (err) { /* do something if there was an error */ }
  /* other logic here */
};
Why not simplify the function to accept only a single parameter, which is either an error, or the response?
let callback = function(data) {
  if (isError(data)) { /* do something if there was an error */ }
  /* other logic here */
};
Seems simpler. The only downside I can see is that functions can't return errors as their actual intended return value - but I believe that is an incredibly insignificant use-case.
Why is the error-first pattern considered standard?
EDIT: Implementation of isError:
let isError = obj => obj instanceof Error;
ANOTHER EDIT: Is it possible that my alternate method is somewhat more convenient than node convention, because callbacks which only accept one parameter are more likely to be reusable for non-callback use-cases as well?
(See "Update" below for an npm module to use the callback convention from the question.)
This is just a convention. Node could use the convention that you suggest as well - with the exception that you wouldn't be able to return an error object as an intended value on success as you noticed, which may or may not be a problem, depending on your particular requirements.
The thing with the current Node convention is that sometimes callbacks don't expect any data, so err is the only parameter they take, and sometimes functions produce more than one value on success - for example see:
request(url, (err, res, data) => {
  if (err) {
    // you have error
  } else {
    // you have both res and data
  }
});
See this answer for a full example of the above code.
But you might as well make the first parameter to be an error even in functions that take more than one parameter, I don't see any issue with your style even then.
Error-first Node-style callbacks are what Ryan Dahl originally used, and they are now pretty universal and expected for any asynchronous function that takes a callback. Not that this convention is better or worse than what you suggest, but having a convention - whatever it is - makes the composition of callbacks and callback-taking functions possible, and modules like async rely on that.
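For instance, async.waterfall can only chain arbitrary steps because every step honours the same error-first contract (a minimal sketch):

var async = require('async');

async.waterfall([
  function (cb) { cb(null, 1); },        // passes 1 to the next step
  function (n, cb) { cb(null, n + 1); }  // receives 1, passes 2
], function (err, result) {
  if (err) return console.error(err);
  console.log(result); // 2
});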
In fact, I see one way in which your idea is superior to the classical Node convention - it's impossible to call the callback with both an error and the first non-error argument defined, which is possible for Node-style callbacks and sometimes does happen. Both conventions could potentially have the callback called twice, though - which is a problem.
But there is another widely used convention in JavaScript in general and Node in particular where it's impossible to define both error and data, and where it's additionally impossible to call the callback twice: instead of taking a callback, you return a promise, and instead of explicitly checking the error value in an if as with Node-style callbacks or your-style callbacks, you separately add success and failure callbacks that only get the relevant data.
All of those styles are pretty much equivalent in what they can do:
nodeStyle(params, function (err, data) {
  if (err) {
    // error
  } else {
    // success
  }
});

yourStyle(params, function (data) {
  if (isError(data)) {
    // error
  } else {
    // success
  }
});
promiseStyle(params)
  .then(function (data) {
    // success
  })
  .catch(function (err) {
    // error
  });
Promises may be more convenient for your needs and those are already widely supported with a lot of tools to use them, like Bluebird and others.
You can see some other answers where I explain the difference between callbacks and promises and how to use them together in more detail, which may be helpful to you in this case:
A detailed explanation on how to use callbacks and promises
Explanation on how to use promises in complex request handlers
An explanation of what a promise really is, on the example of AJAX requests
Examples of mixing callbacks with promises
Of course I see no reason why you couldn't write a module that converts Node-style callbacks into your style callbacks or vice versa, and the same with promises, much like promisify and asCallback work in Bluebird. It certainly seems doable if working with your callback style is more convenient for you.
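Such a converter can be tiny. Here is a minimal sketch of the idea (a hypothetical helper - not necessarily how the module below is implemented):

function toSingleArg(callback) {
  // wrap a single-argument callback in a Node-style (err, data) callback
  return function (err, data) {
    callback(err != null ? err : data);
  };
}

// usage: fs.readFile('example.txt', 'utf8', toSingleArg(callback));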
Update
I just published a module on npm that you can use to have your preferred style of callbacks:
https://www.npmjs.com/package/errc
You can install it and use it in your project with:
npm install errc --save
It allows you to have a code like this:
var errc = require('errc');
var fs = require('fs');

var isError = function(obj) {
  try { return obj instanceof Error; } catch(e) {}
  return false;
};

var callback = function(data) {
  if (isError(data)) {
    console.log('Error:', data.message);
  } else {
    console.log('Success:', data);
  }
};

fs.readFile('example.txt', errc(callback));
For more examples see:
https://github.com/rsp/node-errc-example
I wrote this module as an example of how to manipulate functions and callbacks to suit your needs, but I released it under the MIT license and published it on npm, so you can use it in real projects if you want.
This demonstrates the flexibility of Node, its callback model, and the possibility of writing higher-order functions to create your own APIs that suit your needs. I published it in the hope that it may be useful as an example for understanding the Node callback style.
Because without this convention, developers would have to maintain different signatures and APIs, without knowing where to place the error in the arguments array.
In most cases, there can be many arguments, but only one error - and you know where to find it.
Joyent even wrote about this at the time they were more involved:
Callbacks are the most basic way of delivering an event asynchronously. The user passes you a function (the callback), and you invoke it sometime later when the asynchronous operation completes. The usual pattern is that the callback is invoked as callback(err, result), where only one of err and result is non-null, depending on whether the operation succeeded or failed.
Yes, we could adopt a code style like the one you suggest, but there would be problems. If everyone maintained whatever code style they wanted, the number of different API signatures would increase, and developers would face a dilemma, reinventing their own layers (error and success stages, for example) each time. Common conventions play an important role in spreading best practices among developers.

Global variable with initialization callback

I use a third-party module (a spell-checker) that has a long initialization process, and calls a callback after the initialization is complete.
I need to use this module in several different files (sub-modules) of my application.
It looks like a waste of time and space to initialize a different spell-checker in each sub-module, so I am looking for a way to initialize a single spell-checker and use it in all modules.
One option I thought of is to put a spell-checker instance in a sub-module, initialize it in the sub-module, and require that sub-module from my other sub-modules. But, I don't know how to deal with the initialization callback - how can I make sure that my other sub-modules won't use the spell-checker instance before it is initialized?
Another option I thought of is to create a separate application that runs the spell-checker, and contact it from my application via TCP/IP or another mechanism. But this also looks wasteful - too much communication overhead.
Is there a better way?
This is analogous to using a database driver library and waiting for it to connect successfully to the database before issuing queries. The most prevalent pattern seems to be for the asynchronous library to emit an event such as 'connected' and for the calling code not to start interacting with the library until that event fires. The other option would be to follow the example of something like mongoose and queue pending calls until the spell checker is initialized, then begin submitting them for processing.
So in short I would wrap the spell checking in a small library that exports the spell checker directly, but also emits a 'ready' event when the underlying spellchecker library invokes the initialization callback. It should be possible to share this same instance of the wrapper module throughout your application.
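A minimal sketch of such a wrapper (assuming a wordsworth-style spell-checker with an initialize(callback) method; names are illustrative):

// spellcheckerWrapper.js
var EventEmitter = require('events').EventEmitter;
var spellChecker = require('wordsworth').getInstance();

var wrapper = new EventEmitter();
wrapper.checker = spellChecker;
wrapper.ready = false;

spellChecker.initialize(function () {
  wrapper.ready = true;
  wrapper.emit('ready'); // consumers wait for this before using the checker
});

module.exports = wrapper;

A consumer then checks wrapper.ready and, if necessary, waits with wrapper.once('ready', ...) before calling the checker.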
Create your own module, checker.js, where
var spellChecker = require('wordsworth').getInstance();
var initialized = false;

module.exports = function (callback) {
  if (!initialized) {
    return spellChecker.initialize(/* data, */ function () {
      initialized = true;
      callback(spellChecker);
    });
  }
  callback(spellChecker);
};
client.js
var checker = require('./checker');

checker(function (spellChecker) {
  // use it..
});
So it will be initialized on the first call, and the rest of the clients will use the initialized version.

Node js globals within modules

In Node, I see that variables initialized globally inside modules get mixed up across requests [changes made by one request affect another].
For Ex:
a.js
var a;
function printName(req, res) {
  // get param `name` from url;
  a = name;
  res.end('Hi ' + a);
}
module.exports.printName = printName;
index.js
// Assume all the createServer stuff is done and the following function is the CB to createServer
function requestListener(req, res) {
  var a = require('a');
  a.printName(req, res);
}
My assumption was that the printName function exported from module 'a' is executed every time a new request hits node, and that it would have a different scope object each time.
So having something global inside a module shouldn't affect anything across requests.
But I see that isn't the case. Can anyone explain how node handles module exports of functions specifically [the way it handles the scope of the cached module exports object], and how to overcome these shared global variables across requests within a module?
Edit [we do async work per request]:
Our live system handles rapid requests that basically query Redis and respond to the request. We see the wrong response mapped to the wrong request (the reply from a Redis lookup, stored in a global variable in the module, gets mapped to a different request). We also have some default values stored as global variables that can be overridden based on request params, and those are getting mixed up too.
The first step to understanding what is happening is understanding what's happening behind the scenes. From a language standpoint, there's nothing special about node modules. The 'magic' comes from how node loads files from disk when you require.
When you call require, node either synchronously reads from disk or returns the module's cached exports object. When reading files, it follows a set of somewhat complex rules to determine exactly which file is read, but once it has a path:
1. Check if require.cache[moduleName] exists. If it does, return that and STOP.
2. code = fs.readFileSync(path).
3. Wrap (concatenate) code with the string (function (exports, require, module, __filename, __dirname) { ... });
4. eval your wrapped code and invoke the anonymous wrapper function:
   var module = { exports: {} };
   eval(code)(module.exports, require, module, path, pathMinusFilename);
5. Save module.exports as require.cache[moduleName].
The next time you require the same module, node simply returns the cached exports object. (This is a very good thing, because the initial loading process is slow and synchronous.)
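You can observe the cache directly (a quick sketch, assuming a local module ./a):

var a1 = require('./a');
var a2 = require('./a');
console.log(a1 === a2); // true - both are the same cached exports object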
So now you should be able to see:
- Top-level code in a module is only executed once.
- Since it is actually executed in an anonymous function:
  - 'Global' variables aren't actually global (unless you explicitly assign to global or don't scope your variables with var).
  - This is how a module gets a local scope.
In your example, you require module a for each request, but you're actually sharing the same module scope across all requests because of the module caching mechanism outlined above. Every call to printName shares the same a in its scope chain (even though printName itself gets a new scope on each invocation).
Now in the literal code you have in your question, this doesn't matter: you set a and then use it on the very next line. Control never leaves printName, so the fact that a is shared is irrelevant. My guess is your real code looks more like:
var a;
function printName(req, res) {
  // get param `name` from url;
  a = name;
  getSomethingFromRedis(function(result) {
    res.end('Hi ' + a);
  });
}
module.exports.printName = printName;
Here we have a problem because control does leave printName. The callback eventually fires, but another request changed a in the meantime.
You probably want something more like this:
a.js
module.exports = function A() {
  var a;
  function printName(req, res) {
    // get param `name` from url;
    a = name;
    res.end('Hi ' + a);
  }
  return {
    printName: printName
  };
};
index.js
var A = require('a');

function requestListener(req, res) {
  var a = A();
  a.printName(req, res);
}
This way, you get a fresh and independent scope inside of A for each request.
It really depends on when in the process you assign to name.
If there is an async method between assigning name and calling requestListener, then you'll have "race conditions" (i.e. two requests changing the same object at the same time) even though node.js is single-threaded.
This is because node.js will start processing a new request while the async method is running in the background.
For example, look at the following sequence:
1. request1 starts processing, sets name to 1
2. request1 calls an async function
3. node.js frees the process and handles the next request in the queue
4. request2 starts processing, sets name to 2
5. request2 calls an async function
6. node.js frees the process; the async function for request1 is done, so it calls that function's callback
7. request1's callback runs inside requestListener, however at this point name is already set to 2, not 1
Dealing with async functions in Node.js is very similar to multi-threaded programming: you must take care to encapsulate your data. In general you should try to avoid using global objects, and if you do use them, they should be either immutable or self-contained.
Global objects shouldn't be used to pass state between functions (which is what you are doing).
The solution to your problems should be to put the name global inside an object. The suggested places are inside the request object, which is passed to almost all functions in the request processing pipeline (this is what connect.js, express.js and all the middleware do), or within a session (see the connect.js session middleware), which would allow you to persist data between different requests from the same user.
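For example, with Express-style middleware (a minimal sketch; the names are illustrative):

app.use(function (req, res, next) {
  req.ctx = { name: req.query.name }; // per-request container
  next();
});

function printName(req, res) {
  res.end('Hi ' + req.ctx.name); // no shared module-level state
}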
Modules were designed to run once and then be cached; that, combined with node's asynchronous nature, means that about 50% of the time res.end('Hi ' + a) executes before a = name does for the current request (because a already holds a value from an earlier one).
Ultimately it boils down to one simple fact of JavaScript: global vars are evil. I would not use a global unless it never gets overwritten by requests.

Models with dependency injection in Nodejs

What is the best practice for injecting dependencies into models? And especially, what if their getters are asynchronous, as with mongodb.getCollection()?
The point is to inject dependencies once with
var model = require('./model')({dep1: foo, dep2: bar});
and then call all member methods without having to pass the dependencies as arguments. Nor do I want each method to begin with a waterfall of async getters.
I ended up with a dedicated exports wrapper that proxies all calls and passes the async dependencies.
However, this creates a lot of overhead, it's highly repetitive, and I generally do not like it.
var Entity = require('./entity');

function findById(id, callback, collection) {
  // ...
  // callback(null, Entity(...));
}

module.exports = function(di) {
  function getCollection(callback) {
    di.database.collection('users', callback);
  }
  return {
    findById: function(id, callback) {
      getCollection(function(err, collection) {
        findById(id, callback, collection);
      });
    },
    // ... more methods, all expecting `collection`
  };
};
What is the best practice for injecting dependencies, especially those with async getters?
If your need is to support unit testing, dependency injection in a dynamic language like javascript is probably more trouble than it's worth. Note that just about none of the modules you require from others are likely to use the patterns for DI you see in Java, .NET, and with other statically compiled languages.
If you want to mock out behavior in order to isolate specific units of code for testing, see the 'sinon' module http://sinonjs.org/. It allows you to dynamically swap in/out interceptors that can either spy on method calls or replace them altogether. In practice, you would write a mocha test where you require your module, then require a module that's leveraged in your code. Use sinon to spy or stub a method on that module and as a result, you can isolate your code.
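A minimal sketch of that pattern (./db and getCollection are placeholder names for whatever module your code leverages):

var sinon = require('sinon');
var db = require('./db'); // the module required by the code under test

it('uses a stubbed collection', function () {
  var stub = sinon.stub(db, 'getCollection').yields(null, {});
  // ... exercise the code under test; it will receive the fake collection ...
  stub.restore(); // put the real method back
});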
There is one scenario where I've not been able to completely isolate 3rd party code with sinon, and this is when the act of require()ing a module executes some behavior that you don't want to run in your test. For that scenario, I made a super simple module called 'mockrequire' https://github.com/mateodelnorte/mockrequire that allows you to provide an inline mock to be required instead of the actual module. You can provide a mock that uses spy or stub from sinon and have the same syntax and patterns as all the rest of your tests.
Hopefully this answers the underlying question from your post. ;)
In very simple situations, you could simply export a function that modifies objects in your file scope and returns your actual exports object, but if you want to inject more variably (i.e. for more than one use from your app) it's generally better to create a wrapper object like you have done.
You can reduce some overhead and indentation in some situations by using a wrapper class instead of a function returning an object.
For instance
function findById(id, callback, collection) {
  // ...
  // callback(null, Entity(...));
}

function Wrapper(di) {
  this.di = di;
}

module.exports = Wrapper; // or do 'new' usage in a function if preferred

Wrapper.prototype.findById = function (id, callback) {
  // use this.di to call findById and getCollection
}; // etc
Other than that, there's not a whole lot you can do to improve things. I like this approach though. It keeps the state (di) explicit and separate from the function body of findById, and by using a class you reduce the nesting of indentation a little bit at least.
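Usage would then look something like this (a sketch, assuming a di object carrying the database):

var Wrapper = require('./model');
var model = new Wrapper({ database: db });

model.findById(id, function (err, entity) {
  // ...
});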

Invalidating handles that point to a deleted C++ object

When a C++ object that is exposed to v8 is deleted, how can I invalidate handles that may point to this object?
I'm using v8 as a scripting interface to a larger application. Objects in the larger application are wrapped and accessed in v8 using node's ObjectWrap class.
The issue is, the lifetime of the wrapped objects is limited. If, in javascript, I execute something like:
var win = app.getWindow();
win.close(); // The C++ object that win references goes away
console.log(win.width()); // This should fail.
I want it to behave just like the comments say. After win.close() (or some other event maybe outside JS control), any access to win or duplicated handle needs to fail.
Currently, I have to mark the wrapped C++ object to be invalid and check the validity on every method call. Is this the only way to do it, or is there a way to mark a handle as no longer valid?
The only way that comes to mind would be to have an extra function around that gives an error when called. Then when you call .close, you could create properties on your win object that would take precedence over the prototype versions.
function closedError() {
  return new Error("Window Closed");
}

win.close = function() {
  this.width = closedError;
  this.otherfunc = closedError;
};
I don't have a compiler handy at the moment, but I imagine something like this in C++.
static Handle<Value> Close(const Arguments& args) {
  HandleScope scope;
  // Shadow the prototype methods with instance-level versions
  // (here, Window::Width and Window::OtherFunc would be the error-reporting variants)
  NODE_SET_METHOD(args.This(), "width", Window::Width);
  NODE_SET_METHOD(args.This(), "otherfunc", Window::OtherFunc);
  return scope.Close(Undefined());
}
