I was wondering if using require() in node.js is equivalent to lazy loading?
For example, if I had a function that required a specific node.js package that wasn't needed anywhere else in my code, am I best to use require() inside that function to include the needed package only when that function is called?
I'm also unsure whether this will provide any performance improvement, given my lack of understanding of the node.js architecture. I presume it will use less memory per connection to my server. However, will it increase I/O to the disk when it has to read the package, or will this be a one-off to get it into memory?
If this is the case, how far should I take it? Should I be trying to write node.js packages for as much of my code as I can?
require() is on-demand loading. Once a module has been loaded, it won't be reloaded if the require() call is run again. By putting it inside a function instead of your top-level module code, you can delay its loading, or potentially avoid it if you never actually invoke that function. However, require() is synchronous and loads the module from disk, so the best practice is to load any modules you need at application start, before your application starts serving requests. That ensures only asynchronous I/O happens while your application is operational.
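As a rough illustration (the some-big-lib package and its generate() method are made up, standing in for anything expensive to load), the two approaches look like this:
function report() {
  // loaded from disk the first time report() is called; later calls get the
  // cached module object back from require() without re-reading the file
  const big = require('some-big-lib');
  return big.generate();
}

// the conventional alternative: pay the synchronous load cost once at startup,
// before the server begins accepting connections
const big = require('some-big-lib');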
Node is single threaded so the memory footprint of loading a module is not per-connection, it's per-process. Loading a module is a one-off to get it into memory.
Just stick with the convention here and require the modules you need at the top level scope of your app before you start processing requests. I think this is a case of, if you have to ask whether you need to write your code in an unusual way, you don't need to write your code in an unusual way.
If you want to lazy load modules, it's now possible with ES6 (Node v6).
Edit: This will not work if you need to access properties of require (like require.cache).
module.js
console.log('Module was loaded')
exports.d=3
main.js
var _require = require;
var require = function (moduleName) {
  var module; // holds the real module once it has been loaded

  // Return a Proxy so the actual _require() call is deferred until the
  // module is either invoked as a function or has a property accessed.
  return new Proxy(function () {
    if (!module) {
      module = _require(moduleName);
    }
    return module.apply(this, arguments);
  }, {
    get: function (target, name) {
      if (!module) {
        module = _require(moduleName);
      }
      return module[name];
    }
  });
};
console.log('Before require');
var a = require('./module')
console.log('After require');
console.log(a.d)
console.log('After log module');
output
Before require
After require
Module was loaded
3
After log module
Related
I recently read about Node's "worker_threads" module, which allows parallel execution of JavaScript code in multiple threads and is useful for CPU-intensive operations. (NOTE: these are not the Web Workers that browsers like Chrome provide.)
I'm building a feature where I need to do a massive amount of Postgres INSERTs without blocking the browser.
The problem is: in my Javascript files where I instantiate the worker, I'm not allowed to import anything, including native Node modules or NPM libraries like Knex.js which is necessary to do database queries. I get an error that says: Cannot use import statement outside a module as soon as the file is executed.
I've tried putting the worker code in another file with an import statement at the top (same error). I've tried giving the Knex object to workerData but it cannot clone a non-native JS object.
I'm out of ideas. Does anyone know how to interact with a database in a worker thread if we can't import any NPM libraries?!?!
// mainThread.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
import knex from 'knex'; // --> *** UNCAUGHT EXCEPTION: Cannot use import statement outside a module ***
if (isMainThread) {
  module.exports = async function runWorker (rowsToInsert = []) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { rowsToInsert } });
      worker.on('message', (returningRows) => resolve(returningRows));
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  const { rowsToInsert } = workerData;
  return knex('table').insert(rowsToInsert)
    .then((returningRows) => {
      parentPort.postMessage({ data: returningRows });
    });
}
I am following a tutorial from this webpage: https://blog.logrocket.com/use-cases-for-node-workers/
It is of course possible, but it's a very bad idea.
Database drivers are already asynchronous and non-blocking of the JavaScript thread. Moving your insert calls to a separate thread as you propose will not only get you no performance gains, it will actually decrease overall performance because of the overhead involved with interthread communication:
Synchronization and message passing is not free
JavaScript uses structured cloning when moving data between threads. This means all your rowsToInsert must be copied, which is (relatively) expensive.
Generally, the only time it's really appropriate to use JS threads is when your JavaScript code is performing CPU-intensive work. The node docs say as much right at the top:
Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.
This means if you're doing a lot of parsing, math, or similar, it may be appropriate to do the work in a thread. However, simply shoveling data from one place to another (i.e., I/O) is not a good candidate for a thread; after all, node's design is tuned to be efficient at exactly this kind of work.
You don't say where your rowsToInsert come from, but if they're coming in from HTTP request(s), a thread is the wrong thing to use. However, if you're parsing, say, a CSV or JSON file on the server, it may be worthwhile to do that in a thread, but it's important that the thread does all the work (so memory need not be moved between threads). The message you post to the worker should just be "process the file located at /foo/bar.csv", and then the worker thread does the rest, as sketched below.
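A rough sketch of that division of labour might look like the following (the file names, table name, connection string, and the deliberately naive CSV parsing are all placeholders):
// main.js - post only the file path; no row data crosses the thread boundary
const path = require('path');
const { Worker } = require('worker_threads');

function importFile(filePath) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.resolve(__dirname, 'import-worker.js'), {
      workerData: { filePath },
    });
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}

// import-worker.js - the worker reads, parses and inserts entirely on its own;
// it uses require() rather than import, so the original error does not come up
const { parentPort, workerData } = require('worker_threads');
const fs = require('fs');
const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });

const text = fs.readFileSync(workerData.filePath, 'utf8');
// naive CSV parsing, just for illustration: header row, then comma-separated values
const [header, ...lines] = text.trim().split('\n');
const cols = header.split(',');
const rows = lines.map((line) => {
  const vals = line.split(',');
  return Object.fromEntries(cols.map((c, i) => [c, vals[i]]));
});

knex('my_table').insert(rows)
  .then((result) => parentPort.postMessage(result))
  .finally(() => knex.destroy());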
The error you're getting is the same that you'd get without worker threads: you're trying to use import in a regular, non-module JS file. Either rename the worker file to *.mjs or use require('knex') instead.
node's ES module documentation goes into detail about how import differs from require.
Cannot use import statement outside a module
This is just a complaint that your code is using the "import" style of imports, but "import" is only supported inside ES modules and your code is in a CommonJS context. In CommonJS code you need to use require() instead:
const knex = require('knex');
As far as I know, a required module is cached and stays there all the time, no matter how many times require(...) is called.
function f1() {
  const m = require("/path/to/my/module")
  // ...
}
In my situation, my module's code keeps being updated, so I want to load it fresh (and free it) every time it is used in f1().
How can I do that? If I don't use require, what should I use?
Thanks
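For what it's worth, one way to force a fresh load in CommonJS (a minimal sketch; the path is the placeholder from the question) is to delete the cached entry before requiring again:
function f1() {
  const modulePath = "/path/to/my/module";
  // remove the cached copy (if any) so the next require() re-reads and re-evaluates the file
  delete require.cache[require.resolve(modulePath)];
  const m = require(modulePath);
  // ... use m ...
}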
This code works because system-sleep blocks execution of the main thread but does not block callbacks. However, I am concerned that system-sleep is not 100% portable because it relies on the deasync npm module which relies on C++.
Are there any alternatives to system-sleep?
var sleep = require('system-sleep')
var done = false
setTimeout(function() {
  done = true
}, 1000)
while (!done) {
  sleep(100) // without this line the while loop causes problems because it is a spin wait
  console.log('sleeping')
}
console.log('If this is displayed then it works!')
PS Ideally, I want a solution that works on Node 4+ but anything is better than nothing.
PPS I know that sleeping is not best practice but I don't care. I'm tired of arguments against sleeping.
Collecting my comments into an answer per your request:
Well, deasync (which sleep() depends on) uses quite a hack. It is a native code node.js add-on that manually runs the event loop from C++ code in order to do what it is doing. Only someone who really knows the internals of node.js (now and in the future) could imagine what the issues are with doing that. What you are asking for is not possible in regular Javascript code without hacking the node.js native code because it's simply counter to the way Javascript was designed to run in node.js.
Understood, and thanks. I am trying to write a more reliable alternative to deasync (which fails on some platforms), one that doesn't use a hack. Obviously the approach I've given is not the answer. I want it to support Node 4. I'm thinking of using yield / async combined with Babel now, but I'm not sure that's what I'm after either. I need something that will wait until the callback is resolved and then return the value from the async callback.
All Babel does with async/await is write regular promise.then() code for you. async/await are syntax conveniences. They don't really do anything that you can't write yourself using promises, .then(), .catch() and in some cases Promise.all(). So, yes, if you want to write async/await style code for node 4, then you can use Babel to transpile your code to something that will run on node 4. You can look at the transpiled Babel code when using async/await and you will just find regular promise.then() code.
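For instance, the two forms below are roughly equivalent, and the second is essentially what the transpiled output boils down to (fetchUser is a made-up async helper that returns a promise):
async function getUser(id) {
  const user = await fetchUser(id);
  return user.name;
}

// the same logic written with plain promises
function getUserWithThen(id) {
  return fetchUser(id).then(function (user) {
    return user.name;
  });
}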
There is no deasync solution that isn't a hack of the engine because the engine was not designed to do what deasync does.
Javascript in node.js was designed to run one Javascript event at a time and that code runs until it returns control back to the system where the system will then pull the next event from the event queue and run its callback. Your Javascript is single threaded with no pre-emptive interruptions by design.
Without some sort of hack of the JS engine, you can't suspend or sleep one piece of Javascript and then run other events. It simply wasn't designed to do that.
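The non-hack way to get something like the behaviour in the question is therefore to yield back to the event loop with a promise-based delay instead of blocking. A minimal sketch (using async/await, which would need Babel to run on Node 4):
function sleep(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

async function main() {
  var done = false;
  setTimeout(function () { done = true; }, 1000);
  while (!done) {
    await sleep(100); // suspends main() and lets other callbacks run
    console.log('sleeping');
  }
  console.log('If this is displayed then it works!');
}

main();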
var one = 0;

function delay() {
  return new Promise((resolve, reject) => {
    setTimeout(function () {
      resolve('resolved')
    }, 2000);
  })
}

while (one == 0) {
  one = 1;
  async function f1() {
    var x = await delay();
    if (x == 'resolved') {
      x = '';
      one = 0;
      console.log('resolved');
      //all other handlers go here...
      //all of the program that you want to be affected by sleep()
      f1();
    }
  }
  f1();
}
I was pretty shocked to find out that "require" in node creates a singleton by default. One might assume that many people have modules which they require that have state, but are created as a singleton, which would break the app as soon as there are multiple concurrent users.
We have the opposite problem: require is creating a non-singleton, and we don't know how to fix this.
Because my brain is wired as a java developer, all our node files/modules are defined thusly:
file playerService.js
const Player = require("./player")

class PlayerService {
  constructor(timeout) {
    // some stuff
  }

  updatePlayer(player) {
    // logic to lookup player in local array and change it for dev version.
    // test version would lookup player in DB and update it.
  }
}

module.exports = PlayerService
When we want to use it, we do this:
someHandler.js
const PlayerService = require("./playerService")
const SomeService = require("./someService")
playerService = new PlayerService(3000)
// some code which gets a player
playerService.updatePlayer(somePlayer)
Although require() creates singletons by default, in the above case I am guessing it is not creating a singleton, as each websocket message (in our case) will instantiate new objects in every module in the call stack. That is a lot of overhead: to service a single message, the service might get instantiated 5 times, as there are 5 different sub-services/helper classes which call each other and all do a require(); multiply this by the number of concurrent users and you get a lot of unnecessary object creation.
1) How do we modify the above class to work as a singleton, as services don't have state?
2) Is there any concept of a global import, or of creating a global object, such that we can import (aka require) and/or instantiate an object once for a particular websocket connection and/or for all connections? We have no index.js or similar. It seems crazy to have to re-require the dependent modules/files in every js file in a stack. Note: we looked at DI options, but found them too arcane to comprehend how to use, as we are not js gurus, despite years of trying.
You can simply create an instance inside the file and export it.
let playerService = new PlayerService();
module.exports = playerService;
In this case, you may want to add setters for the member variables you would take as constructor parameters to ensure encapsulation.
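For example (a minimal sketch; setTimeoutMs is a made-up setter name standing in for the timeout constructor parameter):
// playerService.js
class PlayerService {
  setTimeoutMs(timeout) {  // setter instead of a constructor parameter
    this.timeout = timeout;
    return this;
  }

  updatePlayer(player) {
    // lookup and update logic as before
  }
}

// export one shared instance; node's module cache hands this same object
// to every file that requires it
module.exports = new PlayerService();

// someHandler.js
const playerService = require("./playerService").setTimeoutMs(3000);
// require("./playerService") in any other file returns the exact same instance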
Also note that creating object instances with new in javascript is cheaper than in traditional OOP languages because of its prototype model.
So don't hesitate when you really need new instances (as seen in your code: do you really want to share the timeout constructor parameter?), since javascript objects are pretty memory efficient thanks to prototype methods, and modern engines have excellent garbage collectors to prevent memory leaks.
CommonJS uses a require() statement that is synchronous but what if you have a module like so:
function asyncFunction() {
var promise = ...;
return promise;
}
module.exports = asyncFunction();
what kind of problems could arise here? Are you supposed to always have synchronous code returned for the module.exports object? For example, if module.exports = {} it will always be synchronous, but in the above case module.exports is a promise, which I suppose is something not considered synchronous. Is there ever a good reason to use requireJS on the server side if you need to import a module that is async by nature?
what kind of problems could arise here?
This violates CommonJS conventions and will be surprising to developers. It shows you don't distinguish between code and data. Code in node on the server can and should be loaded synchronously. Data can and should use promises, callbacks, etc.
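Concretely, you can keep the export itself synchronous and expose the asynchronous part as a function the caller invokes when it wants the data (a minimal sketch; loadData and the setTimeout stand in for whatever async work the module does):
// instead of  module.exports = asyncFunction()  (which exports a pending promise),
// export a function and let each caller start the async work explicitly
module.exports = function loadData() {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve({ loaded: true }); }, 100);
  });
};

// caller:
//   const loadData = require('./loadData');
//   loadData().then(function (data) { /* use data */ });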
Is there ever a good reason to use requireJS on the server side
Not that I've ever seen. Speaking personally, requireJS is terrible and if your module introduced it, there is absolutely no chance I would use it in my node project.
if you need to import a module that is async by nature?
You'll need to provide specifics. I've never seen a node module that is "async by nature", at least not by someone who understands the difference between code and data and realizes that dynamically loading remote code into a running node.js server application is something most deployments want to avoid for good reasons including reliability and security.