NodeJS - can I "free" a required module? - node.js

As far as I know, a required module is cached and stays in memory no matter how many times require(...) is called.
function f1() {
const m = require("/path/to/my/module")
...
}
In my situation, my module's code keeps getting updated, so I want to load it and free it every time it is used in f1().
How can I do that? If I don't use require, what should I use?
Thanks
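For context, the cache the question describes lives in require.cache, keyed by the module's resolved path. A minimal sketch of forcing a fresh load by clearing that entry (the path is the one from the question; whether reloading on every call is a good idea is a separate design question):

function f1() {
  const modulePath = require.resolve("/path/to/my/module");
  // drop the cached copy so the next require() re-reads the file from disk
  delete require.cache[modulePath];
  const m = require(modulePath);
  // ... use m ...
}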

Related

Is it possible to use Node worker threads to perform database inserts?

I recently read about Node's worker_threads module, which allows parallel execution of JavaScript code in multiple threads and is useful for CPU-intensive operations. (Note: these are not the Web Workers created by Chrome in the browser.)
I'm building a feature where I need to do a massive amount of Postgres INSERTs without blocking the browser.
The problem is that in the JavaScript file where I instantiate the worker, I'm not able to import anything, including native Node modules or NPM libraries like Knex.js, which I need in order to run database queries. I get an error that says Cannot use import statement outside a module as soon as the file is executed.
I've tried putting the worker code in another file with an import statement at the top (same error). I've tried passing the Knex object through workerData, but it cannot clone a non-native JS object.
I'm out of ideas. Does anyone know how to interact with a database in a worker thread if we can't import any NPM libraries?
// mainThread.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
import knex from 'knex'; // --> *** UNCAUGHT EXCEPTION: Cannot use import statement outside a module ***

if (isMainThread) {
  module.exports = async function runWorker (rowsToInsert = []) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { rowsToInsert } });
      worker.on('message', (returningRows) => resolve(returningRows));
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  const { rowsToInsert } = workerData;
  return knex('table').insert(rowsToInsert)
    .then((returningRows) => {
      parentPort.postMessage({ data: returningRows });
    });
}
I am following a tutorial from this webpage: https://blog.logrocket.com/use-cases-for-node-workers/
It is of course possible, but it's a very bad idea.
Database drivers are already asynchronous and non-blocking of the JavaScript thread. Moving your insert calls to a separate thread as you propose will not only get you no performance gains, it will actually decrease overall performance because of the overhead involved with interthread communication:
- Synchronization and message passing are not free.
- JavaScript uses structured cloning when moving data between threads. This means all your rowsToInsert must be copied, which is (relatively) expensive.
Generally, the only time it's really appropriate to use JS threads is when your JavaScript code is performing CPU-intensive work. The node docs say as much right at the top:
Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.
This means if you're doing a lot of parsing, math, or similar, it may be appropriate to do the work in a thread. However, simply shoveling data from one place to another (i.e., I/O) is not a good candidate for a thread; after all, node's design is tuned to be efficient at exactly this kind of work.
You don't say where your rowsToInsert come from, but if they're coming in from HTTP request(s), a thread is the wrong thing to use. However, if you're parsing, for example, a CSV or JSON file on the server, it may be worthwhile to do that in a thread, but it's important that the thread does all the work (so memory need not be moved between threads). The message you post to the worker should just be "process the file located at /foo/bar.csv", and then the worker thread does the rest, as in the sketch below.
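A rough sketch of that division of labour, assuming a CSV file whose rows get inserted through Knex; the file name, the naive CSV parsing, and the connection config are illustrative assumptions, not from the question:

// csvImportWorker.js (sketch)
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const fs = require('fs');

if (isMainThread) {
  // main thread: only a short file path crosses the thread boundary
  module.exports = function importCsv(filePath) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { filePath } });
      worker.on('message', resolve); // e.g. { inserted: 1234 }
      worker.on('error', reject);
    });
  };
} else {
  // worker thread: the CPU-heavy parsing *and* the insert both happen here,
  // so the parsed rows never need to be cloned back to the main thread
  const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });
  const { filePath } = workerData;

  // naive parsing, for illustration only
  const rows = fs.readFileSync(filePath, 'utf8')
    .split('\n')
    .filter(Boolean)
    .map((line) => {
      const [name, score] = line.split(',');
      return { name, score: Number(score) };
    });

  knex('table').insert(rows)
    .then(() => parentPort.postMessage({ inserted: rows.length }))
    .finally(() => knex.destroy());
}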
The error you're getting is the same that you'd get without worker threads: you're trying to use import in a regular, non-module JS file. Either rename the worker file to *.mjs or use require('knex') instead.
node's ES module documentation goes into detail about how import differs from require.
Cannot use import statement outside a module
This is just a complaint that your code is using the "import" style of imports, but "import" is only supported inside ES modules and your code is in a CommonJS context. In CommonJS code you need to use require() instead:
const knex = require('knex');

AWS Lambda does not run independently

I am using nodejs with AWS Lambda.
As far as I know, each Lambda invocation is handled in an independent, parallel process.
However, the following example shows a different result than I expected.
// test.js
const now = new Date();

module.exports = () => {
  console.log(now);
};

// handler.js
const test = require('./test');

module.exports.hello = async (event, context) => {
  test();
  return {
    statusCode: 200,
    body: null
  };
};
RESULT: (hello handler log output omitted)
As I intended, each function should be executed independently, so the value printed by console.log(now) should always be the time at which it was executed.
However, in the actual log, the value of now is repeatedly recorded as the time of the very first execution rather than the time of each invocation.
The logged value was still the same after 5 minutes.
The value did change after 12 hours, but after that it showed the same problem again.
This result raises serious questions about how to manage DB connections.
There are two assumptions, one for each possible recycling behaviour of Lambda:
- If Lambda recycles containers as test.js suggests, it is better to use a connection pool, and it also makes sense to use an ORM such as Sequelize, which requires initialization.
- If not, it is better to use simple connections and plain queries so that connections are opened and released quickly.
How can we use Lambda with maximum performance?
How should we interpret the test results above?
AWS Lambda creates and reuses the containers, so you need to understand the impact of this practice on the programming model.
The first time a function executes, a new container will be created to execute it.
Let’s say your function finishes, and some time passes, then you call it again. Lambda may create a new container all over again. However, if you haven’t changed the Lambda function code and not too much time has gone by, Lambda may reuse the previous container. This offers performance advantages: Lambda gets to skip the nodejs language initialization, and you get to skip initialization in your code (so you can reuse DB connections, for example); files that you wrote to /tmp last time around will still be there if the container gets reused; anything you initialized globally outside of the Lambda function handler persists.
For more see Understanding Container Reuse in AWS Lambda.
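A minimal sketch of exploiting that reuse for a database connection; the pg Pool and the connection details are assumptions, not something from the question:

// handler.js
const { Pool } = require('pg');

// created once per container, outside the handler;
// warm invocations reuse the same pool and its open connections
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

module.exports.hello = async (event, context) => {
  const { rows } = await pool.query('SELECT now() AS current_time');
  return {
    statusCode: 200,
    body: JSON.stringify(rows[0])
  };
};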
The behavior that you have described is a result of AWS optimizations. It looks like your Lambda is very fast, so it is more efficient for AWS to use only one unit of execution (process/container/instance). Try simulating a long-running process and you will see that the timestamps actually differ in that case.
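For example, a hypothetical slow handler (not from the question) whose concurrent invocations spread across several containers, each logging its own initialization time:

// slowHandler.js
const now = new Date(); // set once per container

module.exports.helloSlow = async (event, context) => {
  // simulate ~2s of work so concurrent invocations force Lambda
  // to spin up several containers, each with its own `now`
  await new Promise((resolve) => setTimeout(resolve, 2000));
  console.log('container initialised at:', now.toISOString());
  return { statusCode: 200, body: null };
};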

node, require, singleton or not singleton?

I was pretty shocked to find out that require in node creates a singleton by default. One would assume that many people require modules which hold state but are created as singletons, which would break the app as soon as there are multiple concurrent users.
We have the opposite problem: require is creating a non-singleton, and we don't know how to fix this.
Because my brain is wired as a Java developer, all our node files/modules are defined like this:
file playerService.js
const Player = require("./player")

class PlayerService {
  constructor(timeout) {
    // some stuff
  }

  updatePlayer(player) {
    // logic to lookup player in local array and change it for dev version.
    // test version would lookup player in DB and update it.
  }
}

module.exports = PlayerService
When we want to use it, we do this:
someHandler.js
const PlayerService = require("./playerService")
const SomeService = require("./someService")

const playerService = new PlayerService(3000)
// some code which gets a player
playerService.updatePlayer(somePlayer)
Although require() creates singletons by default, in the above case I am guessing it is not creating a singleton, because each websocket message (in our case) will instantiate new objects in every module called in the stack. That is a lot of overhead: to service a single message, the service might get instantiated 5 times, since there are 5 different sub-services/helper classes which call each other and each does a require(). Multiply this by the number of concurrent users and you get a lot of unnecessary object creation.
1) How do we modify the above class to work as a singleton, given that services don't have state?
2) Is there any concept of a global import, or of creating a global object, such that we can import (aka require) and/or instantiate an object once for a particular websocket connection and/or once for all connections? We have no index.js or similar. It seems crazy to have to re-require the dependent modules/files in every js file in a stack. Note: we looked at DI options but found them too arcane to comprehend, as we are not JS gurus despite years of trying.
You can simply create an instance inside the file and export it.
let playerService = new PlayerService();
module.exports = playerService;
In this case, you may want to add setters for the member variables you would take as constructor parameters to ensure encapsulation.
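A minimal sketch of that approach, reusing the file names from the question (the setter is hypothetical):

// playerService.js
class PlayerService {
  constructor() {
    this.timeout = 3000; // default; configurable via the setter below
  }

  setTimeoutMs(timeout) {
    this.timeout = timeout;
  }

  updatePlayer(player) {
    // lookup and update logic as before
  }
}

// export one shared instance; require() caches this module,
// so every file that requires it receives the same object
module.exports = new PlayerService();

// someHandler.js
const playerService = require("./playerService"); // same instance everywhere
playerService.updatePlayer(somePlayer);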
Also note that creating object instances with new in JavaScript is cheaper than in traditional OOP languages because of its prototype model (more).
So don't hesitate when you really need new instances (as seen in your code: do you really want to share the timeout constructor parameter?), since JavaScript objects are pretty memory-efficient thanks to prototype methods, and modern engines have excellent garbage collectors to prevent memory leaks.

CommonJS is synchronous but what would happen if there was a call to an async function

CommonJS uses a require() statement that is synchronous, but what if you have a module like so:
function asyncFunction() {
  var promise = ...;
  return promise;
}

module.exports = asyncFunction();
What kind of problems could arise here? Are you supposed to always assign something synchronous to the module.exports object? For example, if module.exports = {} it will always be synchronous, but in the above case module.exports is a promise, which I suppose is not considered synchronous. Is there ever a good reason to use requireJS on the server side if you need to import a module that is async by nature?
what kind of problems could arise here?
This violates CommonJS conventions and will be surprising to developers. It shows you don't distinguish between code and data. Code in node on the server can and should be loaded synchronously. Data can and should use promises, callbacks, etc.
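A sketch of the conventional split, assuming the asyncFunction from the question: export the function itself (code, loaded synchronously) and let callers obtain the promise (data) by calling it:

// asyncModule.js
function asyncFunction() {
  // stand-in for any async work (DB call, HTTP request, ...)
  return Promise.resolve("some data");
}

// export the function, not the result of calling it
module.exports = asyncFunction;

// consumer.js
const asyncFunction = require("./asyncModule"); // synchronous require
asyncFunction().then((data) => {
  console.log(data);
});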
Is there ever a good reason to use requireJS on the server side
Not that I've ever seen. Speaking personally, requireJS is terrible and if your module introduced it, there is absolutely no chance I would use it in my node project.
if you need to import a module that is async by nature?
You'll need to provide specifics. I've never seen a node module that is "async by nature", at least not by someone who understands the difference between code and data and realizes that dynamically loading remote code into a running node.js server application is something most deployments want to avoid for good reasons including reliability and security.

Lazy loading in node.js

I was wondering whether using require() in node.js is the equivalent of lazy loading?
For example, if I had a function that required a specific node.js package that wasn't needed anywhere else in my code, am I best to use require() inside that function to include the needed package only when that function is called?
I'm also unsure whether this will provide any performance improvement, given my lack of understanding of the node.js architecture. I presume it will use less memory per connection to my server. However, will it increase I/O to the disk when it has to read the package, or will this be a one-off cost to get it into memory?
If this is the case, how far should I take it? Should I be trying to write node.js packages for as much of my code as I can?
require() is on-demand loading. Once a module has been loaded, it won't be reloaded if the require() call is run again. By putting it inside a function instead of your top-level module code, you can delay its loading or potentially avoid it if you never actually invoke that function. However, require() is synchronous and loads the module from disk, so best practice is to load any modules you need at application start, before your application starts serving requests; that ensures only asynchronous I/O happens while your application is operational.
Node is single-threaded, so the memory footprint of loading a module is not per-connection, it's per-process. Loading a module is a one-off cost to get it into memory.
Just stick with the convention here and require the modules you need at the top-level scope of your app before you start processing requests. I think this is a case of: if you have to ask whether you need to write your code in an unusual way, you don't need to write your code in an unusual way.
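For illustration, a minimal sketch of what deferring a require() into the function that needs it looks like (the module name is hypothetical):

// report.js
function generateReport(rows) {
  // loaded from disk only the first time this function runs;
  // subsequent calls hit Node's module cache
  const heavyParser = require("./heavyParser"); // hypothetical, expensive-to-load module
  return heavyParser.format(rows);
}

module.exports = { generateReport };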
If you want to lazy load modules, it's now possible with ES6 (Node v6).
Edit: this will not work if you need to access properties of require (like require.cache).
module.js
console.log('Module was loaded')
exports.d=3
main.js
var _require = require;
var require = function (moduleName) {
  var module;
  return new Proxy(function () {
    if (!module) {
      module = _require(moduleName)
    }
    return module.apply(this, arguments)
  }, {
    get: function (target, name) {
      if (!module) {
        module = _require(moduleName)
      }
      return module[name];
    }
  })
};
console.log('Before require');
var a = require('./module')
console.log('After require');
console.log(a.d)
console.log('After log module');
output
Before require
After require
Module was loaded
3
After log module
