Nodejs global variable not working in separate process


I am using queues with the Bull library. In the entry point, I have defined a global.db variable that I can use everywhere.
Bull's documentation says separate processes are better, so I moved the processor into its own file and I'm doing:
queue.process("path-to-the-file")
In that file I can't use my global variable; it is undefined. Why is this happening, and what is the solution? I noticed that if the file is included as a module, it sees the global variable, but if it's referenced by path as above, it doesn't.
const Queue = require("bull");
const queue = new Queue("update-inventory-queue");
const updateInventoryProcess = require("../processes/updateInventory");
queue.process(updateInventoryProcess);
The above snippet works but now the updateInventoryProcess is not separate process, it is just a function imported by the module.

As you've discovered, separate processes will, by their nature, not have the context of your main Node.js process.
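To see this in isolation, the sketch below uses plain child_process rather than Bull's own worker pool (an assumption for illustration): it sets a global in the current process, starts a fresh Node.js process, and asks the child whether it can see that global.

```javascript
const { spawnSync } = require("child_process");

// Something the main process puts on the global object, like global.db in the question.
global.db = { name: "main-process-db" };

// A brand-new Node.js process evaluates its code from scratch,
// so it never sees globals that were set by its parent.
const child = spawnSync(
  process.execPath,
  ["-e", "process.stdout.write(typeof global.db)"],
  { encoding: "utf8" }
);

console.log(typeof global.db); // "object" in this process
console.log(child.stdout);     // "undefined" in the child process
```

Each process has its own heap and its own global object; nothing is shared unless you send it explicitly.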
A couple of solutions are to put that configuration in a shared module that can be required both by the main process and by your job's processor file, or to provide it as part of the job data.
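A minimal sketch of the job-data option, assuming a processor file like ../processes/updateInventory.js from the question. createDbClient and the dbSettings/sku/qty fields are hypothetical placeholders, not part of Bull; the idea is that the child process builds its own client once from plain, serializable settings instead of relying on global.db.

```javascript
// Hypothetical contents of the sandboxed processor file.
// createDbClient is a placeholder for whatever database library you actually use.
function createDbClient(settings) {
  return {
    settings,
    async updateStock(sku, qty) {
      return { sku, qty, host: settings.host }; // pretend write
    },
  };
}

let db = null; // one client per child process, created on first use

async function updateInventory(job) {
  // Build the client from settings carried in the job data, not from a global.
  if (!db) db = createDbClient(job.data.dbSettings);
  return db.updateStock(job.data.sku, job.data.qty);
}

// In the real file: module.exports = updateInventory;
```

In the main process you would then enqueue only plain data, e.g. queue.add({ dbSettings: { host: "localhost" }, sku: "ABC-1", qty: 3 }). Creating the client lazily means each sandboxed process opens one client rather than one per job.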
Also be aware that not everything can be passed in job data to sandboxed workers: Bull uses child_process.send to pass data back and forth, which serializes and parses it, so values such as functions or open connections will not survive the trip.
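A rough way to see what survives: child_process.send() uses JSON serialization by default, so a JSON round trip approximates what a sandboxed worker receives. The field names below are made up for illustration.

```javascript
const jobData = {
  sku: "ABC-1",                 // plain values survive unchanged
  createdAt: new Date(0),       // Dates arrive as ISO strings
  db: { query() {} },           // methods are dropped, leaving {}
  tags: new Set(["a", "b"]),    // Sets arrive as empty objects
};

// Approximates what happens to job data on its way to a sandboxed worker.
const received = JSON.parse(JSON.stringify(jobData));

console.log(typeof received.createdAt); // "string"
console.log(received.db);               // {}
console.log(received.tags);             // {}
```

This is why passing plain settings (and rebuilding connections inside the worker) works, while passing a live db object does not.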

Related

Is it possible to free a required module in nodejs?

When you import a file in nodejs, it's loaded, evaluated and cached.
Is it possible to free the memory used by that file if you know you will never use it again (or won't need it for a long time, so recompiling it later is acceptable)?
What I want to do is import a temporary file, read its code, execute it once and then free it for good (I know it won't be used again, and I don't want memory leaks).
Basically, this is about running dynamic code in Node.js.
Sites like Codility, which let you enter code and execute it on the back end, must use a similar solution... unless they start a completely new Node.js instance with that code and then kill it.
Is it possible? If so, how?

You can delete a module from the module cache like this. Just make sure that there are no circular dependencies, or the module will not actually be freed from memory:
delete require.cache[require.resolve('./theModuleYouWantToDelete.js')]
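A small self-contained sketch of the cache behavior: it writes a throwaway module to a temp directory (the file name is arbitrary), requires it twice, deletes the cache entry and requires it again. Note that the requiring module's module.children array may also keep a reference to the loaded module.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a throwaway module to a temporary directory (for demonstration only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cache-demo-"));
const file = path.join(dir, "temp-module.js");
fs.writeFileSync(file, "module.exports = { createdAt: Date.now() };");

const first = require(file);
const again = require(file); // served from require.cache: the same object

delete require.cache[require.resolve(file)];
const fresh = require(file); // cache entry gone: the file is evaluated again

console.log(first === again); // true
console.log(first === fresh); // false

fs.rmSync(dir, { recursive: true, force: true });
```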

It depends on what you mean by "free" the module. Node.js does not have a way to remove the code once it has been run, so the code itself will always remain in memory.
If you remove all references to the module (by deleting it from the cache) and any other references there might be to its exported data, then any data associated with the module should become eligible for garbage collection.
For a service that lets the user run arbitrary code on the server, I would always run that in a sandboxed separate process where you can kill the process and recover all resources used by the code, not run that in the main server process.
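A minimal sketch of the separate-process approach, assuming the user's code arrives as a string. It runs the code in a new Node.js process with a time limit; when the process exits or is killed, the OS reclaims everything it used. This is process isolation only, not a hardened security sandbox, and runIsolated is an illustrative name.

```javascript
const { spawnSync } = require("child_process");

// Runs a snippet of untrusted code in its own Node.js process with a time limit.
function runIsolated(code, timeoutMs = 1000) {
  const result = spawnSync(process.execPath, ["-e", code], {
    encoding: "utf8",
    timeout: timeoutMs, // the child is killed (SIGTERM by default) if it runs too long
  });
  return { stdout: result.stdout, status: result.status, signal: result.signal };
}

const ok = runIsolated("console.log(6 * 7)");
const runaway = runIsolated("while (true) {}", 500);

console.log(ok.stdout.trim()); // "42"
console.log(runaway.signal);   // "SIGTERM": killed after 500 ms
```

Because nothing is loaded into the main server process, there is no module cache or heap to clean up afterwards.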

OK, reading the Node.js documentation about modules, it turns out there is a public cache member (require.cache), and it says:
Modules are cached in this object when they are required. By deleting a key value from this object, the next require will reload the module. This does not apply to native addons, for which reloading will result in an error.
Adding or replacing entries is also possible. This cache is checked before native modules and if a name matching a native module is added to the cache, no require call is going to receive the native module anymore. Use with care!
So I guess every evaluated module lives here internally, and removing a key from this object as the docs describe will also free the memory related to that portion of code (once the garbage collector does its job).

Is reading and writing process.env values synchronous?

Reading and writing environment variables in Node.js is done using the process.env object.
For instance:
process.env.foo evaluates to the env var foo
process.env.bar = 'blah' sets the value of the env var bar to blah
delete process.env.baz deletes the environment variable baz
From trial and error, and the lack of a callback, I assume that these actions are synchronous, but I found no reference to this in the process.env documentation.
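A quick sketch of the three operations (the variable name DEMO_BAR is arbitrary): no callback or promise is involved, and each change has taken effect by the time the next statement runs. A child spawned right afterwards also inherits the new value, since child processes use process.env by default.

```javascript
const { execFileSync } = require("child_process");

// Write: visible on the very next line, no callback needed.
process.env.DEMO_BAR = "blah";
const readBack = process.env.DEMO_BAR;

// A child spawned immediately afterwards inherits the new value.
const seenByChild = execFileSync(
  process.execPath,
  ["-e", "process.stdout.write(process.env.DEMO_BAR || '')"],
  { encoding: "utf8" }
);

// Delete: also takes effect immediately.
delete process.env.DEMO_BAR;
const afterDelete = process.env.DEMO_BAR;

console.log({ readBack, seenByChild, afterDelete });
```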
Is env var access synchronous or asynchronous in Node.js?
Addendum: Why I believe this question to be non-trivial
Following the comments: Reading and writing the environment variables might mean that the process needs to communicate with the operating system, or perform some sort of blocking I/O operations.
Therefore, it makes sense to ask whether the environment variables are stored as a local object in memory without any synchronization, or otherwise sent to the operating system in a blocking manner.
Moreover, the implementation may vary between operating systems, and the official documentation makes no promise of non-blocking operation.

I think the terms "synchronous"/"asynchronous" may be a bit misleading here.
I guess the actual question is: Is reading from or writing to process.env expensive? Does it perform a blocking operation with the operating system?
The short answer is Yes, it can be expensive.
For more background and how much it can impact some apps, see this GitHub issue. It was already pointed out there in 2015 that the documentation should be updated to make clear that accessing process.env is slow, but that hasn't happened yet.
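A rough way to check the cost on your own machine is a micro-benchmark comparing repeated process.env reads with a value read once and cached in a local constant. Timings depend heavily on platform and Node.js version, so no numbers are claimed here; BENCH_VAR is an arbitrary name.

```javascript
const { performance } = require("perf_hooks");

process.env.BENCH_VAR = "value";
const N = 1e6;

// Read through process.env on every iteration.
let t0 = performance.now();
let envHits = 0;
for (let i = 0; i < N; i++) if (process.env.BENCH_VAR === "value") envHits++;
const envMs = performance.now() - t0;

// Read once, then use the cached string.
const cached = process.env.BENCH_VAR;
t0 = performance.now();
let cachedHits = 0;
for (let i = 0; i < N; i++) if (cached === "value") cachedHits++;
const cachedMs = performance.now() - t0;

console.log(`process.env: ${envMs.toFixed(1)} ms, cached: ${cachedMs.toFixed(1)} ms`);
```

Caching individual values you read in hot paths avoids the repeated lookups without replacing process.env itself.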
You can actually see the implementation for process.env in the node.js source code where it's obvious that any access will call one of the functions defined from here onwards.
Note: At the time of writing, this was defined in node.cc in a more straightforward way. The links above still point to the old implementation. Newer versions of Node have process.env implemented in a separate file, node_env_var.cc, which can be found here, but it is more encapsulated, making it harder to follow for the purpose of this explanation.
Depending on the platform, this may have more or less of an impact.
It becomes most obvious on Windows, because there you can view a process' current environment from the outside (while in Linux, the /proc/.../environ file will retain its original contents when the environment was changed with setenv).
For example:
node -e "process.env.TEST = '123'; setInterval(() => {}, 1000);";
This will start a node process which creates a TEST environment variable in the current process' environment and then wait forever.
Now we can open a tool like Process Explorer or Process Hacker and look at the environment of the node process:
And lo and behold, the variable is there. This proves in another way that writing to process.env does in fact access the operating system.
Also, because the object actually queries all data from the OS, it even behaves differently from a normal object. Again, a Windows example (because Windows is the quirkiest):
Windows matches environment variable names case-insensitively.
> process.env.TEST = '123'
'123'
> process.env.tEsT
'123'
Windows has hidden environment variables starting with = which cannot be changed through normal means and which are not enumerated. node.js replicates these semantics. The =X: variables in particular represent the current directory in specific drives (yes, Windows stores them per drive).
> Object.keys(process.env).filter(k => k === '=Z:')
[]
> process.env['=Z:']
'Z:\\'
> process.env['=Z:'] = 'Z:\\Temp'
'Z:\\Temp'
> process.env['=Z:']
'Z:\\'
> process.chdir('Z:\\Temp')
undefined
> process.env['=Z:']
'Z:\\Temp'
Now, somebody might think (similar to what was proposed in the GitHub issue that I linked) that Node.js should just cache process.env in an actual object and, for child process creation, read the environment from the cached object. This is not advisable for the following reasons:
They would need to copy the semantics of the underlying platform and reimplement them. As the Windows example above shows, this would at some point end up intercepting chdir and trying to automatically update the relevant =X: variable of the affected drive (which still wouldn't work if a native plugin changed the current directory), or accessing the OS only for some variables, and therein lies madness and huge potential for obscure bugs.
This would break applications which read a process' environment from the outside (like Process Explorer), as they would see incorrect values.
This would create inconsistencies if a native module accessed the environment variables on its own from C++ code, because they would now have a different state than the cached object.
This would cause child processes to not inherit the correct variables if the child process were started by a native module (for the same reason as above).
This should also explain why it is a bad idea to do process.env = JSON.parse(JSON.stringify(process.env)) in your code. For one, it would break case-insensitivity on Windows (and you can't possibly know which modules, required by some other module, may depend on that), and apart from that it would of course cause many of the other problems described above.
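One platform-independent difference is easy to show without replacing process.env: a JSON copy is just a plain object snapshot, so it no longer converts assigned values to strings the way process.env does. COERCE_DEMO is an arbitrary name.

```javascript
// A JSON copy of process.env is just a plain object snapshot.
const copy = JSON.parse(JSON.stringify(process.env));

process.env.COERCE_DEMO = 1; // process.env converts the value to a string
copy.COERCE_DEMO = 1;        // the plain copy stores the number as-is

const realType = typeof process.env.COERCE_DEMO;
const copyType = typeof copy.COERCE_DEMO;
console.log(realType, copyType); // string number

delete process.env.COERCE_DEMO; // clean up the real environment
```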

Actually, it is a normal object that lets you get the environment variables of the current process; after all, they are just variables that carry some settings to a program. Node.js simply sets up a normal object for them after the program reads them. The documentation doesn't say this directly, but it does say that this is an object, along with the following:
It is possible to modify this object, but such modifications will not be reflected outside the Node.js process. In other words, the following example would not work:
$ node -e 'process.env.foo = "bar"' && echo $foo
While the following will:
process.env.foo = 'bar';
console.log(process.env.foo);
Assigning a property on process.env will implicitly convert the value to a string.
This is enough to explain your problem.
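The same boundary can be checked from inside Node.js, as a sketch with arbitrary variable names: a child process that sets a variable changes only its own environment, while a change made inside the current process is visible immediately.

```javascript
const { execFileSync } = require("child_process");

// The child sets a variable in its own copy of the environment...
execFileSync(process.execPath, ["-e", "process.env.FOO_FROM_CHILD = 'bar'"]);

// ...but the parent's environment is untouched, just like the shell
// in the `node -e ... && echo $foo` example never sees foo.
console.log(process.env.FOO_FROM_CHILD); // undefined

// Inside one process, the change is visible immediately.
process.env.FOO_LOCAL = "bar";
console.log(process.env.FOO_LOCAL); // bar
```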

Share data between node child processes

I have a small node script that gets ran in several node child processes, and I do not have access to the parent process.
The goal of the script is pretty simple, to return one element at random from an array. However, the element returned cannot be used by any of the other child processes.
The only solutions I can think of involve using redis or a database, and because this is such a tiny script, I would like to avoid that.
Here is an example of what I would like my code to look like:
var accounts = [acc1, acc2, acc3]
// sharedStore is the cross-process store I'm looking for (it doesn't exist yet)
function getAccount() {
  var usedAccounts = sharedStore.get('usedAccounts') || []
  var unusedAccounts = accounts.filter(acc => !usedAccounts.includes(acc))
  var account = getRandomAccount(unusedAccounts)
  usedAccounts.push(account)
  sharedStore.set('usedAccounts', usedAccounts)
  return account
}
So far, the solutions I've thought of don't work because the sibling processes initially all get an empty list assigned to usedAccounts.

There are two problems you need to solve:
How to share data between multiple node processes without using the parent process to marshal data between them.
How to ensure that data is consistent across all the shared processes.
How to share data between multiple node processes.
Given your constraints (you don't want to use an external service like Redis or another database), and given that Node.js doesn't have an easy way to use something like shared memory, a possible solution is to use a file shared by all the processes. Each process can read and write the shared file and use it to get its userAccount data.
The file could be JSON formatted and look something like this:
[
{
"accountName":"bob",
"accountUsed":false
},
{
"accountName":"alice",
"accountUsed":true
}
]
This would just be an array of userAccount objects, each with a flag that indicates whether the account is already in use.
Your app would:
GetAccountData():
Open the file
Read the file into memory
Iterate over the array
Find the first userAccount that is available
Set the accountUsed flag to true
Write the updated array back to the file
Close the file.
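The GetAccountData steps above can be sketched as follows, assuming the JSON layout shown earlier. getAccountData is an illustrative name, and on its own this function is not safe when several processes call it at once; it is meant to run while holding the lock described next.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Claims the first unused account in the shared JSON file.
function getAccountData(dataFile) {
  const accounts = JSON.parse(fs.readFileSync(dataFile, "utf8")); // open + read into memory
  const account = accounts.find((a) => !a.accountUsed);           // first available account
  if (!account) return null;                                       // everything is taken
  account.accountUsed = true;                                      // mark it as used
  fs.writeFileSync(dataFile, JSON.stringify(accounts, null, 2));   // write the array back
  return account.accountName;
}

// Demo with a temporary copy of the example file.
const dataFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "accounts-")), "accounts.json");
fs.writeFileSync(dataFile, JSON.stringify([
  { accountName: "bob", accountUsed: false },
  { accountName: "alice", accountUsed: true },
]));

const firstClaim = getAccountData(dataFile);  // "bob"
const secondClaim = getAccountData(dataFile); // null: no accounts left
```

readFileSync and writeFileSync open and close the file themselves, which covers the open and close steps.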
Multiple processes reading and writing a single resource is a well-understood concurrency problem known as the readers-writers problem.
How to ensure that data is consistent across all the shared processes.
To ensure data is consistent, you need to ensure that only one process can run the algorithm from above from start to finish at a time.
Operating systems may provide exclusive locking of a file, but Node.js has no native support for that.
A common mechanism is to use a lockfile and use its existence to guard access to the data file above. If a process can't acquire the lock, it should wait a period of time and then try to acquire the lock again.
To acquire the lock:
Check if the lockfile exists.
If lockfile exists
Set a timer (setInterval) to acquire the lock
If the lockfile doesn't exist
Create the lockfile
If the lockfile creation fails (because it exists--race condition with another process)
Set a timer (setInterval) to acquire the lock
If the lockfile creation succeeds
Do GetAccountData();
Remove lockfile
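A minimal sketch of these lock steps, assuming a single synchronous work function (for example the GetAccountData step). It merges the separate check and create steps into one atomic call: fs.openSync with the "wx" flag creates the lockfile only if it does not already exist, which avoids the race mentioned above. It retries with setTimeout rather than setInterval so that at most one retry is pending. withLockfile and the lock path are illustrative names, and a process that dies while holding the lock still leaves the file behind.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Runs the synchronous function `work` while holding the lockfile, then calls done(err, result).
function withLockfile(lockPath, work, done, retryMs = 50) {
  let fd;
  try {
    fd = fs.openSync(lockPath, "wx"); // fails with EEXIST if another holder has the lock
  } catch (err) {
    if (err.code !== "EEXIST") return done(err);
    setTimeout(() => withLockfile(lockPath, work, done, retryMs), retryMs); // try again later
    return;
  }
  let result;
  let error = null;
  try {
    result = work(); // keep this short: hold the lock only for the serial work
  } catch (err) {
    error = err;
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath); // release the lock even if work() threw
  }
  done(error, result);
}

const lockPath = path.join(os.tmpdir(), `demo-${process.pid}.lock`);
const log = [];

// 1) Lock is free: work runs immediately and the lockfile is removed afterwards.
withLockfile(lockPath, () => "first", (err, res) => log.push(res));

// 2) Simulate another process holding the lock for about 120 ms.
fs.writeFileSync(lockPath, "");
withLockfile(lockPath, () => "second", (err, res) => log.push(res));
setTimeout(() => fs.unlinkSync(lockPath), 120);
// At this point log is ["first"]; "second" is added once the simulated holder releases the lock.
```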
This solution should work, but it's not without kludges. Using a synchronization primitive like a lock can cause your application to deadlock. Also, using a timer to periodically try to acquire the lock is wasteful and can cause a race condition if lock creation is not checked properly.
If your app crashes before it removes the lockfile, then you may create a deadlock situation. To guard against that, you might want to put a final unhandled exception handler to remove the lockfile if it was created by the process.
You will also need to make sure you hold the lock only long enough to do your serial work. Holding it longer will cause performance issues and increase the likelihood of a deadlock.

I would rather let each process have its own flat file that only it writes. Each process can then read all the files written by all the processes, concurrently or otherwise, which removes the need for a lockfile.
You will have to work out how each process writes only its own file, but reading all these files together gives you the source of truth.
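A sketch of the one-file-per-process idea, with illustrative function names and hypothetical process IDs p1 and p2 standing in for real processes. Each file is written to a temporary name and then renamed, so readers see either the old or the new complete file rather than a half-written one. This removes write conflicts, but two processes could still read the same state and pick the same account at the same moment; that part of the logic is left open by the answer.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Each process writes ONLY its own file.
function recordUsedAccount(dir, account, ownerId = process.pid) {
  const file = path.join(dir, `used-${ownerId}.json`);
  const mine = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, "utf8")) : [];
  mine.push(account);
  fs.writeFileSync(`${file}.tmp`, JSON.stringify(mine));
  fs.renameSync(`${file}.tmp`, file); // swap in the complete file
}

// Any process can read every process's file; their union is the source of truth.
function readAllUsedAccounts(dir) {
  const used = new Set();
  for (const name of fs.readdirSync(dir)) {
    if (!name.startsWith("used-") || !name.endsWith(".json")) continue; // skip temp files
    for (const account of JSON.parse(fs.readFileSync(path.join(dir, name), "utf8"))) {
      used.add(account);
    }
  }
  return used;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "used-accounts-"));
recordUsedAccount(dir, "acc1", "p1");
recordUsedAccount(dir, "acc3", "p2");

const used = readAllUsedAccounts(dir);
const unused = ["acc1", "acc2", "acc3"].filter((a) => !used.has(a)); // ["acc2"]
```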

node.js express custom format debug logging

A seemingly simple question, but I am unsure of the node.js equivalent to what I'm used to (say from Python, or LAMP), and I actually think there may not be one.
Problem statement: I want to use basic, simple logging in my express app. Maybe I want to output DEBUG messages, or INFO messages, or just some stats to the log for consumption by other back-end systems later.
1) I want all log messages, however, to contain some fields: remote IP and request URL, for example.
2) On the other hand, code that logs is everywhere in my app, including deep inside the call tree.
3) I don't want to pass (req,res) down into every node in the call tree (this just creates a lot of parameter passing where they are mostly not needed, and complicates my code, as I need to pass these into async callbacks and timeouts etc.)
In other systems, where there is a thread per request, I will store the (req,res) pair (where all the data I need is) in a thread-local-storage, and the logger will read this and format the message.
In node, there is only one thread. What is my alternative here? What's "the request context in which a specific piece of code is running under"?
The only way I can think of achieving something like this is by looking at a trace, and using reflection to look at local variables up the call tree. I hate that, plus would need to implement this for all callbacks, setTimeouts, setIntervals, new Function()'s, eval's, ... and the list goes on.
What are other people doing?
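The question has no answer in this thread. For illustration only, newer Node.js versions include AsyncLocalStorage (in the async_hooks module), which is the closest built-in analogue to the thread-local storage described above: code running inside run() can read the per-request store via getStore() without req/res being passed down, and the store also propagates into callbacks and promise continuations created inside run(). The sketch simulates requests with plain objects; log, updateStock and handleRequest are hypothetical names.

```javascript
const { AsyncLocalStorage } = require("async_hooks");

const requestContext = new AsyncLocalStorage();
const lines = [];

// The logger reads the current request's context instead of taking req as a parameter.
function log(level, message) {
  const ctx = requestContext.getStore() || {}; // undefined outside any request
  lines.push(`[${level}] ${ctx.ip || "-"} ${ctx.url || "-"} ${message}`);
}

// Deep in the call tree: no req/res parameters needed.
function updateStock() {
  log("DEBUG", "updating stock");
}

function handleRequest(fakeReq) {
  requestContext.run({ ip: fakeReq.ip, url: fakeReq.url }, () => {
    log("INFO", "request started");
    updateStock();
  });
}

handleRequest({ ip: "10.0.0.1", url: "/inventory" });
handleRequest({ ip: "10.0.0.2", url: "/orders" });
log("INFO", "outside any request");
console.log(lines.join("\n"));
```

In an Express app, a first middleware such as app.use((req, res, next) => requestContext.run({ ip: req.ip, url: req.originalUrl }, next)) would set up the store for each request.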

File read access from threads

I have a static class that contains a number of functions that read values from configuration files. The configuration files are provided with the software and the software itself NEVER writes to them.
I have a number of threads that are running in my application and I need to call a function in the static class. The function will then go to one of the configuration files, look up a value (depending on the parameter that I pass when I call the function) and then return a result.
I need the threads to be able to read the file all at the same time (or rather, without synchronising to the main thread). The threads will NEVER write to the configuration files.
My question is simply, therefore, will there be any issues in allowing multiple threads to call the same static functions to read values from the same file at the same time? I can appreciate that there would be serialization issues if some threads were writing to the file while others were reading, but this will never happen.
Basically:
1. Are there any issues allowing multiple threads to read from the same file at the same time?
2. Are there any issues allowing multiple threads to call the same static functions (in the same static class) at the same time?

Yes, this CAN be an issue, depending on how the class is actually locating and reading from the files, and more so if the class is also caching the values so it does not need to read from the files every time. Without seeing your class's actual code, there is no way to tell you whether your code is thread-safe or not.
