Is reading and writing process.env values synchronous? - node.js

Reading and writing environment variables in Node.js is done using the process.env object.
For instance:
process.env.foo evaluates to the env var foo
process.env.bar = 'blah' sets the value of the env var bar to blah
delete process.env.baz deletes the environment variable baz
From trial and error, and the lack of a callback, I assume that these actions are synchronous, but I found no reference to this in the process.env documentation.
Is env var access synchronous or asynchronous in Node.js?
Addendum: Why I believe this question to be non-trivial
Following the comments: Reading and writing the environment variables might mean that the process needs to communicate with the operating system, or perform some sort of blocking I/O operations.
Therefore, it makes sense to ask whether the environment variables are stored as a local object in memory without any synchronization, or otherwise sent to the operating system in a blocking manner.
Moreover, the implementation may vary between operating systems, and the official documentation makes no promise of non-blocking behaviour.

I think the terms "synchronous"/"asynchronous" may be a bit misleading here.
I guess the actual question is: Is reading from or writing to process.env expensive? Does it perform a blocking operation with the operating system?
The short answer is Yes, it can be expensive.
For more background, and for how much this can impact some apps, see this GitHub issue. It was already stated there in 2015 that the documentation should be updated to make clear that accessing process.env is slow, but it hasn't happened yet.
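To get a rough feel for the cost yourself, here is a minimal micro-benchmark sketch (the variable name PATH and the iteration count are arbitrary choices; absolute numbers vary by platform and Node.js version):

// Compare repeated process.env reads against reads from a plain object snapshot.
const snapshot = { ...process.env };

console.time('process.env.PATH x 1e6');
for (let i = 0; i < 1e6; i++) { if (!process.env.PATH) break; }
console.timeEnd('process.env.PATH x 1e6');

console.time('snapshot.PATH x 1e6');
for (let i = 0; i < 1e6; i++) { if (!snapshot.PATH) break; }
console.timeEnd('snapshot.PATH x 1e6');

Where process.env goes through the OS on every access, the first timer tends to be noticeably slower, which is why hot code paths usually read an environment variable once and keep it in a local variable.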
You can actually see the implementation for process.env in the node.js source code where it's obvious that any access will call one of the functions defined from here onwards.
Note: At the time of writing, this was defined in node.cc in a more straightforward way. The links above still point to the old implementation. Newer versions of node implement process.env in a separate file, node_env_var.cc, which can be found here, but it has more encapsulation, making it harder to follow for the purpose of this explanation.
Depending on the platform, this may have more or less of an impact.
It becomes most obvious on Windows, because there you can view a process's current environment from the outside (while on Linux, the /proc/.../environ file retains its original contents even after the environment has been changed with setenv).
For example:
node -e "process.env.TEST = '123'; setInterval(() => {}, 1000);";
This will start a node process which creates a TEST environment variable in the current process's environment and then waits forever.
Now we can open a tool like Process Explorer or Process Hacker and look at the environment of the node process.
And lo and behold, the variable is there. This proves in another way that writing to process.env does in fact access the operating system.
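On Linux you can check the complementary behaviour mentioned above. A minimal sketch, assuming TEST was not already set when the process started:

// Linux-only: /proc/self/environ is a snapshot of the environment the
// process was started with; later changes don't rewrite it.
const fs = require('fs');

process.env.TEST = '123';
console.log(process.env.TEST); // '123' inside the process

const environ = fs.readFileSync('/proc/self/environ', 'latin1');
console.log(environ.split('\0').includes('TEST=123')); // false: snapshot unchanged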
Also, because the object actually queries all of its data from the OS, it even behaves differently from a normal object. Again, a Windows example (because it's the quirkiest):
Windows matches environment variables case-insensitive.
> process.env.TEST = '123'
'123'
> process.env.tEsT
'123'
Windows has hidden environment variables starting with = which cannot be changed through normal means and which are not enumerated. node.js replicates these semantics. The =X: variables in particular represent the current directory in specific drives (yes, Windows stores them per drive).
> Object.keys(process.env).filter(k => k === '=Z:')
[]
> process.env['=Z:']
'Z:\\'
> process.env['=Z:'] = 'Z:\\Temp'
'Z:\\Temp'
> process.env['=Z:']
'Z:\\'
> process.chdir('Z:\\Temp')
undefined
> process.env['=Z:']
'Z:\\Temp'
Now, somebody might think (similar to what was proposed in the GitHub issue that I linked) that node.js should just cache process.env in an actual object and, for child process creation, read the environment from the cached object. This is not advisable for the following reasons:
They would need to copy the semantics of the underlying platform and reimplement them. As you can see in the Windows example above, this would at some point end up intercepting chdir and trying to automatically update the relevant =X: variable of the affected drive (and it would then fail if a native plugin changed the current directory), or accessing the OS only for some variables, and therein lies madness and huge potential for obscure bugs.
This would break applications which read a process' environment from the outside (like Process Explorer), as they would see incorrect values.
This would create inconsistencies if a native module accessed the environment variables on its own from C++ code, because it would then see a different state than the cached object.
This would cause child processes to not inherit the correct variables if the child process were started by a native module (for the same reason as above).
This should also explain why it is a bad idea to do process.env = JSON.parse(JSON.stringify(process.env)) in your code. For one, it would break case-insensitivity on Windows (and you can't possibly know which modules, including those that other modules require, depend on that), and apart from that it would of course cause tons of other problems as described above.
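A Windows-only sketch of the first breakage (assuming PATH is set, as it normally is):

// process.env resolves names case-insensitively on Windows...
console.log(process.env.PATH === process.env.path); // true

// ...but a plain object copy does not:
process.env = JSON.parse(JSON.stringify(process.env));
console.log(process.env.path); // likely undefined (the actual key is usually 'Path')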

Actually, it is a normal object so that you can get the environment variables of the current process; after all, they are just variables that carry some settings to a program. Node.js sets up a normal object for them once the program has read them. The documentation doesn't spell this out, but it does say that it is an object and states the following:
It is possible to modify this object, but such modifications will not
be reflected outside the Node.js process. In other words, the
following example would not work:
$ node -e 'process.env.foo = "bar"' && echo $foo
While the following will:
process.env.foo = 'bar';
console.log(process.env.foo);
Assigning a property on process.env will implicitly convert the value to a string.
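For example (ANSWER is an arbitrary variable name):

process.env.ANSWER = 42;                  // a number goes in...
console.log(typeof process.env.ANSWER);   // 'string' comes out
console.log(process.env.ANSWER === '42'); // true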
This is enough to explain your problem.

Related

Libgit2 global state and thread safety

I'm trying to revise our codebase, which seems to be using libgit2 wrong (at least TSAN is going crazy over how we use it).
I understand that most operations are object-based (i.e., operations on top of a repo are localized to that repo), but I'm unclear when it comes to the global state and which operations need to be synchronized globally.
Is there a list of functions that require global synchronization?
Also, when it comes to git_repository_open(), do I need to ensure that one path is only ever held by a single thread? I.e., do I need to prevent multiple threads from accessing the same repo?

Is it possible to free a required module in nodejs?

When you import a file in Node.js, it's loaded, evaluated, and cached.
Is it possible to free the memory for that file if you know you will never use it again (or not for a long time, so it would be worth compiling it again later)?
What I want to do is import a temporary file, read its code, execute it once, and then free it forever (I know it's not going to be used again, and I don't want to have memory leaks).
Basically, this means having dynamic code in Node.js.
Pages like Codility, which allow you to input code and execute it on the backend, should work with a similar solution... unless they run a completely new Node.js instance with that code and then kill it.
Is it possible? If so, how?
You can delete from the module cache like this. Just make sure that there are no circular dependencies, or the module will not actually be freed from memory:
delete require.cache[require.resolve('./theModuleYouWantToDelete.js')]
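Usage might look like this, where ./temp-task.js is a hypothetical one-off module:

// Load and run the one-off module once.
const task = require('./temp-task.js');
task.run();

// Remove the cache entry; once nothing else references its exports,
// the module's data becomes eligible for garbage collection.
delete require.cache[require.resolve('./temp-task.js')];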
It depends on what you mean by "free" the module. Node.js does not have a way to remove the code once it has been run, so that will always remain in memory.
If you remove all references to the module (by deleting it from the cache) and remove any other references there might be to its exported data, then any data associated with the module should be eligible for garbage collection.
For a service that lets the user run arbitrary code on the server, I would always run that in a sandboxed separate process where you can kill the process and recover all resources used by the code, not run that in the main server process.
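A minimal sketch of that separate-process approach, assuming the untrusted code has been written to a hypothetical file user-code.js:

const { execFile } = require('child_process');

// Run the code in its own Node.js process; killing the child (here via
// the timeout option) reclaims all memory and resources it used.
execFile(process.execPath, ['user-code.js'], {
  timeout: 5000,          // kill the child after 5 seconds
  maxBuffer: 1024 * 1024  // cap captured output at 1 MiB
}, (err, stdout, stderr) => {
  if (err) return console.error('child failed or timed out:', err.message);
  console.log('child output:', stdout);
});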
OK, reading the Node.js documentation about modules, there happens to be a public cache member, and it says:
Modules are cached in this object when they are required. By deleting a key value from this object, the next require will reload the module. This does not apply to native addons, for which reloading will result in an error.
Adding or replacing entries is also possible. This cache is checked before native modules and if a name matching a native module is added to the cache, no require call is going to receive the native module anymore. Use with care!
So I guess every evaluated module lives here internally, and removing a key from this object, as the docs say, will also free the memory related to that portion of code (once the garbage collector does its job).

safely executing arbitrary code

I have a program that can get code from a user as input (this question is language-agnostic, though I am primarily interested in answers for Java and Python). Usually, this code is going to be useful, but I don't have a guarantee that the user isn't making a mistake, or even deliberately giving malicious code.
I want to be able to execute this code safely, i.e. without harmful side effects if it turns out to be faulty or malicious.
More specifically:
the user specifies that the input code should operate on some objects that exist in the primary program (the program that gets the code from the user and executes it). Optimally, it should be able to access these objects directly, but sending them over to the child program through some communication protocol or a file is also fine.
in the same way, the code should generate some output that is transmitted back to the parent program.
the user can specify whether the code should be allowed to access any other data, whether it should be allowed to read or write to files, and whether it should have access to any other interfaces or OS methods.
it is possible to specify a maximum runtime after which the code will be interrupted if it hasn't finished executing yet.
the parent program and the code to execute may be in different languages. You can assume that the programs necessary to compile and execute the given code are installed and available to the parent program. If the languages are different, assume that some standard format like JSON can be used for transmitting the data (or is there a way to do this more efficiently?).
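A minimal sketch of the "separate process + timeout + JSON over stdio" baseline these requirements suggest, written in Node.js; runner.py is a hypothetical child script that reads JSON on stdin and writes JSON on stdout:

const { execFile } = require('child_process');

function runUntrusted(inputObj, callback) {
  // The child can be written in any language; JSON over stdio keeps the
  // boundary language-agnostic, and the timeout enforces a maximum runtime.
  const child = execFile('python3', ['runner.py'], { timeout: 2000 },
    (err, stdout) => {
      if (err) return callback(err); // includes timeout kills
      try { callback(null, JSON.parse(stdout)); }
      catch (parseErr) { callback(parseErr); }
    });
  child.stdin.end(JSON.stringify(inputObj)); // hand the input objects over
}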
I think that this should be doable with a Virtual Machine. However, speed is a concern and I want to be able to execute many code blocks quickly, so that creating and tearing down a VM for each of them may be prohibitively expensive.
Another option is creating a sandbox, which e.g. Java can do, but as far as I am aware only for executing other Java code. I have been unable to find a solution that does this for arbitrary languages.
For which languages does this work well, for which is it difficult?
Is this easier on some OS than on others?

How to synchronize an object between multiple instances of a Node.js application

Is there any way to lock an object in a Node.js application?
When multiple instances of the application are running, some functions shouldn't run concurrently. When instance A's function completes, it should unlock that object/key or some other identifier, and instance B should check whether it is unlocked before running its own function.
Any object or key can be used as the identifier for locking and unlocking.
How can this be done in a Node.js application that has multiple instances?
As mentioned above, Redis may be your answer; however, it really depends on the resources available to you. There are some other possibilities, less complicated and certainly less powerful, which may also do the trick.
node-cache may also do the trick, if you set it up correctly. It is nowhere near as powerful as Redis, but on the bright side it does not require as much setup and interaction with your environment.
So there are Redis and node-cache for memory locks. I should mention there are quite a few NPM packages which handle caching; it depends on what you need and how intricate your cache needs to be.
However, there are less elegant ways to do what you want, though less elegant is not necessarily worse.
You could use a JSON-file-based system and hold locks on the files for a TTL. lockfile or proper-lockfile will accomplish the task. You can read the information from the files when needed, delete them when required, and give them a TTL. Basically, a cache system on disk.
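A minimal sketch with proper-lockfile, where shared.json is a hypothetical file that all instances agree to lock on (treat the option values as placeholders):

const lockfile = require('proper-lockfile'); // npm install proper-lockfile

async function runExclusively(criticalSection) {
  // Assumes shared.json already exists on disk.
  const release = await lockfile.lock('shared.json', {
    stale: 10000,                            // a crashed holder's lock expires
    retries: { retries: 5, minTimeout: 200 } // wait briefly for the lock
  });
  try {
    await criticalSection(); // only one instance runs this at a time
  } finally {
    await release();         // let the other instances proceed
  }
}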
The memory system is obviously faster. The file system requires just as much planning in your code as the memory system.
There is yet another way. This is possibly the most dangerous one, and you would have to think long and hard about the consequences in terms of security and need.
Node.js has its own process.env. As most know, this holds the system-wide variables available to all, accessed by simply writing process.env.foo where foo has been declared as a system variable. A package such as dotenv allows you to add to your system variables by way of a .env text file. Thus, if you put sam=mongoDB in that file, then wherever you write process.env.sam in your code it will be interpreted as mongoDB. Tons of system-wide variables can be set up here.
So what good does that do, you may ask? Well, these are system-wide variables, and they can be changed in mid-flight. So if you need to lock the variables and then change them, it is a simple matter to do so. Beware of the gotcha here, though: once the system goes down, or all processes stop and are started again, your environment variables will return to the defaults in the .env file.
Additionally, unless you are running a system which is somewhat safe on AWS or Azure etc., I would not feel secure having my .env file open to the world. There is a way around this one too: you can encrypt all the variables and put the ciphertext in the file, then decrypt each one before actually using the full variable.
There are probably many more ways to lock and unlock, not the least of which is to use the native Node.js structure: combine file system events together with crypto. But this demands a much deeper level of understanding of the actual Node.js library and structures.
Hope some of this helped.
I strongly recommend Redis in your case.
There are several ways to create an application/process-shared object; using locks is one of them, as you mentioned.
But they're just complicated. Unless you really need to do it yourself, Redis will be good enough: atomic ops across multiple processes, transactions, and so on.
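For example, a minimal lock sketch using the ioredis client, where lock:jobs is a hypothetical key name (for a production-grade variant, look at the Redlock algorithm):

const Redis = require('ioredis'); // npm install ioredis
const redis = new Redis();

async function withLock(fn) {
  // SET ... NX PX: succeeds only if the key doesn't exist yet, and auto-expires
  // after 30s so a crashed holder can't block everyone forever.
  const ok = await redis.set('lock:jobs', String(process.pid), 'PX', 30000, 'NX');
  if (ok !== 'OK') return false; // another instance holds the lock
  try {
    await fn();
  } finally {
    await redis.del('lock:jobs'); // simplified release; Redlock does this safely
  }
  return true;
}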
Old thread, but I didn't want to use Redis, so I made my own open-source solution which utilizes websocket connections:
https://github.com/OneAndonlyFinbar/sync-cache

Is it possible to get an UPDATED environment variable once a Node.js script is running?

How can one retrieve an updated environment variable once a node script is already running?
Consider the following gulp task. How can you update the environment variable Z (e.g., from some other script running on the system) so that the function outputs different values while it's running?
As it is, environment variables set elsewhere are ignored and the function always outputs 'undefined'.
gulp.task('z', function () {
  var z;
  setInterval(function () {
    z = process.env.Z;
    console.log('Value of Z is: ' + z);
  }, 1000);
});
Running Windows 7. I have tried both set and setx, but nothing persists into the running node script. I wouldn't think this is possible, since you generally can't pass environment variables between command prompts without re-launching them (and using setx). But then again, SO users are special and I've been surprised before. Is this even possible?
Yes, it's possible. There are many options, actually. You can pass variables between multiple scripts with inter-process communication (IPC)...
The easiest option is probably to do it via sockets, e.g. with the use of Redis. The advantage of this is that you can also use it to communicate between processes running on different devices, if that should be required in the future.
Another option is to do it via the built-in process signalling:
https://nodejs.org/api/process.html#process_signal_events
There are many other options/libraries, each with their own pros and cons. For scalability you are probably best off doing the communication via sockets; if performance is very important, you can choose an option which uses shared memory.
Multiple options are discussed in this question:
What's the most efficient node.js inter-process communication library/method?
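As a concrete starting point, here is a minimal sketch using the built-in child_process IPC channel instead of environment variables (watcher.js is a hypothetical worker script):

// parent.js: fork the worker and push new values over the IPC channel;
// a running child's environment can't be changed from outside, but a
// message can carry whatever the environment variable would have held.
const { fork } = require('child_process');
const child = fork('watcher.js');
setTimeout(() => child.send({ Z: 'updated value' }), 3000);

// watcher.js: the equivalent of the gulp task above:
//   let z = process.env.Z;                          // initial value, if any
//   process.on('message', (msg) => { z = msg.Z; });
//   setInterval(() => console.log('Value of Z is: ' + z), 1000);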
