Stop/abort/terminate required (loaded) module - node.js

Is there a possible way to stop/abort/terminate a required/loaded module?
I found here (https://stackoverflow.com/a/6677355/5781499) something:
var name = require.resolve('moduleName');
delete require.cache[name];
But this does not stop/abort a running timer or similar.
It just keeps doing what the script does.
The reason for me to need this, I want to implement a plugin system where you can start & stop plugins.
"Starting" is easy, just load with require(...) the code.
But what would be the best way to stop everything the plugin is doing?
I have thought about a VM, but in node there is no way to abort a vm execution either.
The next thing that came to my mind was "Worker Threads". They provide a .terminate method which does what I need. (But now I have to deal with inter-process communication, which makes it very complex to keep everything synced.)
Would be awesome if someone could give me a hint/tip.

Nodejs does not provide any feature to do what you want, so you will have to do a bunch of things manually. As you've discovered, deleting the module from the module cache only affects what happens if you try to load the code again; it does not affect the already loaded code at all.
If you're going to keep the plug-ins in the same process, then you can implement a required method in your plug-ins called something like "shutdown" where the plug-in shuts itself down manually (stops timers, unregisters event handlers, etc...). Implemented correctly, this should disconnect it entirely from anything in your nodejs program. If you then delete the module from the require cache, you can then load a new module in its place. The one downside to this is that nodejs does not ever unload the original code - it just stays in memory. If you're not accessing that original module handle, that code never gets used again, but it isn't freed or GCed by nodejs.
A bit more robust system would be to put each plug-in in their own child-process or worker thread and just communicate with them via the built-in interprocess communication between parent and child process which is essentially just messaging. As long as you don't have to send large amounts of data between parent and child/worker or have super high bandwidth data, then the messaging is pretty simple to use and works well.
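To give a rough idea of how small the messaging layer can be, here is a sketch of a plug-in running in a worker thread. The plug-in source is passed as a string with the eval option only to keep the example in one file; the ping/pong message shape and the startWorkerPlugin name are assumptions made for illustration.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical plug-in code that runs inside the worker.
const pluginSource = `
  const { parentPort } = require('node:worker_threads');
  setInterval(() => {}, 1000); // stands in for ongoing plug-in work
  parentPort.on('message', (msg) => {
    if (msg.type === 'ping') parentPort.postMessage({ type: 'pong', id: msg.id });
  });
`;

function startWorkerPlugin(source) {
  const worker = new Worker(source, { eval: true });
  return {
    // Send one request and resolve with the next message from the worker.
    ping(id) {
      return new Promise((resolve, reject) => {
        worker.once('message', resolve);
        worker.once('error', reject);
        worker.postMessage({ type: 'ping', id });
      });
    },
    // terminate() stops the worker regardless of its timers or listeners
    // and returns a Promise for the exit code.
    stop() {
      return worker.terminate();
    },
  };
}
```

The host only ever sees messages, so stopping the plug-in does not depend on the plug-in cooperating.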
If using a separate child process, you can then kill the child process at anytime and the OS will reclaim all resources used by the process (not quite so true for a workerThread). This has its own downsides in that it will likely use a lot more memory since a whole new nodejs process or workerThread in nodejs is a much heavier weight thing than just loading a single module into your existing nodejs process.
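The child-process variant might look like the following sketch. It spawns a second node process with an IPC channel and kills it on demand. The childSource string, the ready message, and the startChildPlugin name are hypothetical; a real system would more likely point child_process.fork at a plug-in file.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical plug-in code that runs in the child process.
const childSource = `
  setInterval(() => {}, 1000); // stands in for ongoing plug-in work
  process.on('message', (msg) => process.send({ echo: msg }));
  process.send({ ready: true });
`;

function startChildPlugin(source) {
  // The 'ipc' entry gives the child process.send() / process.on('message').
  const child = spawn(process.execPath, ['-e', source], {
    stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
  });
  const exited = new Promise((resolve) => {
    child.once('exit', (code, signal) => resolve({ code, signal }));
  });
  return {
    child,
    // kill() sends SIGTERM by default; the OS then reclaims the process.
    stop() {
      child.kill();
      return exited;
    },
  };
}
```

Here stop() resolves once the child has actually exited, which is the point at which its memory, timers and handles are gone.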
Running it in a child process has the advantage that your main process is much more protected from errant code in the plug-in (either accidental or malicious) since they are different processes and the plug-in can't directly mess with the parent process. But, don't fool yourself here, unless you run it in a sandboxed VM, the plug-in can still wreak some havoc on the system since it has access to many resources on the system (disk, network, other peripherals, etc...).

Related

Is there a node.js api that allows to store the current running node process and resume it later?

Googling for it results in many “how to persist data in a node app” but I’m looking on a way to store the program counter, memory status, event loop, call stack etc in persistent storage, and resume it later.
Benefits: if you see the runtime (a server, container, serverless function) is about to terminate, instead of using business logic to pause and resume (custom work), use the same way operating systems handle multiple processes / threads. Store everything, then resume it later from a different infrastructure (but with identical specs).
I’m sure there is something like this, but simply can’t find the right search term probably.
Ps this might be an OS feature that I’m looking for and not node specific, but if this can be done from within Node’s API (Eg v8 internals) I can basically get an unlimited / long running lambda ;) (which is a bad idea but I want to know if it’s possible).
(V8 developer here.)
V8 definitely doesn't support this.
What V8 does support is taking a heap snapshot, and deserializing that on renewed process startup (and I believe Node is making use of this functionality). That's quite different from freezing an entire running process though.
I'm not sure what you mean by "the same way operating systems handle multiple processes / threads". Operating systems don't usually let you snapshot a process and transfer it to a different machine.
On the same machine, you could literally just let the OS do it: pause the process (e.g. press Ctrl+Z if you started it at a Linux command line, or use equivalent Task Manager functionality if your OS provides it, or similar), and resume it later. If the process itself doesn't fire any repeated tasks/timers, then that's almost equivalent to simply doing nothing: a process that executes no work won't get scheduled by the kernel anyway; a server that isn't serving any requests can just sit around waiting.
If you actually need to transfer a running process to another machine, your best bet may be a VM which you can snapshot, transfer, resume.

node: persist data after process termination

I'm using node-cache to cache data from a CLI application that observes changes in files and caches them to avoid new data processing.
the problem is that I noticed that this cache is destroyed on each command, since each time the tool is called in the terminal a new instance is generated and the old one is destroyed. probably, the data is also destroyed.
I need to keep, for a specific TTL, two things in cache/memory, even if the process ends:
the processed data
the specific instance of fs.watcher, watching and executing caching operations
the question is: how do i do it? I've been searching for days on the internet and trying alternatives and I can't find a solution.
I need to keep ... things in cache/memory, even if the process ends
That's, pretty much by definition, not possible. When a process terminates, all its resources are freed up for use by something else (barring a memory-leak bug in the OS itself).
It sounds like you need to refactor your app into a service that can run in the background and separate front-end that can communicate with it.
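A rough sketch of that split, under several assumptions: the service keeps a TTL cache in memory and listens on a local TCP port, and the CLI front-end sends one line of JSON per request. The protocol, the createCacheService and request names, and the use of TCP rather than a Unix socket are all illustrative choices, not something prescribed above. In practice the service would run as its own long-lived process (and would own the file watcher); both halves are shown in one file only to keep the example self-contained.

```javascript
const net = require('node:net');

// Background service: owns the cache for as long as it keeps running.
function createCacheService(ttlMs) {
  const store = new Map(); // key -> { value, expires }

  function handle({ op, key, value }) {
    if (op === 'set') {
      store.set(key, { value, expires: Date.now() + ttlMs });
      return { ok: true };
    }
    const entry = store.get(key);
    if (!entry || entry.expires < Date.now()) {
      store.delete(key);
      return { ok: false };
    }
    return { ok: true, value: entry.value };
  }

  return net.createServer((socket) => {
    let buffer = '';
    socket.setEncoding('utf8');
    socket.on('data', (chunk) => {
      buffer += chunk;
      let newline;
      // One JSON request per line.
      while ((newline = buffer.indexOf('\n')) !== -1) {
        const message = JSON.parse(buffer.slice(0, newline));
        buffer = buffer.slice(newline + 1);
        socket.write(JSON.stringify(handle(message)) + '\n');
      }
    });
  });
}

// Front-end: what each short-lived CLI invocation would call.
function request(port, message) {
  return new Promise((resolve, reject) => {
    const socket = net.connect(port, '127.0.0.1', () => {
      socket.write(JSON.stringify(message) + '\n');
    });
    let buffer = '';
    socket.setEncoding('utf8');
    socket.on('data', (chunk) => {
      buffer += chunk;
      if (buffer.includes('\n')) {
        socket.end();
        resolve(JSON.parse(buffer));
      }
    });
    socket.on('error', reject);
  });
}
```

The cached data survives between CLI invocations because it lives in the service, not in the command that exits.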

If multiple jobs exist in the event loop for one process. What happens to the remaining jobs if the current job crashes the process?

In Node.js cluster mode, if multiple jobs exist in the event loop for one process, should the current job crash the process, what happens to the remaining job?
I'm assuming the remaining jobs in the event loop would go unfulfilled or return a server error. My question is, why is this an acceptable risk? Why would someone opt to use Node.js cluster mode in production then, rather than use something like PHP in production, where there is no risk of this, because PHP handles each request in its own process.
Edit:
Obviously this doesn't just apply to Node.js cluster mode. It can happen on a single instance, in which case obviously the end user would just get a server error. Cluster mode just happens to be my personal use case.
I'm looking for a way to pick a queued job back up should a previous job cause the process to exit before the subsequent job gets a chance to be fulfilled. I am currently reading about how you can use a tool like RabbitMQ to handle your job queue outside of the node.js cluster, so that each cluster instance just pulls jobs from the RabbitMQ queue. If anyone has any input on that, that would also be greatly appreciated.
If multiple jobs exist in the event loop for one process. What happens to the remaining jobs if the current job crashes the process?
If a node.js process crashes, the same thing happens to it that happens to any other process. All open sockets get automatically disconnected and the client will receive an immediate close on their socket (socket connection dropped essentially).
If you were using a Java server that was in the middle of handling 10 requests (perhaps in threads) and it crashed, the consequences would be the same. All 10 socket connections would get dropped.
If process isolation from one request to another is your #1 criterion for selecting a server environment, then I guess you wouldn't pick any environment that ever serves multiple requests from the same process. But you would give up a lot to get that. One of the reasons for the node.js design is that it scales really, really well for a high number of concurrent connections that are all doing mostly I/O things (disk, networking, database stuff, etc...), which happens to describe most web servers. A design that fires up a new process for every incoming connection does not scale as well for a large number of concurrent connections, because a process is a much more heavy-weight thing in the eyes of the operating system (memory usage, other system resource usage, task switching overhead, etc...) than the way node.js does things.
And, there are obviously hundreds of other considerations too when choosing a server environment. So, you kind of have to look at the whole picture of what you're designing for and make the best set of tradeoffs.
In general, I wouldn't put this issue anywhere on the radar for why you should choose one over the other unless you expect to be running risky code (perhaps out of your control) that crashes a lot and this issue is therefore more important in your deployment than all the other differences. And, if that was the case, I'd probably isolate the risky code to its own process (even when using nodejs) to alleviate any pain from that crash. You could have a process pool waiting to process risky things. For example, if you were running code submitted by a user, I might run that code in its own isolated VM.
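Isolating risky code to its own process can be as simple as the following sketch, which runs a job in a separate node process and reports the outcome instead of letting a crash take down the server. The runRiskyJob name and the idea of passing the job as a source string are assumptions for illustration; this is process isolation only, not a security sandbox.

```javascript
const { execFile } = require('node:child_process');

// Run a snippet in its own node process and report how it ended.
function runRiskyJob(source) {
  return new Promise((resolve) => {
    execFile(process.execPath, ['-e', source], (error, stdout) => {
      if (error) {
        // A non-zero exit (e.g. an uncaught exception) lands here;
        // the parent process keeps running and can retry or respawn.
        resolve({ ok: false, code: error.code, output: stdout });
      } else {
        resolve({ ok: true, code: 0, output: stdout });
      }
    });
  });
}
```

For example, runRiskyJob('throw new Error("boom")') resolves with ok set to false while every other request in the parent continues to be served.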
If you're just worried about your own code crashing a lot, then you probably have bigger problems and need more extensive unit testing, more robust error handling, and other tools such as a linter and code analysis tools to find potential problem areas. With proper design, implementation and error handling, you should be able to keep a single incoming request from harming anything other than itself. That's certainly the philosophy that every server environment that serves multiple requests from the same process advises and that the people/companies deploying those servers use.

How worker threads works in Nodejs?

Nodejs can not have a built-in thread API like java and .net
do. If threads are added, the nature of the language itself will
change. It’s not possible to add threads as a new set of available
classes or functions.
Nodejs 10.x added worker threads as an experiment, and they have been stable since 12.x. I have gone through a few blogs but did not understand much, maybe due to lack of knowledge. How are they different from those threads?
Worker threads in Javascript are somewhat analogous to WebWorkers in the browser. They do not share direct access to any variables with the main thread or with each other and the only way they communicate with the main thread is via messaging. This messaging is synchronized through the event loop. This avoids all the classic race conditions that multiple threads have trying to access the same variables because two separate threads can't access the same variables in node.js. Each thread has its own set of variables and the only way to influence another thread's variables is to send it a message and ask it to modify its own variables. Since that message is synchronized through that thread's event queue, there's no risk of classic race conditions in accessing variables.
Java threads, on the other hand, are similar to C++ or native threads in that they share access to the same variables and the threads are freely timesliced so right in the middle of functionA running in threadA, execution could be interrupted and functionB running in threadB could run. Since both can freely access the same variables, there are all sorts of race conditions possible unless one manually uses thread synchronization tools (such as mutexes) to coordinate and protect all access to shared variables. This type of programming is often the source of very hard to find and next-to-impossible to reliably reproduce concurrency bugs. While powerful and useful for some system-level things or more real-time-ish code, it's very easy for anyone but a very senior and experienced developer to make costly concurrency mistakes. And, it's very hard to devise a test that will tell you if it's really stable under all types of load or not.
node.js attempts to avoid the classic concurrency bugs by separating the threads into their own variable space and forcing all communication between them to be synchronized via the event queue. This means that threadA/functionA is never arbitrarily interrupted and some other code in your process changes some shared variables it was accessing while it wasn't looking.
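A small sketch makes the separate variable space visible. The object passed as workerData is copied into the worker, so the worker's mutation never reaches the main thread's object, and the only way back is a message. The state object and the runWorker name are hypothetical.

```javascript
const { Worker } = require('node:worker_threads');

// Lives in the main thread only.
const state = { counter: 0 };

const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  workerData.counter += 100;                   // mutates the worker's private copy
  parentPort.postMessage(workerData.counter);  // results travel back as a message
`;

function runWorker() {
  return new Promise((resolve, reject) => {
    // workerData is cloned, not shared, when the worker starts.
    const worker = new Worker(workerSource, { eval: true, workerData: state });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

After runWorker() resolves with the worker's value, state.counter in the main thread is still 0. Memory is shared only when you opt in explicitly, for example with a SharedArrayBuffer.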
node.js also has a backstop that it can run a child_process that can be written in any language and can use native threads if needed or one can actually hook native code and real system level threads right into node.js using the add-on SDK (and it communicates with node.js Javascript through the SDK interface). And, in fact, a number of node.js built-in libraries do exactly this to surface functionality that requires that level of access to the nodejs environment. For example, the implementation of file access uses a pool of native threads to carry out file operations.
So, with all that said, there are still some types of race conditions that can occur and this has to do with access to outside resources. For example if two threads or processes are both trying to do their own thing and write to the same file, they can clearly conflict with each other and create problems.
So, using Workers in node.js still has to be aware of concurrency issues when accessing outside resources. node.js protects the local variable environment for each Worker, but can't do anything about contention among outside resources. In that regard, node.js Workers have the same issues as Java threads and the programmer has to code for that (exclusive file access, file locks, separate files for each Worker, using a database to manage the concurrency for storage, etc...).
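One common way to code for exclusive file access is a lock file created with the 'wx' flag, which fails if the file already exists. The sketch below assumes all cooperating Workers or processes agree on the same lock path; the tryAcquireLock and releaseLock names and the lock file location are illustrative, and a crashed holder would leave a stale lock that a real system has to handle.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical lock location shared by everyone touching the resource.
const lockPath = path.join(os.tmpdir(), `demo-${process.pid}-${Date.now()}.lock`);

function tryAcquireLock(file) {
  try {
    // 'wx' creates the file but fails with EEXIST if it already exists,
    // so only one caller can succeed at a time.
    fs.closeSync(fs.openSync(file, 'wx'));
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

function releaseLock(file) {
  fs.unlinkSync(file);
}
```

A Worker would call tryAcquireLock(lockPath) before writing the shared file, back off and retry when it returns false, and call releaseLock(lockPath) when done.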
It comes under the Node.js architecture. Whenever a request reaches Node, it is passed on to the event queue and then to the event loop. Here the event loop checks whether the request is blocking I/O or non-blocking I/O (blocking I/O: operations which take time to complete, e.g. fetching data from some other place). The event loop then passes the blocking I/O to the thread pool, which is a collection of internal threads. The blocking I/O gets attached to one of those threads, which begins to perform the operation (e.g. fetching data from a database); after completion, the result is sent back to the event loop and later to execution. Note that these pool threads are internal to the runtime and are distinct from the worker_threads module asked about.

Thread inside Application vs. Server process

I have a site which sometimes takes particularly long to process a request (and that's not a defect). 99% of the time it's pretty quick because it almost doesn't do any processing.
I want to show a message that says "Loading" when the site takes long to process the request. My site uses mod_wsgi and Apache. The way I see it, I would respond saying 'Loading' before completing the processing and do one of two things right before:
-spawn a (daemon) thread to take care of the processing.
-communicate through socket with other process and tell it to take care of the processing (most likely send request to http://localhost:8080/do_processing).
What are the pros and cons of one approach vs the other?
Using a separate process is better. It does not have to be as hard as another answer suggests, since you can use an existing system for doing exactly that, such as Celery (http://celeryproject.org/). Relying on in-process threads is not necessarily a good idea unless you are going to implement an internal job queueing system of your own to keep the number of threads from blowing out. Also, in a multiprocess server configuration you can't be guaranteed that a request comes back to the same process, so it is not easy to get the status of a running operation. Finally, the web server processes could get killed off, and thus your background task could also be killed before it finishes. You would need a mechanism for holding state that can survive such an event if that was important. Far easier to use something like Celery.
The process route requires quite a bit of system processing. Creating a separate process is relatively expensive and slow. However, if your worker process crashes, it doesn't affect your main governing process (you will receive the exit status code and will have an opportunity to respawn a new worker process). You will also need some sort of inter-process communication layer (a socket, pipe, shared memory, etc...), which adds to the complexity of your project.
Threads are lightweight and cheap. All you need to do is manage concurrent access to shared resources. So it really depends on the task you have in mind. Threads will probably be the more appropriate way to implement your task.

Resources