I remember reading somewhere that it isn't advisable to use "exec" within C code compiled by the NDK.
What is the recommended approach? Do we try to push the exec call up into Java space; that is, have the application (via JNI) spawn the new process and, where relevant, pass the results back down to the native code?
First off, it's not recommended to use either fork or exec. All of your code is generally supposed to live in a single process which is your main Android application process, managed by the Android framework. Any other process is liable to get killed off by the system at any time (though in practice that doesn't happen in present Android versions as far as I have seen).
The rationale, as I understand it, is simply that the Android framework can't properly manage the lifetime and lifecycle of your app if you spawn other processes.
Exec
You have no real alternative here but to avoid launching other executables at all. That means you need to turn your executable code into a library which you link directly into your application and call using normal NDK function calls, triggered by JNI from the Java code.
Fork
Is more difficult. If you really need a multi-process model, and want to stay within the letter of the rules, you need to arrange for the Android framework to fork you from its Zygote process. To do this, run all your background code in a separate Service that is declared, in AndroidManifest.xml, to run in a different process.
To take this to extremes, if you need multiple identical instances of the code running in different processes for memory protection and isolation reasons, you can do what Android Chrome does:
Run all your background/forked code in a subclass of Service
Create multiple subclasses of that
List each of these subclasses as a separate service within your AndroidManifest.xml each with a different process attribute
In your main code, keep track of exactly which services you have started and which you haven't, and manage them using startService/stopService.
Of course, if you've turned your native code into a library rather than an executable, you probably don't need fork anyway. The only remaining reason to use fork is to achieve memory protection/isolation.
In practice
In practice quite a lot of apps ignore all this and use fork/exec within their native code directly. At the moment, it works, at least for short-running tasks.
Is there a possible way to stop/abort/terminate a required/loaded module?
I found something here (https://stackoverflow.com/a/6677355/5781499):
var name = require.resolve('moduleName');
delete require.cache[name];
But this does not stop/abort a running timer or similar.
It just keeps doing what the script does.
The reason for me to need this, I want to implement a plugin system where you can start & stop plugins.
"Starting" is easy, just load with require(...) the code.
But what would be the best way to stop everything the plugin is doing?
I have thought about using a VM, but in Node there is no way to abort a vm execution either.
The next thing that came to my mind was Worker Threads. They provide a .terminate() method, which does what I need. (But then I have to deal with inter-process communication, which makes it very complex to keep everything in sync.)
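Roughly, the Worker Threads route looks like this (the plugin file name and message shape are just placeholders):

const { Worker } = require('worker_threads');

const worker = new Worker('./plugin.js');          // plugin code runs in its own thread
worker.on('message', (msg) => console.log('from plugin:', msg));
worker.postMessage({ cmd: 'start' });

// later, to hard-stop everything the plugin is doing:
// worker.terminate();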
Would be awesome if someone could give me a hint/tip.
Nodejs does not provide any feature to do what you want, so you will have to do a bunch of things manually. As you've discovered, deleting the module from the module cache only affects what happens if you try to load the code again; it does not affect the already loaded code at all.
If you're going to keep the plug-ins in the same process, then you can implement a required method in your plug-ins called something like "shutdown" where the plug-in shuts itself down manually (stops timers, unregisters event handlers, etc...). Implemented correctly, this should disconnect it entirely from anything in your nodejs program. If you then delete the module from the require cache, you can then load a new module in its place. The one downside to this is that nodejs does not ever unload the original code - it just stays in memory. If you're not accessing that original module handle, that code never gets used again, but it isn't freed or GCed by nodejs.
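A minimal sketch of that convention, assuming the plug-in exposes start/shutdown methods (the names and file paths here are just examples, not a standard API):

// plugin.js - a cooperating plug-in that knows how to stop itself
let timer;
exports.start = () => { timer = setInterval(() => console.log('tick'), 1000); };
exports.shutdown = () => { clearInterval(timer); };   // stop timers, remove listeners, close handles

// host.js - loads, stops and unloads the plug-in
function loadPlugin(path) {
    const plugin = require(path);
    plugin.start();
    return plugin;
}

function unloadPlugin(path, plugin) {
    plugin.shutdown();                                // let the plug-in disconnect itself first
    delete require.cache[require.resolve(path)];      // then a later require() loads a fresh copy
}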
A more robust system would be to put each plug-in in its own child process or worker thread and just communicate with it via the built-in interprocess communication between parent and child, which is essentially just messaging. As long as you don't have to send large amounts of data between parent and child/worker or need super high bandwidth, the messaging is pretty simple to use and works well.
If using a separate child process, you can kill the child process at any time and the OS will reclaim all resources used by it (not quite so true for a worker thread). This has its own downside in that it will likely use a lot more memory, since a whole new nodejs process or worker thread is a much heavier-weight thing than just loading a single module into your existing nodejs process.
Running it in a child process has the advantage that your main process is much more protected from errant code in the plug-in (either accidental or malicious) since they are different processes and the plug-in can't directly mess with the parent process. But, don't fool yourself here, unless you run it in a sandboxed VM, the plug-in can still wreak some havoc on the system since it has access to many resources on the system (disk, network, other peripherals, etc...).
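A rough sketch of the child-process approach, with file names and message shapes as placeholders:

// parent.js - run the plug-in in its own process and talk to it via messages
const { fork } = require('child_process');

const child = fork('./plugin-host.js');
child.on('message', (msg) => console.log('from plugin:', msg));
child.send({ cmd: 'start' });

// stopping the plug-in is now just killing its process; the OS reclaims everything
// child.kill();

// plugin-host.js - receives commands from the parent
process.on('message', (msg) => {
    if (msg.cmd === 'start') {
        setInterval(() => process.send({ status: 'tick' }), 1000);
    }
});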
I'm running Electron on a Linux server for web scraping, and currently I run a new electron command for each task. But this results in high CPU usage. I'm now thinking about running a single Electron instance and creating a new BrowserWindow for each task. It will take some time to adapt the code base for this style, so I wanted to ask here first: will it make a difference in CPU usage, and how much?
Basically, creating a new NodeJS process means re-parsing your application's code, which significantly increases your CPU usage. Creating only a new BrowserWindow just spawns a new renderer process, which is far more efficient (see the sketch below).
If your application is packaged, e.g. with electron-packager, then creating a new instance also affects your CPU usage like creating another NodeJS process, because the packaged (aka compiled) application carries its own copy of NodeJS, which is enough to run your code but still adds the same overhead.
But the decision depends on how you use the server. If you only run the Electron application to carry out tasks you have defined yourself, adapting your working code would bring little to no benefit. If you want to release this application, and/or the server is used for other things as well, e.g. as a web server, adapting your code would be a real benefit.
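A rough sketch of the single-instance approach, where runTask and the URLs stand in for your actual scraping logic:

// main.js - one long-lived Electron main process, one BrowserWindow per task
const { app, BrowserWindow } = require('electron');

async function runTask(url) {
    const win = new BrowserWindow({ show: false });   // each task gets its own renderer process
    try {
        await win.loadURL(url);
        // ...extract what you need, e.g. via win.webContents.executeJavaScript(...)
    } finally {
        win.destroy();                                // free the renderer when the task is done
    }
}

app.whenReady().then(async () => {
    await runTask('https://example.com/');
    await runTask('https://example.org/');
    app.quit();
});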
Running multiple instances of the main nodejs process with the default configuration is not actually supported or tested. You'll find that any features that persist data to disk either don't work or don't work as expected (i.e. localStorage, IndexedDB, sessions, etc.).
https://github.com/electron/electron/issues/2493
You can work around this by changing the data directory for each instance so they don't trample over each other but this is likely to use a lot of disk space and you'd need a way to keep track of all these data directories.
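A minimal sketch of that workaround, with the instance-id scheme and directory name as examples only:

// set a per-instance data directory before the app emits 'ready'
const { app } = require('electron');
const path = require('path');

const instanceId = process.argv[2] || 'default';      // e.g. passed on the command line
app.setPath('userData', path.join(app.getPath('appData'), 'scraper-' + instanceId));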
A single main process with multiple renderers is nearly always the answer.
I have a process that needs to be extensible by loading shared libraries. Is there a way to run the shared library code in a sandboxed environment (other than an external process), so that if it segfaults it doesn't crash the process, and so that there are limits on how much memory it can allocate, the CPU cycles it can use, etc.?
I don't think there is a clean way to do it. You could try:
Catching segfaults and recovering from them (tricky, architecture specific, but doable)
Replacing calls to malloc/calloc for the library with an instrumented version that counts the allocated space (see, for example, how to replace default malloc by code)
Alternatively, use malloc hooks (http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html)
CPU cycles are accounted to the whole process, so I don't think there is any way you could get that figure for just the library. The only viable option is to manually measure ticks for every library call that your code makes.
In essence - this would be fun to try, but I recommend you go with the separate process approach and use RPC, quotas, ulimits etc.
No. If the shared library segfaults your process will segfault (the process is executing the library code). If you run it as an external process and use an RPC mechanism then you'd be okay as for crashing, but your program would need to detect when the service wasn't available (and something would need to restart it). Tools like chroot can sandbox processes, but not the individual libraries that an executable links to.
So I was developing a server farm on Node which requires multiple processes per machine to handle the load. Since Windows doesn't quite get along with Node's cluster module, I had to work it out manually.
The real problem is that when I fork Node processes, a JS module path is required as the first argument to child_process.fork(), and once forked, the child process doesn't inherit anything from its parent. In my case, I want a function that behaves like the fork() system call on Linux, which clones the parent process, inherits everything, and continues execution from exactly where fork() was called. Can this be achieved on the Node platform?
I don't think node.js is ever going to support fork(2).
The comment from the node GitHub issue on the subject:
https://github.com/joyent/node/issues/2334#issuecomment-3153822
We're not (ever) going to support fork.
not portable to windows
difficult conceptually for users
entire heap will be quickly copied with a compacting VM; no benefits from copy-on-write
not necessary
difficult for us to do
child_process.fork()
This is a special case of the spawn() functionality for spawning Node processes. In addition to having all the methods in a normal ChildProcess instance, the returned object has a communication channel built in. See child.send(message, [sendHandle]) for details.
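Since the forked child inherits nothing automatically, a common workaround is to fork the same file and pass whatever state the child needs explicitly over that channel; a rough sketch, where the IS_CHILD flag and the state shape are just example choices:

const { fork } = require('child_process');

if (process.env.IS_CHILD) {
    // child: receive the state the parent chose to pass down
    process.on('message', (state) => {
        console.log('child starting with state:', state);
    });
} else {
    // parent: re-run this same file as the child and hand it the state it needs
    const child = fork(__filename, [], { env: { ...process.env, IS_CHILD: '1' } });
    child.send({ port: 8000 });
}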
Hoi.
I am working on an experiment that allows users to use 1% of my CPU. That's like having your own web server, but as a big dynamic remote execution framework (don't ask about that), and I don't want users to be able to use API functions to create files, open sockets, spawn threads, write console output - nothing.
Update 1: People will be sending me binaries, so raw syscalls via interrupt 0x80 are possible. Therefore... does this have to happen in the kernel?
I need to limit a process so it cannot do anything but use a single pipe. Through that pipe the process will use my own wrapped and controlled API.
Is that even possible? I was thinking of something like a Linux kernel module.
Limiting RAM and CPU is not the primary issue here; there's material on Google for that.
Thanks in advance!
The ptrace facility will allow your program to observe and control the operation of another process. Using the PTRACE_SYSCALL flag, you can stop the child process before every syscall, and make a decision about whether you want to allow that system call to proceed.
You might want to look at what Google is doing with their Native Client technology and the seccomp sandbox. The Native Client (NaCl) stuff is intended to let x86 binaries supplied by a web site run inside a user's local browser. The problem of malicious binaries is similar to what you face, so most of the technology/research probably applies directly.