Multiple bash scripts can't run asynchronously within spawned child processes - node.js

This was working fine for a single asynchronous call:
"use strict";
function bashRun(commandList,stdoutCallback,completedCallback)
{
const proc=require("child_process");
const p=proc.spawn("bash");
p.stdout.on("data",function(data){
stdoutCallback(output);
});
p.on("exit",function(){
completedCallback();
});
p.stderr.on("data",function(err){
process.stderr.write("Error: "+err.toString("utf8"));
});
commandList.forEach(i=>{
p.stdin.write(i+"\n");
});
p.stdin.end();
}
module.exports.bashRun = bashRun;
But when it runs inside a for loop, it doesn't work: it outputs only the latest element's (process's) stdout info:
for (var i = 0; i < 20; i++)
{
    var iLocal = i;
    bashRun(myList, function(myStdout){ /* only the result for iLocal = 19! */ }, function(){});
}
I need this to run asynchronously (and concurrently, with multiple child processes) and to deliver output through each process's stdoutCallback so I can do some processing on it. While stdout doesn't work, completedCallback is called 20 times at least, so there must still be 20 processes over some slice of time, though I'm not sure they all existed during the same slice.
What am I doing wrong, so that the spawned child processes cannot deliver their output to Node.js? Why can only the last of them (i = 19)?
I tried to exchange spawn for fork, but now it gives an error:
p.stdout.on("data", function(data){
         ^
TypeError: Cannot read property 'on' of null
How can I use something else to retain the same functionality as the module above?

Looks like an issue with the scope of i: var is function-scoped, so all 20 callbacks close over the same variable, which has reached its final value by the time they run. Try changing the loop to use let, which gives each iteration its own binding.
Eg: for (let i = 0; i < 20; i++)
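A minimal sketch of the fixed loop (assuming the module above is saved as bashRun.js; the commands in myList are placeholders):
const { bashRun } = require("./bashRun.js");

const myList = ["echo start", "echo done"]; // hypothetical command list

for (let i = 0; i < 20; i++) {
    // With let, each closure captures its own copy of i:
    bashRun(myList, function(myStdout){
        console.log("process " + i + ": " + myStdout);
    }, function(){
        console.log("process " + i + " completed");
    });
}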

Related

Node.js Spawning multiple threads within a class method

How can I run a single method multiple times, multi-threaded, when it is called as a method of a class?
At first I tried to use the cluster module, but I realized it just re-runs the whole process from the start, rightfully so.
How can I achieve something like what's outlined below?
I want a class's method to spawn n processes, and when the parallel tasks are completed, resolve a promise which the method returns.
The problem with the code below is that calling cluster.fork() will fork the index.js process.
index.js
const Person = require('./Person.js');
var Mary = new Person('Mary');
Mary.run(5).then(() => {...});
console.log('I should only run once, but I am called 5 times too many');
Person.js
const cluster = require('cluster');

class Person {
    run(distance) {
        var completed = 0;
        return new Promise((resolve, reject) => {
            for (var i = 0; i < distance; i++) {
                // run a separate process for each
                cluster.fork().send(i).on('message', message => {
                    if (message === 'completed') { ++completed; }
                    if (completed === distance) { resolve(); }
                });
            }
        });
    }
}
I think the short answer is that it's impossible. It's even worse: this has nothing to do with JS. To go multi-process (or multi-thread) in your particular problem, you essentially need a copy of the object in every thread, since each one (maybe) needs access to its fields. That means either initializing the object in every thread or sharing memory. The latter, I don't think, is provided by cluster, and it's not trivial in other languages for every use case either.
If the calculation is independent of the Person, I suggest you extract it and use the usual pattern (in index.js):
if (cluster.isWorker) {
    // Use the i for calculation
} else {
    // Create Person, then fork children in for loop
}
You then collect the results and change the Person as needed. You will be copying index.js, but this is standard and you only run what you need.
The problem is if results are dependent on Person. If these are constant for all i you can still send them to your forks independently. Otherwise what you have is the only way to fork. In general forking in cluster is not meant for methods, but for the app itself, which is the standard forking behavior.
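A minimal sketch of that pattern (the message shape and the placeholder calculation are my own assumptions, not from the answer):
const cluster = require('cluster');

if (cluster.isWorker) {
    // Worker: receive i, compute, report back, and exit.
    process.on('message', (i) => {
        const result = i * i; // placeholder for the real calculation
        process.send({ result });
        process.exit(0);
    });
} else {
    // Master: create the Person here, then fork one worker per i.
    const distance = 5;
    let completed = 0;
    for (let i = 0; i < distance; i++) {
        const worker = cluster.fork();
        worker.send(i);
        worker.on('message', ({ result }) => {
            // Collect the result and update the Person as needed.
            console.log('worker', worker.process.pid, 'returned', result);
            if (++completed === distance) {
                console.log('all workers done');
            }
        });
    }
}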
Another solution
Following your comment, I suggest you check out child_process.execFile or child_process.exec on the same file.
This way you can spawn a totally independent process on the fly. Instead of calling cluster.fork you call execFile. You can use either the exit code or stdout as return values (stderr etc.). The Promise is now replaced with:
var results = [];
for (var i = 0; i < distance; i++) {
    // run a separate process for each
    results.push(child_process.execFile('node', ['mymethod.js', i]));
}
// ... catch the exit event from all results or return a callback using results.
Inside mymethod.js, have the code that takes i and returns what you want, either in the exit code or through stdout, both properties of the returned child process. This is a bit un-Node.js-y since you're waiting on asynchronous calls, but your requirements are non-standard. Since I'm not sure how you use this, perhaps returning a callback with the array is a better idea.
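A sketch of how collecting the results might look (assuming mymethod.js prints its result to stdout; the completion counter is my own addition):
const { execFile } = require('child_process');

const distance = 5;
const results = [];
let finished = 0;

for (let i = 0; i < distance; i++) {
    // Each child runs mymethod.js with i as an argument and prints its result.
    execFile('node', ['mymethod.js', String(i)], (err, stdout, stderr) => {
        results[i] = err ? err : stdout.trim();
        if (++finished === distance) {
            console.log('all done:', results);
        }
    });
}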

How to forcibly keep a Node.js process from terminating?

TL;DR
What is the best way to forcibly keep a Node.js process running, i.e., keep its event loop from running empty and hence keeping the process from terminating? The best solution I could come up with was this:
const SOME_HUGE_INTERVAL = 1 << 30;
setInterval(() => {}, SOME_HUGE_INTERVAL);
Which will keep an interval running without causing too much disturbance if you keep the interval period long enough.
Is there a better way to do it?
Long version of the question
I have a Node.js script using Edge.js to register a callback function so that it can be called from inside a DLL in .NET. This function will be called once per second, sending a simple sequence number that should be printed to the console.
The Edge.js part is fine, everything is working. My only problem is that my Node.js process executes its script and after that it runs out of events to process. With its event loop empty, it just terminates, ignoring the fact that it should've kept running to be able to receive callbacks from the DLL.
My Node.js script:
var edge = require('edge');

var foo = edge.func({
    assemblyFile: 'cs.dll',
    typeName: 'cs.MyClass',
    methodName: 'Foo'
});

// The callback function that will be called from C# code:
function callback(sequence) {
    console.info('Sequence:', sequence);
}

// Register for a callback:
foo({ callback: callback }, true);

// My hack to keep the process alive:
setInterval(function() {}, 60000);
My C# code (the DLL):
public class MyClass
{
    Func<object, Task<object>> Callback;

    void Bar()
    {
        int sequence = 1;
        while (true)
        {
            Callback(sequence++);
            Thread.Sleep(1000);
        }
    }

    public async Task<object> Foo(dynamic input)
    {
        // Receives the callback function that will be used:
        Callback = (Func<object, Task<object>>)input.callback;

        // Starts a new thread that will call back periodically:
        (new Thread(Bar)).Start();

        return new object { };
    }
}
The only solution I could come up with was to register a timer with a long interval to call an empty function just to keep the scheduler busy and avoid getting the event loop empty so that the process keeps running forever.
Is there any way to do this better than I did? I.e., keep the process running without having to use this kind of "hack"?
The simplest, least intrusive solution
I honestly think my approach is the least intrusive one:
setInterval(() => {}, 1 << 30);
This will set a harmless interval that will fire approximately once every 12 days, effectively doing nothing, but keeping the process running.
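(For the record: 1 << 30 is 1,073,741,824 ms, and 1,073,741,824 / (1000 × 60 × 60 × 24) ≈ 12.4 days.)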
Originally, my solution used Number.POSITIVE_INFINITY as the period, so the timer would actually never fire, but this behavior was recently changed by the API and now it doesn't accept anything greater than 2147483647 (i.e., 2 ** 31 - 1). See docs here and here.
Comments on other solutions
For reference, here are the other two answers given so far:
Joe's (deleted since then, but perfectly valid):
require('net').createServer().listen();
Will create a "bogus listener", as he called it. A minor downside is that we'd allocate a port just for that.
Jacob's:
process.stdin.resume();
Or the equivalent:
process.stdin.on("data", () => {});
Puts stdin into "old" mode, a deprecated feature that is still present in Node.js for compatibility with scripts written prior to Node.js v0.10 (reference).
I'd advise against it. Not only is it deprecated, it also unnecessarily messes with stdin.
Use "old" Streams mode to listen for a standard input that will never come:
// Start reading from stdin so we don't exit.
process.stdin.resume();
Here is an IIFE based on the accepted answer:
(function keepProcessRunning() {
    setTimeout(keepProcessRunning, 1 << 30);
})();
and here is one with a conditional exit:
let flag = true;
(function keepProcessRunning() {
    setTimeout(() => flag && keepProcessRunning(), 1000);
})();
You could use a setTimeout(function() {}, 2147483647); call to keep your script alive without overhead (remember that delays above 2147483647 ms are not accepted, as noted above).
Spin up a nice REPL; node itself does much the same when it has nothing else to run anyway:
import("repl").then(repl =>
    repl.start({ prompt: "\x1b[31m" + process.versions.node + ": \x1b[0m" }));
I'll throw another hack into the mix. Here's how to do it with Promise:
new Promise(_ => null);
Throw that at the bottom of your .js file and it should run forever.

Execute a function in parallel, while completing execution of the rest of the code

I have a code snippet in Node.js like this:
every 2 sec, foo() will be called.
function foo()
{
    while (count < 10)
    {
        doSomething();
        count++;
    }
}

function doSomething()
{
    ...
}
The limitation is that foo() has no callback.
How can I make the while loop execute so that foo() completes without waiting for doSomething() to complete (i.e., call doSomething() and proceed), while doSomething() executes in parallel?
I think what you want is:
function foo()
{
    while (count < 10)
    {
        process.nextTick(doSomething);
        count++;
    }
}
process.nextTick will schedule the execution of doSomething on the next tick of the event loop. So, instead of switching immediately to doSomething, this code will just schedule the execution and complete foo first.
You may also try setTimeout(doSomething, 0) and setImmediate(doSomething). They'll allow I/O calls to occur before doSomething is executed.
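A small sketch of the scheduling difference (my own illustration, not part of the original answer):
function doSomething(label) {
    console.log('running', label);
}

console.log('foo starts');
process.nextTick(() => doSomething('nextTick'));   // runs right after the current operation
setImmediate(() => doSomething('setImmediate'));   // runs in the check phase, after I/O
setTimeout(() => doSomething('setTimeout 0'), 0);  // runs in the timers phase
console.log('foo ends');
// Prints "foo starts", "foo ends", then "nextTick" first;
// the setTimeout/setImmediate order can vary when scheduled from the main script.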
Passing arguments to doSomething
If you want to pass some parameters to doSomething, then it's best to ensure they'll be encapsulated and won't change before doSomething is executed:
setTimeout(doSomething.bind(null, foo, bar), 0);
In this case doSomething will be called with the correct arguments even if foo and bar are changed or deleted later. But this won't work if foo is an object and you change one of its properties, because bind captures the reference, not a copy.
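For flat objects, a shallow copy avoids that problem (a sketch of my own):
function doSomething(obj, flag) {
    console.log(obj.value, flag);
}

var foo = { value: 1 };
var bar = true;

// Object.assign takes a shallow copy at bind time, so later mutations of foo are not seen:
setTimeout(doSomething.bind(null, Object.assign({}, foo), bar), 0);

foo.value = 42; // doSomething still logs 1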
What are the alternatives?
If you want doSomething to be executed in parallel (not just asynchronously, but actually in parallel), then you may be interested in some job-processing solution. I recommend you look at kickq:
var kickq = require('kickq');

kickq.process('some_job', function (jobItem, data, cb) {
    doSomething(data);
    cb();
});

// ...

function foo()
{
    while (count < 10)
    {
        kickq.create('some_job', data);
        count++;
    }
}
kickq.process will create a separate process for processing your jobs. So, kickq.create will just register the job to be processed.
kickq uses redis to queue jobs and it won't work without it.
Using node.js build-in modules
Another alternative is building your own job-processor using Child Process. The resulting code may look something like this:
var fork = require('child_process').fork,
    child = fork(__dirname + '/do-something.js');

// ...

function foo()
{
    while (count < 10)
    {
        child.send(data);
        count++;
    }
}
do-something.js here is a separate .js file with doSomething logic:
process.on('message', doSomething);
The actual code may be more complicated.
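For instance (my own sketch extending that skeleton, with a made-up result shape), the child could report its results back over the IPC channel:
// do-something.js
process.on('message', function (data) {
    var result = { input: data, done: true }; // placeholder for the real doSomething logic
    process.send(result); // report back to the parent
});

// in the parent:
child.on('message', function (result) {
    console.log('child finished:', result);
});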
Things you should be aware of
Node.js is single-threaded, so it executes only one function at a time. It also can't utilize more than one CPU core.
Node.js is asynchronous, so it's capable of processing multiple functions at once by switching between them. It's really efficient when dealing with functions that make lots of I/O calls, because it never blocks: while one function waits for a response from the DB, another function is executed. But Node.js is not a good choice for blocking tasks with heavy CPU utilization.
It's possible to do real parallel calculations in Node.js using modules like child_process and cluster. child_process allows you to start a new Node.js process; it also creates a communication channel between the parent and child processes. cluster allows you to run a cluster of identical processes. It's really handy when you're dealing with HTTP requests, because cluster can distribute them between workers. So it's possible to create a cluster of workers processing your data in parallel, even though generally Node.js is single-threaded.
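A minimal sketch of that cluster pattern (my own illustration, not from the answer): the master forks one worker per CPU core and every worker serves the same port:
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
    // Fork one worker per CPU core:
    os.cpus().forEach(() => cluster.fork());
} else {
    // All workers share the same listening socket;
    // incoming connections are distributed between them.
    http.createServer((req, res) => {
        res.end('handled by worker ' + process.pid + '\n');
    }).listen(8000);
}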

How can I execute a node.js module as a child process of a node.js program?

Here's my problem. I implemented a small script that does some heavy calculation, as a node.js module. So, if I type "node myModule.js", it calculates for a second, then returns a value.
Now, I want to use that module from my main Node.JS program. I could just put all the calculation in a "doSomeCalculation" function then do:
var myModule = require("./myModule");
myModule.doSomeCalculation();
But that would be blocking, thus it'd be bad. I'd like to use it in a non-blocking way, like DB calls natively are, for instance. So I tried to use child_process.spawn and exec, like this:
var spawn = require("child_process").spawn;
var ext = spawn("node ./myModule.js", function(err, stdout, stderr) { /* whatevs */ });
ext.on("exit", function() { console.log("calculation over!"); });
But, of course, it doesn't work. I tried to use an EventEmitter in myModule, emitting "calculationDone" events and trying to add the associated listener on the "ext" variable in the example above. Still doesn't work.
As for forks, they're not really what I'm trying to do. Forks would require putting the calculation-related code in the main program, forking, calculating in the child while the parent does whatever it does, and then how would I return the result?
So here's my question: can I use a child process to do some non-blocking calculation, when the calculation is put in a Node file, or is it just impossible? Should I do the heavy calculation in a Python script instead? In both cases, how can I pass arguments to the child process - for instance, an image?
I think what you're after is the child_process.fork() API.
For example, if you have the following two files:
In main.js:
var cp = require('child_process');

var child = cp.fork('./worker');

child.on('message', function(m) {
    // Receive results from child process
    console.log('received: ' + m);
});

// Send child process some work
child.send('Please up-case this string');
In worker.js:
process.on('message', function(m) {
    // Do work (in this case just up-case the string)
    m = m.toUpperCase();
    // Pass results back to parent process
    process.send(m);
});
Then to run main (and spawn a child worker process for the worker.js code ...)
$ node --version
v0.8.3
$ node main.js
received: PLEASE UP-CASE THIS STRING
It doesn't matter what you use as a child (Node, Python, whatever); Node doesn't care. Just make sure that your calculation script exits after everything is done and the result is written to stdout.
The reason it's not working is that you're using spawn instead of exec: spawn takes the command and an argument array rather than a command string and a callback.
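To illustrate the difference (a sketch assuming myModule.js writes its result to stdout):
var child_process = require('child_process');

// exec takes a shell command string and a completion callback:
child_process.exec('node ./myModule.js', function(err, stdout, stderr) {
    console.log('calculation over! result: ' + stdout);
});

// spawn takes the command and its arguments separately and has no callback;
// listen to its streams and events instead:
var ext = child_process.spawn('node', ['./myModule.js']);
ext.stdout.on('data', function(data) {
    console.log('result chunk: ' + data);
});
ext.on('exit', function() {
    console.log('calculation over!');
});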

node.js: Is there a way for the callback function of child_process.exec() to return the process PID

Node.JS Exec Question:
I have a program which spawns multiple processes, and I'd like to log the order of the processes finishing by capturing the PID when each process finishes. From what I can tell, the standard callbacks do not include the PID (stdout, stderr, and error).
I'd like to avoid using spawn, but it looks like I'll have to, unless any kind souls have some ideas.
Thanks in advance.
EDIT:
To clarify:
var child = child_process.exec(..., function() {
    console.log( child.pid );
});
will not work for multiple processes. This returns the last process, not the process which triggered the callback.
var child = child_process.exec(..., function() {
    console.log( child.pid );
});
I highly suggest reading the documentation - you might find answers to all your questions there. :)
// EDIT If you are using a loop to create processes, then use it like this:
var create_child = function( i ) {
    // creates a separate scope for the child variable
    var child = child_process.exec(..., function() {
        console.log( child.pid );
    });
};

for (var i = 0; i < 100; i++) {
    // does not create a separate scope
    create_child( i );
}
to avoid scoping issues.
After some searching and a brief coffee break, I'm convinced that child_process.spawn is the way to go. Making an object array to handle the processes, and deleting those process objects on exit seems to work well.
See: https://gist.github.com/3990987
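A sketch of that approach (my own, not the linked gist): keep the children in an object keyed by PID and delete each entry as its process exits:
const spawn = require('child_process').spawn;

const children = {};

for (let i = 0; i < 5; i++) {
    const child = spawn('node', ['worker.js']); // hypothetical worker script
    children[child.pid] = child;
    child.on('exit', (code) => {
        // Each closure captures its own child, so the PID is the right one:
        console.log('process ' + child.pid + ' exited with code ' + code);
        delete children[child.pid];
    });
}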
