When does a Node-spawned child process actually start?

In the documentation for Node's Child Process spawn() function, and in examples I've seen elsewhere, the pattern is to call the spawn() function, and then to set up a bunch of handlers on the returned ChildProcess object. For instance, here is the first example of spawn() given on that documentation page:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);

ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});

ls.stderr.on('data', (data) => {
  console.error(`stderr: ${data}`);
});

ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});
The spawn() function itself is called on the second line. My understanding is that spawn() starts a child process asynchronously. From the documentation:
The child_process.spawn() method spawns a new process using the given
command, with command line arguments in args.
However, the subsequent lines of the script go on to set up various handlers on the returned object, so the code assumes the process hasn't actually started (and potentially finished) between the spawn() call on the second line and the handler setup on the lines that follow. I know JavaScript/Node is single threaded. The operating system is not, though, and a naive reading of that spawn() call is that it tells the operating system to spawn the process right now; at that point, with unfortunate timing, the OS could suspend the parent Node process and run (or even complete) the child process before the next line of the Node code is executed.
But it must be that the process doesn't actually get spawned until the current JavaScript function completes (or more generally the current JavaScript event handler that called the current function completes), right?
That seems like a pretty important thing to say. Why doesn't it say that in the Child Process documentation page? Is there some overriding Node principle that makes it unnecessary to say that explicitly?

The spawning of the new process starts immediately (it's handed over to the OS to actually fire up the process and get it going). Starting the new process with .spawn() is asynchronous and non-blocking. So, it will initiate the operation with the OS and immediately return. You might think that that's why it's OK to set up event handlers after it returns (because the process hasn't yet finished starting). Well, yes and no. It likely hasn't yet finished starting the new process, but that isn't the main reason why it's OK.
It's OK, because node.js runs all its events through a single threaded event queue. Thus no events from the newly spawned process can be processed until after your code finishes executing and returns control back to the system. Only then can it process the next event in the event queue and trigger one of the events you are registering handlers for.
Or, said another way, none of the events from the other process are pre-emptive. They won't/can't interrupt your existing Javascript code. So, since you're still running your Javascript code, those events can't get run yet. Instead, they sit in the event queue until your Javascript code finishes and then the interpreter can go get the next event from the event queue and run the callback associated with it. Likewise, that callback runs until it returns back to the interpreter and then the interpreter can get the next event and run its callback and so on...
That's why node.js is called an event-driven system.
As such, it's perfectly fine to do this type of structure:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);

ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});

ls.stderr.on('data', (data) => {
  console.error(`stderr: ${data}`);
});

ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});
None of those data or close events can execute their callbacks until after your code is done and returns control back to the system. So, it's perfectly safe to set up those event handlers like you are. Even if the newly spawned process was running and generating events right away, those events will just sit in the event queue until your Javascript finishes what it is doing (which includes setting up your event handlers).
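To make that concrete, here's a small sketch (the 100 ms busy-wait is mine, purely for illustration): even if we burn time synchronously before attaching the handlers, we are still on the same tick of the event loop, so nothing is missed:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);

// Deliberately burn ~100ms synchronously. The child may well have started
// (or even finished) at the OS level by now, but none of its events can be
// dispatched while our code is still running.
const start = Date.now();
while (Date.now() - start < 100) { /* busy-wait */ }

// Still the same tick, so these handlers are attached before any queued
// event is processed. Nothing is missed.
ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});
ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});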
Now, if you delayed setting up the event handlers until some future tick of the event loop (as shown below) with something like a setTimeout(), then you could miss some events:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);

setTimeout(() => {
  ls.stdout.on('data', (data) => {
    console.log(`stdout: ${data}`);
  });
  ls.stderr.on('data', (data) => {
    console.error(`stderr: ${data}`);
  });
  ls.on('close', (code) => {
    console.log(`child process exited with code ${code}`);
  });
}, 10);
Here you are not setting up the event handlers immediately as part of the same tick of the event loop, but after a short delay. Therefore some events could get processed from the event loop before you install your event handlers and you could miss some of these events. Obviously, you would never do it this way (on purpose), but I just wanted to show that code running on the same tick of the event loop does not have a problem, but code running on some future tick of the event loop could have a problem missing events.

This is to follow up on jfriend00's answer, to explain what it helped me understand, in case it helps someone else. I knew about the event-driven nature of JavaScript/Node. What jfriend00's explanation made clear to me is the idea that an event can happen and Node can be aware that it happened, but it doesn't actually decide which handlers to tell about that event until the next tick. For instance, if the spawn() call fails outright (e.g., command does not exist), Node obviously knows that immediately. My thought was that it would then immediately queue the appropriate handlers to run on the next tick. But what I now understand is that it puts the "raw event" (i.e., the fact that the spawn failed, with whatever details about that) in its queue, and then on the next tick it determines and calls the appropriate handlers. And the same is true for other events like receiving output from the process, etc. The event is saved but the appropriate handlers for the event are only determined when the next tick runs, so handlers assigned on the previous tick, after spawn(), will get called.
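As a concrete illustration (a minimal sketch; 'no-such-command' is a stand-in for any command that doesn't exist): even when spawn() fails outright, the 'error' event is only dispatched on a later tick, so a handler attached after the call still catches it:
const { spawn } = require('child_process');
const child = spawn('no-such-command');

// The failure is known to Node almost immediately, but the 'error' event
// is dispatched on a later tick of the event loop, so this handler
// (attached after the spawn() call returned) is guaranteed to see it.
child.on('error', (err) => {
  console.error(`failed to start child process: ${err.message}`);
});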

Related

In Sublime, why does my node process not exit naturally?

When I run my node script in Sublime 3 (as a build system ... Ctrl-B), if I add a listener to the stdin's data event, the process stays running until killed. This makes sense, since there's potentially still work to do.
process.stdin.on('data', (d) => {
  // ... do some work with `d`
});
However, I expected that if I removed the listener to that data event, my process would naturally exit. But it doesn't!
// This program never exits naturally.
function processData(d) {
  // ... do some work with `d`, then...
  process.stdin.removeListener('data', processData);
}
process.stdin.on('data', processData);
Even if you remove the event handler immediately after adding it, the process still sticks around...
function processData() {}
process.stdin.on('data', processData);
process.stdin.removeListener('data', processData);
In this exact case, I could use the once() function instead of on(), but that doesn't clear this up for me. What am I missing? Why does the stdin stream prevent the process from exiting, given it has no listeners of any kind?

How to ensure we listen to a child process' events before they occur?

Here is some Node.js code that spawns a Linux ls command and prints its result:
const spawn = require('child_process').spawn;
const ls = spawn('ls', ['-l']);
let content = "";

ls.stdout.on('data', function(chunk){
  content += chunk.toString();
});
ls.stdout.on('end', function(){
  console.log(content);
});
This works well. However, the ls command is launched asynchronously, completely separate from the main Node.js thread. My concern is that the data and end events on the process' stdout may have occurred before I attached the event listeners.
Is there a way to attach event listeners before starting that sub-process?
Note: I don't think I can wrap a Promise around the spawn function to make this work, as it would rely on events being properly caught to trigger success/failure (leading back to the problem).
There is no problem here.
Readable streams (since node v0.10) have a (limited) internal buffer that stores data until you read from the stream. If the internal buffer fills up, the backpressure mechanism will kick in, causing the stream to stop reading data from its source.
Once you call .read() or add a data event handler, the internal buffer will start to drain and will then start reading from its source again.
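A small sketch illustrating that claim (the 100 ms delay is arbitrary and only there to prove the point): even a late 'data' handler receives everything, because the stream buffers while paused:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-l']);

setTimeout(() => {
  let content = "";
  // The stream stayed paused (and buffered) until this handler was added,
  // so no output was lost in the meantime.
  ls.stdout.on('data', function(chunk){
    content += chunk.toString();
  });
  ls.stdout.on('end', function(){
    console.log(content);
  });
}, 100);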

Which events should I use for spawning a child process to ensure I always make a callback

I'm using node to wrap an executable, using the event emitter returned by spawn. See the docs here. There are multiple events to subscribe to.
child = spawn("path/to/exe", args)
child.on('close', exitNormally)
child.on('exit', exitNormally)
child.on('error', exitAbnormally)
child.on('disconnect', exitAbnormally)
Should I be subscribing to all of them or is subscribing to close and error enough? I have a callback that I have to execute regardless of whether the outcome is a success or not. The docs for the events are here but it doesn't seem to say explictly say what I'm asking and I want to confirm that my thinking is correct and I don't miss any exits.
The exit event will always be called when your process ends, so I think that will be enough.
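One caveat worth hedging against: per the Node docs, when the process cannot be spawned at all, 'error' fires and 'exit' may or may not follow. A small sketch that guards the callback so it runs exactly once either way (exitNormally/exitAbnormally stand in for the question's callbacks, and the command/args are placeholders):
const { spawn } = require('child_process');

// Stand-ins for the question's callbacks.
function exitNormally(code) { console.log('exited with code', code); }
function exitAbnormally(err) { console.error('failed:', err.message); }

const child = spawn('path/to/exe', []); // placeholder command/args

// Guard so exactly one callback runs, whichever event comes first:
// 'error' fires if the process could not be spawned at all, and in that
// case 'exit' may or may not follow.
let settled = false;
child.on('exit', (code) => {
  if (settled) return;
  settled = true;
  exitNormally(code);
});
child.on('error', (err) => {
  if (settled) return;
  settled = true;
  exitAbnormally(err);
});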

child process not calling close event

I have the following node.js code:
var testProcess = spawn(item.testCommand, [], {
  cwd: process.cwd(),
  stdio: ['ignore', process.stdout, process.stderr]
});

testProcess.on('close', function(data) {
  console.log('test');
});
waitpid(testProcess.pid);
testProcess.kill();
However, the close handler never gets called.
The end result I am looking for is that I spawn a process and the script waits for that child process to finish (which waitpid() is doing correctly). I want the output/err of the child process to be displayed to the screen (which the stdio config is doing correctly). I also want to run code when the child process closes, which I was going to do in the close event (I also tried exit), but it does not fire.
Why is the event not firing?
http://nodejs.org/api/process.html
Note that just because the name of this function is process.kill, it is really just a signal sender, like the kill system call. The signal sent may do something other than kill the target process.
You can specify the signal when calling kill().
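For instance, with the testProcess from the question:
// kill() just sends a signal; the default is 'SIGTERM'.
testProcess.kill();          // same as testProcess.kill('SIGTERM')
testProcess.kill('SIGKILL'); // forceful; cannot be caught or ignored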
Looking at waitpid() I found out that it returns an object with the exitCode. I changed my code so that I just perform certain actions based on the value of the exitCode.
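For what it's worth, assuming waitpid() here is the synchronous call from the waitpid npm module, it blocks the event loop while it waits (and reaps the child before Node's own process watcher can see it exit), which would explain why the 'close' callback never runs. A sketch of the purely event-driven alternative, reusing item.testCommand from the question:
const { spawn } = require('child_process');

const testProcess = spawn(item.testCommand, [], {
  cwd: process.cwd(),
  stdio: ['ignore', process.stdout, process.stderr]
});

// Let Node do the waiting: this callback receives the exit code once the
// child finishes, without blocking the event loop in the meantime.
testProcess.on('close', function (code) {
  console.log('child exited with code ' + code);
  // ...perform the follow-up work here...
});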

Nodejs event handling

Following is my Node.js code:
var emitter = require('events'),
    eventEmitter = new emitter.EventEmitter();

eventEmitter.on('data', function (result) { console.log('Im From Data'); });
eventEmitter.on('error', function (result) { console.log('Im Error'); });

require('http').createServer(function (req, res) {
  res.end('Response');
  var start = new Date().getTime();
  eventEmitter.emit('data', true);
  eventEmitter.emit('error', false);
  while (new Date().getTime() - start < 5000) {
    // Let me sleep
  }
  process.nextTick(function () {
    console.log('This is event loop');
  });
}).listen(8090);
Node.js is single-threaded: it runs an event loop, and that same thread serves the events.
So, in the above code, on a request to localhost:8090 the Node thread should be kept busy serving the request [there is a busy-wait for 5s].
At the same time, two events are emitted by eventEmitter. So both of these events should be queued in the event loop for processing once the request is served.
But that is not happening; I can see the events being served synchronously as they are emitted.
Is that expected? I understand that if it worked as I expect there would be no use in extending the events module. But how are the events emitted by eventEmitter handled?
Only things that require asynchronous processing are pushed into the event loop. The standard event emitter in Node dispatches an event immediately and synchronously. Only code using things like process.nextTick, setTimeout, or setInterval, or C++ code that explicitly adds to the loop (as Node's own libraries do), goes through the event loop.
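A quick sketch demonstrating that emit() is synchronous, just like a function call:
const EventEmitter = require('events').EventEmitter;
const emitter = new EventEmitter();

emitter.on('data', function () { console.log('handler ran'); });

console.log('before emit');
emitter.emit('data'); // dispatches the handler right now, synchronously
console.log('after emit');

// Prints: before emit / handler ran / after emit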
For example, when you use node's fs library for something like createReadStream, it returns a stream, but opens the file in the background. When it is open, node adds to the event loop and when the function in the loop gets called, it will trigger the 'open' event on the stream object. Then, node will load blocks from the file in the background, and add to the event loop to trigger data events on the stream.
If you wanted those events to be emitted after 5 seconds, you'd want to use setTimeout or put the emit calls after your busy loop.
I'd also like to be clear: you should never have a busy loop like that in Node code. I can't tell if you were just doing it to test the event loop, or if it is part of some real code. If you need more info, please expand on the functionality you are looking to achieve.