Node detect child_process waiting to read from stdin

I've found some questions about writing to a child process's standard input, such as "Nodejs Child Process: write to stdin from an already initialised process". However, I'm wondering whether it is possible to detect when a process spawned using Node's child_process attempts to read from its standard input, and to act on that (perhaps according to what it has written to its standard output up to that point).
I see that the stdio streams are implemented using Stream in Node. Stream has an event called data, which is emitted when data is written into it, but I see no event for detecting that the stream is being read from.
Is the way to go here to subclass Stream and override its read method with a custom implementation, or is there a simpler way?

I've played around with Node standard I/O and streams for a bit until I eventually arrived at a solution. You can find it here: https://github.com/TomasHubelbauer/node-stdio
The gist of it is that we need to create a Readable stream and pipe it to the process's standard input. Then we listen to the process's standard output, parse it, and detect the chunks of interest (prompts to the user). Each time we get one of those, we make our Readable emit our "reaction" to the prompt, which the pipe then delivers to the process's standard input.
Start the process:
const cp = child_process.exec('node test');
Prepare a Readable and pipe it to the process' standard input:
new stream.Readable({ read }).pipe(cp.stdin);
Provide the read implementation which will be called when the process asks for input:
/** @this {stream.Readable} */
async function read(/** @type {number} */ size) {
  this.push(await promise + '\n');
}
Here the promise is used to block until we have an answer to the question the process asked through its standard output. this.push will add the answer to an internal queue of the Readable and eventually it will be sent to the standard input of the process.
An example of how to parse the input for a program prompt, derive an answer from the question, wait for the answer to be provided and then send it to the process is in the linked repository.
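The steps above can be put together in one small sketch. It is not the code from the linked repository. The child program, the answerFor rule, and the convention that a prompt is any output ending in "? " are assumptions made up for this illustration.

```javascript
const { spawn } = require('node:child_process');
const { Readable } = require('node:stream');

// Hypothetical child program: prints a prompt, reads one line, greets, exits.
const childSource = `
  const rl = require('node:readline').createInterface({ input: process.stdin });
  process.stdout.write('Name? ');
  rl.once('line', (line) => { console.log('Hello, ' + line + '!'); rl.close(); });
`;

// Hypothetical rule that derives an answer from the text of a prompt.
function answerFor(promptText) {
  return promptText.includes('Name?') ? 'Node' : '';
}

function runDialog() {
  return new Promise((resolve, reject) => {
    const cp = spawn(process.execPath, ['-e', childSource]);

    // A promise that settles once the next answer is known.
    let provideAnswer;
    let nextAnswer = new Promise((res) => { provideAnswer = res; });

    new Readable({
      async read() {
        const answer = await nextAnswer; // blocks until a prompt was seen
        nextAnswer = new Promise((res) => { provideAnswer = res; });
        this.push(answer + '\n');
      },
    }).pipe(cp.stdin);

    let output = '';
    let seen = 0; // how much of the output was already checked for prompts
    cp.stdout.setEncoding('utf8');
    cp.stdout.on('data', (chunk) => {
      output += chunk;
      // Assumption: the child waits for input whenever its output ends in "? ".
      if (output.endsWith('? ')) {
        provideAnswer(answerFor(output.slice(seen)));
        seen = output.length;
      }
    });
    cp.on('error', reject);
    cp.on('close', () => resolve(output));
  });
}

const dialog = runDialog();
dialog.then((output) => process.stdout.write(output));
```

Note that this never observes the child's read call itself. It only infers "the child is waiting" from what the child printed, so it works only for programs whose prompts are recognizable.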

Related

Does the .pipe() perform a memcpy in node.js?

This is a conceptual query regarding system level optimisation. My understanding by reading the NodeJS Documentation is that pipes are handy to perform flow control on streams.
Background: I have a microphone stream coming in and I want to avoid an extra copy operation to conserve overall system MIPS. I understand that for audio streams not many MIPS are spent even if there is a memcpy under the hood, but I also have an extension planned to stream in camera frames at 30 fps and UHD resolution. Making multiple copies of UHD pixel data at 30 fps is very inefficient, so I need some advice on this.
Example Code:
var spawn = require('child_process').spawn
var PassThrough = require('stream').PassThrough;
var ps = null;
//var audioStream = new PassThrough;
//var infoStream = new PassThrough;
var start = function() {
  if(ps == null) {
    ps = spawn('rec', ['-b', 16, '--endian', 'little', '-c', 1, '-r', 16000, '-e', 'signed-integer', '-t', 'raw', '-']);
    //ps.stdout.pipe(audioStream);
    //ps.stderr.pipe(infoStream);
    exports.audioStream = ps.stdout;
    exports.infoStream = ps.stderr;
  }
};
var stop = function() {
  if(ps) {
    ps.kill();
    ps = null;
  }
};
//exports.audioStream = audioStream;
//exports.infoStream = infoStream;
exports.startCapture = start;
exports.stopCapture = stop;
Here are the questions:
To be able to perform flow control, does source.pipe(dest) perform a memcpy from the source memory to the destination memory under the hood, or does it pass a reference to the same memory to the destination?
The commented-out code contains a PassThrough instantiation. I am currently assuming that PassThrough causes a memcpy as well, so am I saving one memcpy operation in the entire system by commenting those lines out?
If I had to create a pipe between a process and a spawned child process (using child_process.spawn() as shown in How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?), I presume that definitely results in a memcpy? Is there any way to make that a reference rather than a copy?
Does this behaviour differ from OS to OS? I presume it should be OS agnostic, but asking this anyways.
Thanks in advance for your help. It will help my architecture a great deal.

some urls for reference: https://github.com/nodejs/node/
https://github.com/nodejs/node/blob/master/src/stream_wrap.cc
https://github.com/nodejs/node/blob/master/src/stream_base.cc
https://github.com/libuv/libuv/blob/v1.x/src/unix/stream.c
https://github.com/libuv/libuv/blob/v1.x/src/win/stream.c
i tried writing a complicated / huge explanation based on these and some other files, however i came to the conclusion it would be best to give you a summary of how my experience / reading tells me node internally works:
pipe simply connects streams making it appear as if .on("data", …) is called by .write(…) without anything bloated in between.
now we need to separate the js world from the c++ / c world.
when dealing with data in js we use buffers. https://github.com/nodejs/node/blob/master/src/node_buffer.cc
they simply represent allocated memory with some candy on top to operate with it.
if you connect stdout of a process to some .on("data", …) listener it will copy the incoming chunk into a Buffer object for further usage inside the js world.
inside the js world you have methods like .pause() etc. (as you can see in node's stream api documentation) to prevent the process from eating memory in case incoming data flows in faster than it is processed.
connecting stdout of a process and for example an outgoing tcp port through pipe will result in a connection similar to how nginx operates. it will connect these streams as if they would directly talk to each other by copying incoming data directly to the outgoing stream.
as soon as you pause a stream, node will use internal buffering in case it's unable to pause the incoming stream.
so for your scenario you should just do testing.
try to receive data through an incoming stream in node, pause the stream and see what happens.
i'm not sure if node will use internal buffering or if the process you try to run will just halt until it can continue to send data.
i expect the process to halt until you continue the stream.
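a minimal sketch of that experiment, assuming a made-up child that writes 4 MiB to its stdout. the parent pauses the stream before attaching a listener, resumes after a delay (pauseMs is an arbitrary value), and counts how many bytes arrived during the pause and in total. it does not measure whether the child was blocked in the meantime; that part depends on the platform and on how the child writes.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical child: writes 64 chunks of 64 KiB to stdout, then exits.
const childSource = `
  const chunk = Buffer.alloc(64 * 1024, 'x');
  for (let i = 0; i < 64; i++) process.stdout.write(chunk);
`;

function pauseExperiment(pauseMs = 200) {
  return new Promise((resolve, reject) => {
    const cp = spawn(process.execPath, ['-e', childSource]);
    let paused = true;
    let whilePaused = 0; // bytes delivered to the listener during the pause
    let total = 0;       // bytes delivered overall

    // Pause first, so that attaching the 'data' listener does not start the flow.
    cp.stdout.pause();
    cp.stdout.on('data', (chunk) => {
      if (paused) whilePaused += chunk.length;
      total += chunk.length;
    });

    setTimeout(() => {
      paused = false;
      cp.stdout.resume();
    }, pauseMs);

    cp.on('error', reject);
    cp.on('close', () => resolve({ whilePaused, total }));
  });
}

const experiment = pauseExperiment();
experiment.then((result) => console.log(result));
```

no data events reach the listener while the stream is paused, and nothing is lost: everything the child wrote shows up after resume().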
for transferring huge images i recommend transferring them in chunks or piping them directly to an outgoing port.
the chunk way would allow you to send the data to multiple clients at once and would keep the memory footprint pretty low.
PS you should take a look at this gist that i just found: https://gist.github.com/joyrexus/10026630
it explains in depth how you can interact with streams
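to check the "pipe just connects streams" part inside the js world, here is a small sketch that pushes one Buffer through source.pipe(dest) and tests whether the very same object comes out. both streams are PassThrough instances standing in for real sources and sinks, which is an assumption of this example. it says nothing about copies made by the operating system when data crosses a process boundary.

```javascript
const { PassThrough } = require('node:stream');

function sameBufferAfterPipe() {
  return new Promise((resolve) => {
    const source = new PassThrough();
    const dest = new PassThrough();
    const sent = Buffer.from('pretend this is a camera frame');

    source.pipe(dest);
    // Compare by identity: true means the chunk was handed over, not copied.
    dest.once('data', (received) => resolve(received === sent));
    source.write(sent);
  });
}

const identity = sameBufferAfterPipe();
identity.then((same) => console.log('same Buffer object:', same));
```

in the node versions i am aware of, a single chunk written this way arrives as the identical Buffer object, so neither pipe nor PassThrough copies the bytes. when several chunks pile up and are read in one go, node may concatenate them, and that does copy.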

Are the extra stdio streams in node.js child_process.spawn blocking?

When creating a child process using spawn() you can pass options to create multiple streams via the options.stdio argument. After the standard three (stdin, stdout, stderr) you can pass extra streams and pipes, which become file descriptors in the child process. Then you can use fs.createReadStream/fs.createWriteStream to access those.
See http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
var opts = {
  stdio: [process.stdin, process.stdout, process.stderr, 'pipe']
};
var child = child_process.spawn('node', ['./child.js'], opts);
But the docs are not really clear on whether these pipes are blocking. I know stdin/stdout/stderr are blocking, but what about the 'pipe' entries?
In one part they say:
"Please note that the send() method on both the parent and child are
synchronous - sending large chunks of data is not advised (pipes can
be used instead, see child_process.spawn"
But elsewhere they say:
process.stderr and process.stdout are unlike other streams in Node in
that writes to them are usually blocking.
They are blocking in the case that they refer to regular files or TTY file descriptors.
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
Can anybody clarify this? Are pipes blocking on Linux?
I need to transfer large amounts of data without blocking my worker processes.
Related:
How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?
How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?

How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?

I have a bunch of (child)processes in node.js that need to transfer large amounts of data.
When I read the manual it says that the stdio and IPC interfaces between them are blocking, so that won't do.
I'm looking into using file descriptors but I cannot find a way to stream from them (see my other more specific question How to stream to/from a file descriptor in node?)
I think I might use a net socket, but I fear that has unwanted overhead.
I also found this question, but it is not the same (and it has no answers): How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?

I found a solution that seems to work: when spawning the child process you can pass options for stdio and setup a pipe to stream data.
The trick is to add an additional element, and set it to 'pipe'.
In the parent process stream to child.stdio[3].
var opts = {
  stdio: [process.stdin, process.stdout, process.stderr, 'pipe']
};
var child = child_process.spawn('node', ['./child.js'], opts);
// send data
mySource.pipe(child.stdio[3]);
//read data
child.stdio[3].pipe(myHandler);
In the child, open a stream for file descriptor 3.
// read from it
var readable = fs.createReadStream(null, {fd: 3});
// write to it
var writable = fs.createWriteStream(null, {fd: 3});
Note that not every stream you get from npm works correctly. I tried JSONStream.stringify() and it produced errors, but it worked after I piped it through through2 (no idea why that is).
Edit: some observations: it seems the pipe is not always a Duplex stream, so you might need two pipes. And there is something weird going on where in one case it only works if I also have an IPC channel, so 6 in total: [stdin, stdout, stderr, pipe, pipe, ipc].
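Here is a self-contained sketch of the same idea in one file, covering one direction only (child to parent). The child script is passed inline with -e, and the payload text is made up for the example. I have only reasoned about this for Linux/Unix, so treat the behaviour on Windows as an open question.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical child: streams a message to file descriptor 3 and closes it.
const childSource = `
  const fs = require('node:fs');
  const out = fs.createWriteStream(null, { fd: 3 });
  out.end('payload sent over fd 3');
`;

function readFromExtraPipe() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      // Index 3 is the extra pipe; it shows up as child.stdio[3] in the parent.
      stdio: ['ignore', 'inherit', 'inherit', 'pipe'],
    });

    let received = '';
    child.stdio[3].setEncoding('utf8');
    child.stdio[3].on('data', (chunk) => { received += chunk; });

    child.on('error', reject);
    // 'close' fires after the process exited and its stdio streams were closed.
    child.on('close', () => resolve(received));
  });
}

const transfer = readFromExtraPipe();
transfer.then((text) => console.log('parent received:', text));
```

For the other direction the parent would write to child.stdio[3] and the child would read with fs.createReadStream(null, {fd: 3}), as shown above.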

Reading stdout of child process unbuffered

I'm trying to read the output of a Python script launched by Node.js as it arrives. However, I only get access to the data once the process has finished.
var proc, args;
args = [
  './bin/build_map.py',
  '--min_lon',
  opts.sw.lng,
  '--max_lon',
  opts.ne.lng,
  '--min_lat',
  opts.sw.lat,
  '--max_lat',
  opts.ne.lat,
  '--city',
  opts.city
];
proc = spawn('python', args);
proc.stdout.on('data', function (buf) {
  console.log(buf.toString());
  socket.emit('map-creation-response', buf.toString());
});
If I launch the process with { stdio : 'inherit' } I can see the output as it happens directly in the console. But doing something like process.stdout.on('data', ...) will not work.
How do I make sure I can read the output from the child process as it arrives and direct it somewhere else?

The process doing the buffering is Python: it knows its output was redirected and is not really going to a terminal, so it buffers. You can easily tell Python not to do this buffering: just run "python -u" instead of "python". It should be as easy as that.
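On the Node side that could look roughly like the sketch below. The helper names unbufferedPython and spawnUnbufferedPython are made up for this example. Setting PYTHONUNBUFFERED=1 in the environment has the same effect as -u, so either one alone should be enough; the sketch sets both only to show the two options.

```javascript
const { spawn } = require('node:child_process');

// Build spawn arguments and environment that ask Python not to buffer stdout.
function unbufferedPython(scriptArgs, baseEnv = process.env) {
  return {
    args: ['-u', ...scriptArgs],
    env: { ...baseEnv, PYTHONUNBUFFERED: '1' },
  };
}

// Hypothetical helper: starts the script and forwards each chunk as it arrives.
// It assumes a "python" executable is on the PATH.
function spawnUnbufferedPython(scriptArgs, onChunk) {
  const { args, env } = unbufferedPython(scriptArgs);
  const proc = spawn('python', args, { env });
  proc.stdout.on('data', (buf) => onChunk(buf.toString()));
  return proc;
}
```

With the code from the question, the call would be something like spawnUnbufferedPython(args, function (text) { socket.emit('map-creation-response', text); }).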

When a process is spawned by child_process.spawn(), the streams connected to the child process's standard output and standard error are actually unbuffered on the Nodejs side. To illustrate this, consider the following program:
const spawn = require('child_process').spawn;
var proc = spawn('bash', [
  '-c',
  'for i in $(seq 1 80); do echo -n .; sleep 1; done'
]);
proc.stdout
  .on('data', function (b) {
    process.stdout.write(b);
  })
  .on('close', function () {
    process.stdout.write("\n");
  });
This program runs bash and has it emit . characters every second for 80 seconds, while consuming this child process's standard output via data events. You should notice that the dots are emitted by the Node program every second, helping to confirm that buffering does not occur on the Nodejs side.
Also, as explained in the Nodejs documentation on child_process:
By default, pipes for stdin, stdout and stderr are established between
the parent Node.js process and the spawned child. It is possible to
stream data through these pipes in a non-blocking way. Note, however,
that some programs use line-buffered I/O internally. While that does
not affect Node.js, it can mean that data sent to the child process
may not be immediately consumed.
You may want to confirm that your Python program does not buffer its output. If you feel you're emitting data from your Python program as separate distinct writes to standard output, consider running sys.stdout.flush() following each write to suggest that Python should actually write data instead of trying to buffer it.
Update: In this commit that passage from the Nodejs documentation was removed for the following reason:
doc: remove confusing note about child process stdio
It’s not obvious what the paragraph is supposed to say. In particular,
whether and what kind of buffering mechanism a process uses for its
stdio streams does not affect that, in general, no guarantees can be
made about when it consumes data that was sent to it.
This suggests that there could be buffering at play before the Nodejs process receives data. Either way, care should be taken to ensure that processes within your control upstream of Nodejs are not buffering their output.

Linux: is there a way to use named fifos on the writer side in non-blocking mode?

I've found many questions and answers about pipes on Linux, but almost all discuss the reader side.
For a process that shall be ready to deliver data to a named pipe as soon as the data is available and a reading process is connected, is there a way to, in a non-blocking fashion:
1. wait (poll(2)) for a reader to open the pipe,
2. wait in a loop (again poll(2)) for a signal that writing to the pipe will not block, and
3. when such a signal is received, check how many bytes may be written to the pipe without blocking
I understand how to do (2.), but I wasn't able to find consistent answers for (1.) and (3.).
EDIT: I was looking for (something like) FIONWRITE for pipes, but Linux does not have FIONWRITE (for pipes) (?)
EDIT2: The intended main loop for the writer (kind of pseudo code, target language is C/C++):
forever {
    poll(can_read_command, can_write_to_the_fifo)
    if (can_read_command) {
        read and parse command
        update internal status
        continue
    }
    if (can_write_to_the_fifo) {
        length = min(data_available, space_for_nonblocking_write)
        write(output_fifo, buffer, length)
        update internal status
        continue
    }
}
