NodeJS capturing large data from child processes without piping

The Node.js child_process documentation says the following:
By default, pipes for stdin, stdout, and stderr are established between the parent Node.js process and the spawned child. These pipes have limited (and platform-specific) capacity. If the child process writes to stdout in excess of that limit without the output being captured, the child process will block waiting for the pipe buffer to accept more data.
I've been trying to spawn (child_process.spawn) a console application from Node. The application is interactive and outputs a lot of data on startup. Apparently, when you pipe, if the process producing output gets ahead of the process reading it, the OS pauses the producer. For example, when you run ps ax | cat, if the ps process gets ahead of the cat process, ps will pause until cat can accept input again. But all of the data should eventually get through the pipe.
In my case, this console application (written in C) seems to pause completely when this happens. I could be wrong, but this seems to be the case because the piped output stops midway no matter what I do. This happens even when I'm piping from bash, so the problem isn't with Node.js. Maybe this application is not very compatible with how piping works on Linux.
There's nothing I can do about this console application, but is there some other way in Node to get these data chunks without piping?
Edited:
The main code
const { spawn } = require('child_process');
const js = spawn('julius', ['-C', 'julius.jconf', '-dnnconf', 'dnn.jconf']);
I tried
js.stdout.on("data", (data) => { console.log(`stdout: ${data}`); });
and
js.stdout.pipe(process.stdout);
to capture the output. Same result, the output is being cut off.
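One way to avoid a pipe altogether is to hand the child a regular file as its stdout and stderr, then read that file from the parent. The following is only a minimal sketch: the helper name runToFile, the log file name, and the Node one-liner standing in for the real program are all assumptions for illustration. Whether it helps with a particular program depends on why that program stalls.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: run a command with stdout/stderr sent straight to a
// file, so that no pipe sits between the child and the parent.
function runToFile(command, args, logPath) {
  return new Promise((resolve, reject) => {
    const fd = fs.openSync(logPath, 'w');
    const child = spawn(command, args, { stdio: ['ignore', fd, fd] });
    // The child now holds its own copy of the descriptor.
    fs.closeSync(fd);
    child.on('error', reject);
    child.on('close', (code) => resolve(code));
  });
}

// Stand-in for the real program (an assumption): prints about 1 MB.
const logPath = path.join(os.tmpdir(), `child-output-${process.pid}.log`);
const script = "process.stdout.write('x'.repeat(1024 * 1024))";

runToFile(process.execPath, ['-e', script], logPath).then((code) => {
  const size = fs.statSync(logPath).size;
  console.log(`exit code ${code}, captured ${size} bytes`);
});
```

For the command in the question, the call would look like runToFile('julius', ['-C', 'julius.jconf', '-dnnconf', 'dnn.jconf'], logPath). Since that program is interactive, stdin would probably need to stay 'pipe' or 'inherit' instead of 'ignore'.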

Related

Capturing secondary / tertiary stream to terminal / TTY

I run a command from the terminal which starts process A, which starts a server (process B), and the server in turn will start workers (processes C). I want to stream the stdout/stderr of the server to some log file, but I want the stdout and stderr of the workers to stream back to the terminal. All processes here are Node.js processes.
No idea how to do this or if it's possible. My only guess as to how it might work is if the terminal session has some sort of handle or id which I can use and tell the worker processes to stream to that handle or id. I don't know enough about *nix to know how this works. Any explanation would help.

Using Node.js (since all the processes from the OP are Node.js processes), here is one solution I have discovered.
Step 1: in process A, get the tty identity of the current terminal/tty (cp here is require('child_process')).
const tty = String(cp.execSync('tty', {stdio:['inherit','pipe','pipe']})).trim();
Step 2: pass the tty value from process A to process B. I pass that dynamic value (a string) to the child process using socket.io (you could also use IPC).
this.emit('tty-value', tty);
Step 3: in the child process B, use fs.openSync(tty, 'w') to get a writable file descriptor for that terminal (fs.openSync opens read-only unless a flag is given).
const fd = fs.openSync(tty, 'w');
Step 4: create a stream that writes to the terminal I want to write to.
const strm = fs.createWriteStream(null, {fd: fd});
So when process B creates child process C, it can make the calls necessary to pipe the stdout/stderr from process C to the above stream.
...this took me all day to figure out, so leaving this here for anyone to see
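The steps above can be sketched in one file. This is a simplified variation, not the exact code from the answer: it runs in a single process instead of passing the tty path from A to B, and it hands the descriptor to the worker through the stdio option instead of piping through a write stream. The helper names and the worker one-liner are assumptions, and the tty utility exists on *nix only.

```javascript
const cp = require('child_process');
const fs = require('fs');

// Step 1 (process A): ask the `tty` utility which terminal we are attached to.
// stdin is inherited so that `tty` inspects the real terminal.
function currentTtyPath() {
  return String(cp.execSync('tty', { stdio: ['inherit', 'pipe', 'pipe'] })).trim();
}

// Steps 3 and 4 (process B): open the terminal device for writing and start a
// worker (process C) whose stdout and stderr go straight to it.
// `workerArgs` is a hypothetical argument list for the Node worker.
function spawnWorkerOnTty(ttyPath, workerArgs) {
  const fd = fs.openSync(ttyPath, 'w');
  const worker = cp.spawn(process.execPath, workerArgs, { stdio: ['ignore', fd, fd] });
  fs.closeSync(fd); // the worker keeps its own copy of the descriptor
  return worker;
}

// Only try this when we are actually attached to a terminal.
if (process.stdin.isTTY) {
  const ttyPath = currentTtyPath();
  spawnWorkerOnTty(ttyPath, ['-e', "console.log('hello from worker')"]);
}
```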

Nice job figuring out the tty thing. Not sure if there is an easier way to do exactly that, but I have a few ideas that involve cheating on your question in a way but are probably better in the long run anyway.
Use a logging framework like winston or bunyan. Or a structured logging system like timequerylog. The logging frameworks will allow for multiple loggers, one can go to a file and another can go to stdout.
Use the built-in cluster module to create workers and have the workers send messages with events/data to the master. The master could then log those to stdout.
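A rough sketch of the message-passing idea follows. To keep it self-contained it uses child_process.spawn with an 'ipc' entry in stdio instead of the cluster module; cluster workers use the same process.send mechanism. The inlined worker source, the message shape, and the helper name are assumptions for illustration.

```javascript
const { spawn } = require('child_process');

// Worker source, inlined to keep the sketch self-contained. A real worker
// would live in its own file and be started with fork() or cluster.fork().
const workerSource = `
  process.send({ level: 'info', text: 'worker started' });
  process.send({ level: 'error', text: 'something failed' });
`;

// Hypothetical helper: start a worker with an IPC channel and collect the
// log messages it sends until it has exited.
function collectWorkerLogs() {
  return new Promise((resolve, reject) => {
    const messages = [];
    const worker = spawn(process.execPath, ['-e', workerSource], {
      stdio: ['ignore', 'ignore', 'ignore', 'ipc'],
    });
    worker.on('message', (msg) => messages.push(msg));
    worker.on('error', reject);
    worker.on('close', () => resolve(messages));
  });
}

collectWorkerLogs().then((messages) => {
  for (const { level, text } of messages) {
    // The parent decides where each message goes: terminal, file, or both.
    (level === 'error' ? console.error : console.log)(`[worker] ${text}`);
  }
});
```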

How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?

I'm trying to send a huge json string from a child process to the parent process. My initial approach was the following:
child:
process.stdout.write(myHugeJsonString);
parent:
child.stdout.on('data', function(data) { ...
But now I read that process.stdout is blocking:
process.stderr and process.stdout are unlike other streams in Node in that writes to them are usually blocking.
They are blocking in the case that they refer to regular files or TTY file descriptors.
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
OK, what about the Stream object option? Hmm, it seems I can share a readable or writable stream that refers to a socket with the child process. Would this be non-blocking? How would I implement that?
So the question stands: How do I send huge amounts of data from a child process to the parent process in a non-blocking way in Node.js? A cross-platform solution would be really neat, examples with explanation very appreciated.

One neat trick I used on *nix for this is FIFO pipes (http://linux.about.com/library/cmd/blcmdl4_fifo.htm). This allows the child to write to a file-like thing and the parent to read from the same. The file is not really on the fs, so you don't get any IO problems; all access is handled by the kernel itself. But if you want it cross-platform, that won't work. There's no such thing on Windows (as far as I know).
Just note that you define the size of the pipe and if what you write to it (from child) is not read by something else (from parent), then the child will block when the pipe is full. This does not block the node processes, they see the pipe as a normal file stream.
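As a rough illustration of the FIFO approach, the sketch below creates a named pipe with the mkfifo utility, lets a child write into it as if it were a file, and reads the data back in the parent. It is *nix only. The helper name, the FIFO_PATH environment variable, and the inlined child source are assumptions, and there is no handling for a child that dies before opening the FIFO.

```javascript
const { execFileSync, spawn } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (*nix only): create a named pipe, let a child write
// into it, and read everything back in the parent.
function readThroughFifo(childSource) {
  return new Promise((resolve, reject) => {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fifo-demo-'));
    const fifoPath = path.join(dir, 'data.fifo');
    execFileSync('mkfifo', [fifoPath]);

    // The child learns the FIFO path from an environment variable and writes
    // to it like an ordinary file; its stdio is not used for the data.
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: 'ignore',
      env: { ...process.env, FIFO_PATH: fifoPath },
    });
    child.on('error', reject);

    const chunks = [];
    fs.createReadStream(fifoPath)
      .on('data', (chunk) => chunks.push(chunk))
      .on('error', reject)
      .on('end', () => {
        // Remove the FIFO and its temporary directory once the writer is done.
        fs.unlinkSync(fifoPath);
        fs.rmdirSync(dir);
        resolve(Buffer.concat(chunks));
      });
  });
}

const writer = "require('fs').writeFileSync(process.env.FIFO_PATH, 'x'.repeat(1024 * 1024))";
readThroughFifo(writer).then((data) => console.log(`received ${data.length} bytes`));
```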

I had a similar problem and I think I have a good solution: set up a pipe when spawning the child process and use the resulting file descriptor to duplex data to the client's end.
How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?
Apparently you can use fs to stream to/from file descriptors:
How to stream to/from a file descriptor in node?
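A minimal sketch of the extra-pipe idea: the parent adds a fourth entry to stdio, which shows up as file descriptor 3 in the child and as child.stdio[3] in the parent. The child here wraps fd 3 in a net.Socket instead of an fs stream, which is my own choice for the sketch; the inlined child source and the helper name are also assumptions.

```javascript
const { spawn } = require('child_process');

// Child source, inlined for the sketch: it leaves stdout alone and writes
// its payload to file descriptor 3, the extra pipe set up by the parent.
const childSource = `
  const net = require('net');
  const payload = JSON.stringify({ items: new Array(50000).fill('data') });
  new net.Socket({ fd: 3, readable: false, writable: true }).end(payload);
`;

// Hypothetical helper: spawn the child and parse what arrives on the pipe.
function receiveOnExtraPipe() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      // fds 0-2 behave as usual; the fourth entry creates an additional pipe.
      stdio: ['ignore', 'inherit', 'inherit', 'pipe'],
    });
    const chunks = [];
    child.stdio[3].on('data', (chunk) => chunks.push(chunk));
    child.on('error', reject);
    // 'close' fires once the child has exited and its pipes have closed.
    child.on('close', () => {
      resolve(JSON.parse(Buffer.concat(chunks).toString('utf8')));
    });
  });
}

receiveOnExtraPipe().then((result) => {
  console.log(`received ${result.items.length} items on fd 3`);
});
```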

The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
No. The docs above say stdout/stderr, and in no way do they say "all pipes".
It won't matter that stdout/stderr are blocking. In order for a pipe to block, it needs to fill up, which takes a lot of data. In order to fill up, the reader at the other end has to be reading slower than you are writing. But... you are the other end, you wrote the parent process. So, as long as your parent process is functioning, it should be reading from the pipes.
Generally, blocking of the child is a good thing. If it's producing data faster than the parent can handle, there are ultimately only two possibilities:
1. it blocks, so stops producing data until the parent catches up
2. it produces more data than the parent can consume, and buffers that data in local memory until it hits the V8 memory limit, and the process aborts
You can use stdout to send your JSON, if you want 1).
You can use a new 'pipe' to send your JSON, if you want 2).
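For the stdout route, a minimal sketch might look like the following. The child is a Node one-liner standing in for the real producer, and the helper name is an assumption. The point is that the parent keeps reading, collects every chunk, and parses only after the stream has ended.

```javascript
const { spawn } = require('child_process');

// Stand-in child (an assumption): writes one large JSON document to stdout.
const childSource = `
  const big = { rows: Array.from({ length: 100000 }, (_, i) => ({ id: i })) };
  process.stdout.write(JSON.stringify(big));
`;

function readJsonFromChild() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    const chunks = [];
    // 'data' fires once per chunk, not once per message, so the chunks are
    // collected here and parsed only after the child has finished.
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.on('error', reject);
    child.on('close', (code) => {
      if (code !== 0) {
        reject(new Error(`child exited with code ${code}`));
        return;
      }
      resolve(JSON.parse(Buffer.concat(chunks).toString('utf8')));
    });
  });
}

readJsonFromChild().then((doc) => console.log(`parsed ${doc.rows.length} rows`));
```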

Node ffmpeg child_process stops but still alive

I have a Node app that starts multiple instances of ffmpeg through child_process; each ffmpeg transcodes a live stream from a camera.
The problem is that after 5~10 min the ffmpeg process just stops transcoding. The process is still alive, since I can see it in the task manager, but it just stops doing the transcoding.
Now if I send the output of ffmpeg to the Node.js console log, that actually keeps the transcoding alive.
Any ideas what might be causing this?

It seems like the stdout or stderr buffers are filling up and the code blocks until they are read.
The solution is to redirect them to the parent, write them to a file, pipe them somewhere else, or ignore them (pipe them to /dev/null).
The simplest way, if you don't need the output, is to spawn with:
spawn('prg', [], { stdio: 'ignore' });
This is documented in the spawn documentation.
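A small sketch of two of these options, using a Node one-liner as a stand-in for a chatty program (an assumption; ffmpeg itself is not involved here). The first run discards the output via stdio: 'ignore'; the second keeps the pipes but drains them with resume() so they can never fill up.

```javascript
const { spawn } = require('child_process');

// Stand-in for a chatty program: prints about 2 MB to stderr, far more than
// a pipe buffer holds.
const noisySource = "process.stderr.write('x'.repeat(2 * 1024 * 1024))";

// Hypothetical helper: run the noisy child with the given stdio setting.
function runQuietly(stdio) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', noisySource], { stdio });
    // If a stream is still a pipe, keep it flowing and throw the data away.
    if (child.stdout) child.stdout.resume();
    if (child.stderr) child.stderr.resume();
    child.on('error', reject);
    child.on('close', resolve);
  });
}

// Either discard everything at the OS level, or keep the pipes and drain them.
runQuietly('ignore').then((code) => console.log('ignored, exit code', code));
runQuietly(['ignore', 'pipe', 'pipe']).then((code) => console.log('drained, exit code', code));
```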

Process connected to separate pty for stdout and stderr

I'm writing a terminal logging program - think the script command but a bit more featureful. One of the differences is that, whereas script captures stdout, stdin and stderr as one big character stream, I would like to keep them separate and record them as such.
In order to do this, I use the standard approach of running a child shell connected to a pty, but instead of using a single pty with stdin, stdout and stderr all connected to it, I use two ptys - with stdin and stderr connected to one pty, and stdout on the other. This way, the master process can tell what is coming from stdout and what from stderr.
This has, so far, worked fine. However, I'm starting to run into a few issues. For example, when trying to set the number of columns, I get the following:
$ stty cols 169
stty: stdout appears redirected, but stdin is the control descriptor
This seems to be a result of this piece of code, which seems to check whether stdout and stderr are both ttys, but complains if they are not the same.
My question, therefore, is this: am I violating any fundamental assumptions about how Posix processes behave by acting in this way? If not, any idea why I'm seeing errors such as this? If so, is there any way I can get around this and still manage to separate stdout and stderr nicely?

One idea I had about this is to use a process directly on the pty which then runs the target program, e.g.
(wrapper) -> pty -> (controller) -> script
The controller would be responsible for running the script and capturing the stdout and stderr separately, feeding them back to the wrapper, perhaps by some non-std fd, or alternatively, serialising the data before shipping it back, e.g. prefixing output from stderr with stderr: and stdout with stdout: - then in the wrapper deserialize this and feed it back upstream or whatever you want to do with it.
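The prefixing scheme could look roughly like this. It is a sketch that assumes line-oriented output, which raw terminal traffic is not guaranteed to be; the function names are made up for illustration. The controller tags each line with its source, and the wrapper splits the tag off again.

```javascript
// Controller side: tag every line of a chunk with the stream it came from.
function tagLines(source, text) {
  const lines = text.split('\n');
  // Drop the empty piece that follows a trailing newline.
  if (lines[lines.length - 1] === '') lines.pop();
  return lines.map((line) => `${source}:${line}\n`).join('');
}

// Wrapper side: split the serialized stream back into { source, text } records.
function untagLines(serialized) {
  const lines = serialized.split('\n');
  if (lines[lines.length - 1] === '') lines.pop();
  return lines.map((line) => {
    // Only the first colon separates the tag; later colons belong to the text.
    const idx = line.indexOf(':');
    return { source: line.slice(0, idx), text: line.slice(idx + 1) };
  });
}

const wire = tagLines('stdout', 'building...\ndone\n') + tagLines('stderr', 'warning: x: y\n');
console.log(untagLines(wire));
```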

What happens to stdout when a script runs a program?

I have an embedded application that I want a simple-minded logger for.
The system starts from a script file, which in turn runs the application. There could be various reasons that the script fails to run the application, or the application itself could fail to start. To diagnose this remotely, I need to view the stdout from the script and the application.
I tried writing a tee-like logger that would repeat its stdin to stdout, and save the text in a FIFO for later retrieval via the network. Then I naively tried
./script | ./logger
I ended up with only the script stdout going to the logger, and the application stdout disappearing. I had similar results trying tee.
The system is running kernel 2.4.26, and busybox.
What is going on, and how can I accomplish my desired ends?

It turns out it was working exactly as I thought it should work, with one minor gotcha. stdout was being buffered, and without any fflush(stdout) commands, I never saw it. Had I been really patient, I would have suddenly seen a big gush of output when the stdout buffer filled up. A call to setlinebuf(3) fixed my problem.

Apparently, the application output doesn't end up on stdout. There are a few possible reasons:
1. The output is actually on stderr (which is usually also connected to the terminal). In that case
./script.sh 2>&1 | ./logger
should work.
2. The application actively disconnects from stdin/stdout (e.g. by closing/reopening file descriptors 0, 1 (and 2), or by using nohup, exec or similar utilities).
3. The script daemonizes (which also detaches from all standard streams).
