Capturing secondary / tertiary stream to terminal / TTY - node.js

See the diagram below. I run a command from the terminal which starts process A; process A starts a server (process B), and the server in turn starts workers (processes C). I want to stream the stdout/stderr of the server to a log file, but I want the stdout and stderr of the workers to stream back to the terminal. All processes here are Node.js processes.
I have no idea how to do this, or whether it's even possible. My only guess as to how it might work is if the terminal session has some sort of handle or id which I can use to tell the worker processes to stream to that handle or id. I don't know enough about *nix to know how this works; any explanation would help.
Here's a visual of the setup: terminal → process A → server (process B) → workers (processes C)

Using Node.js (since all the processes from the OP are Node.js processes), here is one solution I have discovered.
Step 1: in process A, get the tty identity of the current terminal:
const cp = require('child_process');
const tty = String(cp.execSync('tty', {stdio: ['inherit', 'pipe', 'pipe']})).trim();
Step 2: pass the tty value from process A to process B. I pass that dynamic value (a string) to the child process using socket.io (you could also use IPC):
this.emit('tty-value', tty);
Step 3: in the child process B, use fs.openSync() on that tty path to get the right file descriptor:
const fs = require('fs');
const fd = fs.openSync(tty, 'a');  // open the terminal device for appending so it can be written to
Step 4: then I can write to the terminal I want by creating a write stream on that fd:
const strm = fs.createWriteStream(null, {fd: fd});
So when process B creates child process C, it can make the calls necessary to pipe the stdout/stderr from process C to the above stream.
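For example, a rough sketch of what process B might do (my own illustration; the worker script name worker.js is just a placeholder, and strm is the stream from step 4):
// in process B, after creating strm (step 4)
const cp = require('child_process');
const workerC = cp.spawn('node', ['worker.js'], {stdio: ['ignore', 'pipe', 'pipe']});
// route the worker's output to the original terminal via strm, not to B's own stdout/stderr
workerC.stdout.pipe(strm, {end: false});
workerC.stderr.pipe(strm, {end: false});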
...this took me all day to figure out, so leaving this here for anyone to see

Nice job figuring out the tty thing. I'm not sure if there is an easier way to do exactly that, but I have a few ideas that sidestep your question somewhat, although they are probably better in the long run anyway.
Use a logging framework like winston or bunyan, or a structured logging system like timequerylog. The logging frameworks allow for multiple loggers: one can go to a file and another can go to stdout.
Use the built-in Cluster module to create workers and have the workers send messages (events/data) to the master. The master could then log those to stdout, as sketched below.
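A minimal sketch of that second idea (the message shape here is made up, not part of the original suggestion):
const cluster = require('cluster');

if (cluster.isMaster) {
  const worker = cluster.fork();
  // the master receives messages from its workers and writes them to its own stdout
  worker.on('message', (msg) => {
    process.stdout.write(`worker ${worker.id}: ${msg.line}\n`);
  });
} else {
  // workers report lines to the master over the built-in IPC channel
  // instead of writing to stdout directly
  process.send({line: 'hello from a worker'});
}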

Related

NodeJS capturing large data from child processes without piping

In Node.js's child process documentation, the following is mentioned:
By default, pipes for stdin, stdout, and stderr are established between the parent Node.js process and the spawned child. These pipes have limited (and platform-specific) capacity. If the child process writes to stdout in excess of that limit without the output being captured, the child process will block waiting for the pipe buffer to accept more data.
I've been trying to spawn (child_process.spawn) a console application from Node. The application is interactive and outputs a lot of data on initiation. Apparently, when you pipe, if the process outputting data gets ahead of the process consuming it, the OS pauses the outputting process. For example, when you run ps ax | cat, if the ps process gets ahead of the cat process, ps will pause until cat can accept input again. But it should eventually pipe all the data.
In my case, this console application (written in C) seems to pause completely when this happens. I could be wrong, but this seems to be the case because the piped output stops midway no matter what I do. This happens even when I'm piping from bash, so the problem isn't with Node.js here. Maybe this application is not very compatible with how piping works in Linux.
There's nothing I can do regarding this console application but is there some other way in node to get these data chunks without piping?
Edited:
The main code:
const { spawn } = require('child_process');
const js = spawn('julius', ['-C', 'julius.jconf', '-dnnconf', 'dnn.jconf']);
I tried
js.stdout.on("data", (data) => { console.log(`stdout: ${data}`); });  // log the chunk itself, not the stream object
and
js.stdout.pipe(process.stdout);
to capture the output. Same result, the output is being cut off.
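One thing worth checking (my assumption, not something stated above): if stderr is piped but never consumed, its buffer can fill up and stall the child in exactly the way the quoted docs describe, so make sure every piped stream is drained:
// drain stderr too, so neither pipe buffer fills up and stalls the child
js.stderr.on('data', (data) => process.stderr.write(data));
// or bypass the pipes entirely and let the child write straight to the terminal:
// const js = spawn('julius', ['-C', 'julius.jconf', '-dnnconf', 'dnn.jconf'], {stdio: 'inherit'});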

Node.js pipe console error to another program (make it async)

From the Express.js documentation:
To keep your app purely asynchronous, you'd still want to pipe console.err() to another program.
Questions:
Is it enough to run my Node app with stdout and stderr redirected to avoid blocking the event loop? Like this: node app 2>&1 | tee logFile?
If the answer to (1) is yes, then how do I achieve non-blocking logging while using Winston or Bunyan? Do they have some built-in mechanism for this, or do they just save data to a specific file, wasting CPU time of the current Node.js process? Or, to achieve truly async logging, should they pipe the data to a child process that performs the "save to file" step (and is that still a performance win)? Can anyone explain, or correct me if my way of thinking is wrong?
Edited part: I can assume that piping data from processes A, B, etc. to process L is cheaper for those specific processes (A, B, ...) than writing it to a file (or sending it over the network).
To the point:
I am designing a logger for an application that uses the Node.js cluster module.
Briefly: one of the processes (L) will handle data streams from the others (A, B, ...).
Process L will queue messages (for example line by line, or split on some other separator) and log them one by one into a file, a DB, or anywhere else.
The advantage of this approach is that it reduces the load on the other processes, which can spend more time doing their actual job.
One more thing: the assumption is to keep the library simple to use, so the user only includes this logger, without any additional interaction (stream redirection) via the shell.
Do you think this solution makes sense? Maybe you know a library that already does this?
Let's set up some ground level first...
Writing to a terminal screen (console.log() etc.), writing to a file (fs.writeFile(), fs.writeFileSync() etc.) or sending data to a stream (process.stdout.write(data) etc.) will always "block the event loop". Why? Because some part of those functions is always written in JavaScript. The minimum amount of work needed by these functions would be to take the input and hand it over to some native code, but some JS will always be executed.
And since JS is involved, it will inevitably "block" the event loop because JavaScript code is always executed on a single thread no matter what.
Is this a bad thing...?
No. The amount of time required to process some log data and send it over to a file or a stream is quite low and does not have significant impact on performance.
When would this be a bad thing, then...?
You can hurt your application by doing something generally called a "synchronous" I/O operation - that is, writing to a file and actually not executing any other JavaScript code until that write has finished. When you do this, you hand all the data to the underlying native code and while theoretically being able to continue doing other work in JS space, you intentionally decide to wait until the native code responds back to you with the results. And that will "block" your event loop, because these I/O operations can take much much longer than executing regular code (disks/networks tend to be the slowest part of a computer).
Now, let's get back to writing to stdout/stderr.
From Node.js' docs:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
I am assuming we are working with POSIX systems below.
In practice, this means that when your Node.js' output streams are not piped and are sent directly to the TTY, writing something to the console will block the event loop until the whole chunk of data is sent to the screen. However, if we redirect the output streams to something else (a process, a file etc.) now when we write something to the console Node.js will not wait for the completion of the operation and continue executing other JavaScript code while it writes the data to that output stream.
In practice, we get to execute more JavaScript in the same time period.
With this information you should be able to answer all your questions yourself now:
You do not need to redirect the stdout/stderr of your Node.js process if you do not write anything to the console, or you can redirect only one of the streams if you do not write anything to the other one. You may redirect them anyway, but if you do not use them you will not gain any performance benefit.
If you configure your logger to write the log data to a stream then it will not block your event loop too much (unless some heavy processing is involved).
If you care this much about your app's performance, do not use Winston or Bunyan for logging - they are extremely slow. Use pino instead - see the benchmarks in their readme.
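For illustration, a minimal pino setup might look like this (a sketch, assuming a recent pino version; the file path is just a placeholder):
const pino = require('pino');

// pino.destination() creates a fast file destination; writes to it are buffered
// and flushed asynchronously rather than blocking the event loop
const logger = pino(pino.destination('./app.log'));

logger.info('logged without blocking the event loop');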
To answer (1), we can dive into the Express documentation: there you will see a link to the Node.js documentation for Console, which links to the Node documentation on process I/O. There it describes how process.stdout and process.stderr behave:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
With that we can try to understand what will happen with node app 2>&1 | tee logFile:
stdout and stderr are piped to a process, tee;
tee writes to both the terminal and the file logFile.
The important part here is that stdout and stderr are piped to a process, which means writes to them should be asynchronous.
Regarding (2) it would depend on how you configured Bunyan or Winston:
Winston has the concept of Transports, which essentially lets you configure where the logs will go. If you want asynchronous logging, you should use any transport other than the Console transport. Using the File transport should be fine, as it creates a file stream object for this, writes to it asynchronously, and won't block the Node process.
Bunyan has a similar configuration option: Streams. According to their docs, it can accept any stream interface. As long as you avoid using the process.stdout and process.stderr streams here, you should be ok.
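For illustration, a minimal Winston configuration along those lines might look like this (a sketch, assuming Winston 3.x; the file name is just a placeholder):
const winston = require('winston');

// File transport: log writes go to a file stream, which is asynchronous on POSIX
const logger = winston.createLogger({
  transports: [
    new winston.transports.File({ filename: 'app.log' }),
  ],
});

logger.info('this write does not block the event loop while the file I/O happens');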

Looking for an example on how to pass file descriptors with either socket.io or IPC

I launch a Node.js process A with a terminal.
Process A launches process B with child_process.spawn.
In turn, process B launches worker processes, those are all the same type and let's call them process(es) C.
I want the C processes to write to the original terminal, but I want the B process to write to a log file.
In order to accomplish this, my current belief is that I have to pass the file descriptor representing the current terminal to process B using IPC or maybe socket.io.
I am looking for examples on how to pass file descriptors with IPC/socket.io but coming up empty-handed.
I am really looking for two pieces of info:
(a) how to get a file descriptor that represents the current terminal (at its most basic, those fds are simply the integers 0, 1, 2 for stdin, stdout, and stderr, but I don't think those will work in my case);
(b) a code example of how to pass an fd with IPC in Node.js (socket.io would work just as well, if that's possible).
From my brief research, it looks like file descriptors are just integers, so they can be passed with JSON, like so:
JSON.stringify({fd: 18});
and you can pass this data with IPC in Node.js, or socket.io, or whatever.
My research also says, though, that just because you have an integer in hand that "represents a file descriptor", you don't get many guarantees: file descriptor numbers only have meaning within their own process.
More info:
If you run the 'tty' command at the terminal, like so:
$ tty
you will get something like this:
/dev/ttys001
then in Node.js, if you do
const fd = fs.openSync('/dev/ttys001','a');
then you will get the file descriptor for the tty, and that fd should be an integer.
You can use that info to write to the tty, like so:
const fs = require('fs');
const fd = fs.openSync('/dev/ttys001', 'a');          // open the terminal device for appending
const stream = fs.createWriteStream(null, {fd: fd});  // wrap the raw fd in a writable stream
process.stdout.pipe(stream);
process.stderr.pipe(stream);
It took me a while to figure this out, so maybe it will help you.
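To make (b) concrete: the built-in IPC channel you get with child_process.fork() serialises plain JSON, so the simplest thing to pass is the tty path itself (or the integer fd) and then open it on the other side. A minimal sketch, with child.js as a made-up file name:
// parent.js
const { fork, execSync } = require('child_process');

// find the controlling terminal's device path, then hand it to the child over IPC
const ttyPath = String(execSync('tty', {stdio: ['inherit', 'pipe', 'pipe']})).trim();
const child = fork('child.js');
child.send({ttyPath});

// child.js
const fs = require('fs');
process.on('message', ({ttyPath}) => {
  // open the terminal device directly and write to it
  const out = fs.createWriteStream(ttyPath, {flags: 'a'});
  out.write('hello from the child, written to the original terminal\n');
});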

How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?

I'm trying to send a huge json string from a child process to the parent process. My initial approach was the following:
child:
process.stdout.write(myHugeJsonString);
parent:
child.stdout.on('data', function(data) { ...
But now I read that process.stdout is blocking:
process.stderr and process.stdout are unlike other streams in Node in
that writes to them are usually blocking.
They are blocking in the case that they refer to regular files or TTY file descriptors.
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
Ok, what about the Stream object option? Hmm, it seems I can share a readable or writable stream that refers to a socket with the child process. Would this be non-blocking? How would I implement that?
So the question stands: How do I send huge amounts of data from a child process to the parent process in a non-blocking way in Node.js? A cross-platform solution would be really neat, examples with explanation very appreciated.
One neat trick I used on *nix for this is FIFO pipes (named pipes; see http://linux.about.com/library/cmd/blcmdl4_fifo.htm). They let the child write to a file-like thing and the parent read from the same thing. The file is not really on the filesystem, so you don't get any I/O problems; all access is handled by the kernel itself. But if you want it cross-platform, that won't work: there's no such thing on Windows (as far as I know).
Just note that the pipe has a limited size, and if what you write to it (from the child) is not read by something else (from the parent), the child will block when the pipe is full. This does not block the Node processes; they see the pipe as a normal file stream.
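A rough sketch of the FIFO idea (my own illustration; it assumes a POSIX system with mkfifo available, and the path /tmp/myfifo is made up):
// parent.js — create the FIFO, start the child, then read from the FIFO
const { execSync, fork } = require('child_process');
const fs = require('fs');

const fifoPath = '/tmp/myfifo';
try { execSync(`mkfifo ${fifoPath}`); } catch (e) { /* it probably already exists */ }

const child = fork('child.js', [fifoPath]);
fs.createReadStream(fifoPath).on('data', (chunk) => {
  process.stdout.write(chunk); // consume the data as it arrives
});

And the child's end (child.js, also a made-up name):
// child.js — write the big payload into the FIFO instead of stdout
const fs = require('fs');
const writer = fs.createWriteStream(process.argv[2]);
writer.end(JSON.stringify({lots: 'of data'}));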
I had a similar problem and I think I found a good solution: set up an extra pipe when spawning the child process and use the resulting file descriptor to duplex data to the client's end.
How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?
Apparently you can use fs to stream to/from file descriptors:
How to stream to/from a file descriptor in node?
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
No. The docs above say stdout/stderr, and in no way do they say "all pipes".
It won't matter that stdout/stderr are blocking. In order for a pipe to block, it needs to fill up, which takes a lot of data. In order to fill up, the reader at the other end has to be reading slower than you are writing. But... you are the other end, you wrote the parent process. So, as long as your parent process is functioning, it should be reading from the pipes.
Generally, blocking of the child is a good thing. If it's producing data faster than the parent can handle, there are ultimately only two possibilities:
1. it blocks, so stops producing data until the parent catches up
2. it produces more data than the parent can consume, and buffers that data in local memory until it hits the v8 memory limit, and the process aborts
You can use stdout to send your JSON if you want (1).
You can use a new 'pipe' to send your JSON if you want (2), as sketched below.
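A rough sketch of that second option (my own illustration, not from the answer; fd 3 and the file name child.js are arbitrary):
// parent.js — ask for an extra stdio slot, which appears as child.stdio[3]
const { spawn } = require('child_process');

const child = spawn('node', ['child.js'], {
  stdio: ['inherit', 'inherit', 'inherit', 'pipe'],
});

let json = '';
child.stdio[3].setEncoding('utf8');
child.stdio[3].on('data', (chunk) => { json += chunk; });
child.stdio[3].on('end', () => {
  console.log('received:', JSON.parse(json));
});

And in the child (child.js):
// child.js — fd 3 is the extra pipe the parent set up
const fs = require('fs');
const out = fs.createWriteStream(null, {fd: 3});
out.end(JSON.stringify({some: 'huge payload'}));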

Process connected to separate pty for stdout and stderr

I'm writing a terminal logging program - think the script command but a bit more featureful. One of the differences is that, whereas script captures stdout, stdin and stderr as one big character stream, I would like to keep them separate and record them as such.
In order to do this, I use the standard approach of running a child shell connected to a pty, but instead of using a single pty with stdin, stdout and stderr all connected to it, I use two ptys - with stdin and stderr connected to one pty, and stdout on the other. This way, the master process can tell what is coming from stdout and what from stderr.
This has, so far, worked fine. However, I'm starting to run into a few issues. For example, when trying to set the number of columns, I get the following:
$ stty cols 169
stty: stdout appears redirected, but stdin is the control descriptor
This seems to be a result of this piece of code, which seems to check whether stdout and stderr are both ttys, but complains if they are not the same.
My question, therefore, is this: am I violating any fundamental assumptions about how POSIX processes behave by acting in this way? If not, any idea why I'm seeing errors such as this? If so, is there any way I can get around this and still manage to separate stdout and stderr nicely?
One idea I had about this is to run a process directly on the pty which then runs the target program, e.g.
(wrapper) -> pty -> (controller) -> script
The controller would be responsible for running the script and capturing its stdout and stderr separately, feeding them back to the wrapper, perhaps over some non-standard fd, or alternatively serialising the data before shipping it back, e.g. prefixing output from stderr with stderr: and stdout with stdout:. The wrapper would then deserialise this and feed it back upstream, or do whatever else you want with it. A rough sketch of this follows below.
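A Node.js sketch of that controller idea (purely illustrative; the target command comes from the command line and the stdout:/stderr: prefixes are just placeholders): spawn the target with piped stdout/stderr, tag each line with its origin, and write everything to the controller's own stdout, which is attached to the pty.
const { spawn } = require('child_process');
const readline = require('readline');

// run the target command with piped stdout/stderr so we can tell them apart
const child = spawn(process.argv[2], process.argv.slice(3), {
  stdio: ['inherit', 'pipe', 'pipe'],
});

for (const [name, stream] of [['stdout', child.stdout], ['stderr', child.stderr]]) {
  const rl = readline.createInterface({ input: stream });
  // serialise: prefix each line with where it came from, then ship it upstream
  rl.on('line', (line) => process.stdout.write(`${name}: ${line}\n`));
}

child.on('exit', (code) => process.exit(code == null ? 0 : code));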
