How to write to stdout without blocking under Linux? - linux

I've written a log-to-stdout program which produces logs, and another executable, read-from-stdin (for example filebeat), collects those logs from stdin. My problem is that log-to-stdout may burst for a short period and exceed what read-from-stdin can accept, which blocks the log-to-stdout process. I'd like to know whether there is a Linux API to tell if the stdout file descriptor can be written to (up to N bytes) without blocking.
I've found some comments in the Node.js process.stdout documentation.
In the case where they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
Does that mean that under Linux it's impossible to do a non-blocking write on stdout? Some documents describe a non-blocking file operation mode (https://www.linuxtoday.com/blog/blocking-and-non-blocking-i-0/); does it apply to stdout too? Because I'm using a third-party logging library (which expects stdout to work in blocking mode), can I check whether stdout is writable in non-blocking mode (before calling the logging library), and then switch stdout back to blocking mode, so that from the logging library's perspective the stdout fd still works as before? (If I can tell that stdout would block, I'll throw the output away, since not blocking is more important than complete logging in my case.)
(Or is there an auto-drop-pipe command, which automatically drops lines if the pipeline would block, so that I could call
log-to-stdout | auto-drop-pipe --max-lines=100 --drop-head-if-full | read-from-stdin)
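As a rough illustration of that auto-drop-pipe idea, here is a hypothetical Node.js filter (the script name and MAX_BUFFERED_LINES are invented): it forwards stdin to stdout line by line and drops the oldest buffered lines whenever process.stdout.write() signals backpressure by returning false.

// auto-drop-pipe.js (hypothetical): forward stdin to stdout, dropping the
// oldest lines when stdout is backpressured instead of blocking the producer.
const readline = require('readline');

const MAX_BUFFERED_LINES = 100;   // illustrative limit, like --max-lines=100
let pending = [];                 // lines held while stdout is backpressured
let stdoutWritable = true;        // false after write() returns false

process.stdout.on('drain', () => {
  stdoutWritable = true;
  // Flush whatever we kept while backpressured; older lines were already dropped.
  while (pending.length && stdoutWritable) {
    stdoutWritable = process.stdout.write(pending.shift() + '\n');
  }
});

const rl = readline.createInterface({ input: process.stdin });
rl.on('line', (line) => {
  if (stdoutWritable) {
    stdoutWritable = process.stdout.write(line + '\n');
  } else {
    pending.push(line);
    if (pending.length > MAX_BUFFERED_LINES) pending.shift(); // drop-head-if-full
  }
});

Usage would then be log-to-stdout | node auto-drop-pipe.js | read-from-stdin. Note that this relies on Node buffering pipe writes in memory (the "asynchronous on POSIX" behaviour quoted further down), so it trades blocking for memory rather than polling the fd directly.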

Related

Node.js pipe console error to another program (make it async)

From Expressjs documentation:
To keep your app purely asynchronous, you’d still want to pipe
console.err() to another program
Questions:
Is it enough to run my Node app with stdout and stderr redirected so they don't block the event loop? Like this: node app 2>&1 | tee logFile?
If the answer to (1) is yes, then how do I achieve non-blocking logging while using Winston or Bunyan? Do they have some built-in mechanism for this, or do they just save data to a specific file, wasting CPU time of the current Node.js process? Or, to achieve truly async logging, should they pipe data to a child process that performs the "save to file" step (and is that still a performance win)? Can anyone explain, or correct me if my way of thinking is simply wrong?
Edited part: I can assume that piping data from processes A, B, ... etc. to process L is cheaper for those specific processes (A, B, ...) than writing it to a file (or sending it over the network).
To the point:
I am designing a logger for an application that uses the Node.js cluster module.
Briefly - one of the processes (L) will handle data streams from the others (A, B, ...).
Process L will queue messages (for example line by line, or split on some other special separator) and log them one by one to a file, a database, or anywhere else.
The advantage of this approach is that it reduces the load on the processes that can then spend more time doing their actual job.
One more thing - the assumption is to keep usage of this library simple, so the user only includes the logger, without any additional interaction (stream redirection) via the shell.
Do you think this solution makes sense? Maybe you know a library that already does this?
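For what it's worth, a sketch of that A, B -> L design using Node's cluster module (assuming the silent option pipes worker output to the primary as described in the cluster docs; the log file name is illustrative):

const cluster = require('cluster');
const fs = require('fs');

if (cluster.isMaster) {
  // With silent: true the workers' stdout/stderr are piped to the master
  // instead of being inherited, so the master can play the role of process L.
  cluster.setupMaster({ silent: true });
  const sink = fs.createWriteStream('app.log', { flags: 'a' });

  for (let i = 0; i < 2; i++) {
    const worker = cluster.fork();
    worker.process.stdout.pipe(sink, { end: false });
    worker.process.stderr.pipe(sink, { end: false });
  }
} else {
  // Workers just log normally; the master decides where it ends up.
  setInterval(() => console.log(`worker ${process.pid} is alive`), 1000);
}

This is only a sketch: it does not handle partial lines from different workers arriving interleaved, which is exactly the queueing work process L would have to do.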
Let's set up some ground level first...
Writing to a terminal screen (console.log() etc.), writing to a file (fs.writeFile(), fs.writeFileSync() etc.) or sending data to a stream (process.stdout.write(data) etc.) will always "block the event loop". Why? Because some part of those functions is always written in JavaScript. The minimum amount of work needed by these functions would be to take the input and hand it over to some native code, but some JS will always be executed.
And since JS is involved, it will inevitably "block" the event loop because JavaScript code is always executed on a single thread no matter what.
Is this a bad thing...?
No. The amount of time required to process some log data and send it over to a file or a stream is quite low and does not have significant impact on performance.
When would this be a bad thing, then...?
You can hurt your application by doing something generally called a "synchronous" I/O operation - that is, writing to a file and actually not executing any other JavaScript code until that write has finished. When you do this, you hand all the data to the underlying native code and while theoretically being able to continue doing other work in JS space, you intentionally decide to wait until the native code responds back to you with the results. And that will "block" your event loop, because these I/O operations can take much much longer than executing regular code (disks/networks tend to be the slowest part of a computer).
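To make the distinction concrete, here is a small illustration (the file names are arbitrary): the synchronous call keeps JavaScript waiting until the write has finished, while the asynchronous call hands the work to native code and returns immediately.

const fs = require('fs');

fs.writeFileSync('sync.log', 'blocking write\n');    // JS waits here until the write finishes
fs.writeFile('async.log', 'non-blocking write\n', (err) => {
  if (err) console.error('write failed:', err);      // runs later, via the event loop
});
console.log('reached immediately after scheduling the async write');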
Now, let's get back to writing to stdout/stderr.
From Node.js' docs:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
I am assuming we are working with POSIX systems below.
In practice, this means that when your Node.js process's output streams are not piped and are sent directly to the TTY, writing something to the console will block the event loop until the whole chunk of data is sent to the screen. However, if we redirect the output streams to something else (a process, a file etc.), then when we write something to the console, Node.js will not wait for the operation to complete and will continue executing other JavaScript code while it writes the data to that output stream.
In practice, we get to execute more JavaScript in the same time period.
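A quick way to check which case applies from inside the process (just a sketch, not something from the quoted docs) is process.stdout.isTTY:

// process.stdout.isTTY is true when stdout is a terminal and undefined when
// stdout has been redirected to a pipe or a file.
if (process.stdout.isTTY) {
  console.error('stdout is a TTY: console.log() writes are synchronous on POSIX');
} else {
  console.error('stdout is piped or redirected: writes are asynchronous on POSIX');
}

Running the script directly versus piping it through cat shows the two cases.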
With this information you should be able to answer all your questions yourself now:
You do not need to redirect the stdout/stderr of your Node.js process if you do not write anything to the console, or you can redirect only one of the streams if you do not write anything to the other one. You may redirect them anyway, but if you do not use them you will not gain any performance benefit.
If you configure your logger to write the log data to a stream then it will not block your event loop too much (unless some heavy processing is involved).
If you care this much about your app's performance, do not use Winston or Bunyan for logging - they are extremely slow. Use pino instead - see the benchmarks in their readme.
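For reference, a minimal pino setup looks roughly like this (assuming pino is installed; the log file path is illustrative):

const pino = require('pino');

// Logs as newline-delimited JSON to stdout by default...
const logger = pino();
logger.info({ requestId: 42 }, 'request handled');

// ...or to a dedicated file via a fast stream destination.
const fileLogger = pino(pino.destination('/var/log/app.log'));
fileLogger.warn('written outside the terminal');

Writing newline-delimited JSON to stdout by default also pairs well with the piping approach discussed above.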
To answer (1) we can dive into the Express documentation: you will see a link to the Node.js documentation for Console, which links to the Node documentation on process I/O. There it describes how process.stdout and process.stderr behave:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
With that we can try to understand what will happen with node app 2>&1 | tee logFile:
Stdout and stderr are piped to a process, tee.
tee writes to both the terminal and the file logFile.
The important part here is that stdout and stderr are piped to a process, which means the writes should be asynchronous.
Regarding (2), it depends on how you have configured Bunyan or Winston:
Winston has the concept of Transports, which essentially lets you configure where the log will go. If you want asynchronous logs, you should use any transport other than the Console transport. Using the File transport should be fine, as it should create a file stream object, which is asynchronous and won't block the Node process (a configuration sketch follows below).
Bunyan has a similar configuration option: Streams. According to their docs, it can accept any stream interface. As long as you avoid using the process.stdout and process.stderr streams here, you should be fine.
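A hedged sketch of the Winston side of that configuration (winston 3.x style; the filename is illustrative):

const winston = require('winston');

const logger = winston.createLogger({
  transports: [
    // The File transport writes through a file stream, so it does not go
    // through process.stdout/process.stderr.
    new winston.transports.File({ filename: 'app.log' })
  ]
});

logger.info('logged without touching stdout');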

stdout collisions on node

If multiple processes are simultaneously writing to stdout then there is nothing to stop the streams from interleaving. This is what I mean by a collision.
According to the comments in the node source, it should be possible to avoid collisions in process.stdout. I tried this and it helps but I still get collisions. Without the writing flag, I get collisions every time, with the flag it drops to about 40%. Still very significant.
page.on('onConsoleMessage', function log(message) {
  // Undocumented internals: truthy while process.stdout is in the middle of a write.
  var writing = process.stdout._writableState.writing
    || process.stdout._writableState.bufferProcessing
    || process.stdout.bufferSize;
  if (writing) {
    // Retry the same message on the next tick. (The original arrow function
    // `message => log(message)` shadowed `message` and would log undefined.)
    process.nextTick(() => log(message));
  } else {
    process.stdout.write('-> ' + message + '\n');
  }
});
What is the best way to avoid collisions on process.stdout?
The above routine is competing with Winston for stdout.
node v5.12.0
Windows 10
This problem only happens when using the Run console in WebStorm; the output is not mixed up when running node in PowerShell or from cmd. I raised a ticket for this at JetBrains.
If you have multiple writers to the same stream, I don't think that you'll get interleaving. Even if a single writer is logging multiple lines in succession, those lines will be buffered (if there is buffering going on) in the correct order, and when another writer is logging its lines, they will be appended to the buffer, behind the previously logged lines.
Interleaving can occur when you have writers writing to a different stream, like one writing to stdout and the other to stderr. At some point, when the output buffer of one stream fills up, it gets flushed to the console, regardless of any other streams that may also be writing to console.
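If the collisions really do come from mixing streams, one mitigation (a sketch, not a guaranteed fix for the WebStorm console issue above) is to funnel every log line through a single stream, with one write() call per complete line:

// All output, including what would normally go to stderr, is serialized
// through process.stdout, so ordering is determined by a single buffer.
function logLine(line) {
  process.stdout.write(line + '\n');
}

logLine('-> message from the page');
logLine('error: something the logger would otherwise have sent to stderr');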

Determine when(/after which input) a python subprocess crashes

I have a Python subprocess that runs an arbitrary C++ program (student assignments, if it matters) via Popen. The structure is such that I write a series of inputs to stdin, and at the end I read all of stdout and parse it for the responses to each input.
Of course, given that these are student assignments, they may crash after certain inputs. What I require is to know after which specific input their program crashed.
So far I know that when a runtime exception is thrown in the C++ program, it's printed to stderr. So right now I can read stderr after the fact and see that it did in fact crash. But I haven't found a way to read stderr while the program is still running, so that I can infer that the error is in response to the latest input. Every SO question or article I have run into seems to make use of subprocess.communicate(), but communicate blocks until the subprocess returns, and that hasn't been working for me because I need to keep sending inputs to the program if it hasn't crashed.
What i require is to know after which specific input their program crashed.
Call process.stdin.flush() after process.stdin.write(b'your input'). If the process is already dead then either .write() or .flush() will raise an exception (the specific exception may depend on the system, e.g. BrokenPipeError on POSIX).
Unrelated: if you are redirecting all three standard streams (stdin=PIPE, stdout=PIPE, stderr=PIPE) then make sure to consume the stdout and stderr pipes concurrently while you are writing the input, otherwise the child process may hang if it generates enough output to fill the OS pipe buffer. You could use threads or async I/O to do it -- code examples.

Process connected to separate pty for stdout and stderr

I'm writing a terminal logging program - think the script command but a bit more featureful. One of the differences is that, whereas script captures stdout, stdin and stderr as one big character stream, I would like to keep them separate and record them as such.
In order to do this, I use the standard approach of running a child shell connected to a pty, but instead of using a single pty with stdin, stdout and stderr all connected to it, I use two ptys - with stdin and stderr connected to one pty, and stdout on the other. This way, the master process can tell what is coming from stdout and what from stderr.
This has, so far, worked fine. However, I'm starting to run into a few issues. For example, when trying to set the number of columns, I get the following:
$ stty cols 169
stty: stdout appears redirected, but stdin is the control descriptor
This seems to be a result of this piece of code, which seems to check whether stdout and stderr are both ttys, but complains if they are not the same.
My question, therefore, is this: am I violating any fundamental assumptions about how POSIX processes behave by acting in this way? If not, any idea why I'm seeing errors such as this? If so, is there any way I can get around this and still manage to separate stdout and stderr nicely?
One idea I had about this is to use a process directly on the pty which then runs the target program, e.g.
(wrapper) -> pty -> (controller) -> script
The controller would be responsible for running the script and capturing stdout and stderr separately, feeding them back to the wrapper, perhaps over some non-standard fd, or alternatively serialising the data before shipping it back, e.g. prefixing output from stderr with stderr: and output from stdout with stdout: - then the wrapper deserializes this and feeds it back upstream, or does whatever else you want with it.
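The serialising/prefixing half of that idea is easy to sketch in Node.js (the ./script path is a placeholder, and this ignores the pty plumbing entirely, so the child no longer sees a terminal):

const { spawn } = require('child_process');
const readline = require('readline');

// Capture stdout and stderr on separate pipes; note the child will see pipes,
// not ttys, which is exactly the property the two-pty setup tries to preserve.
const child = spawn('./script', [], { stdio: ['inherit', 'pipe', 'pipe'] });

for (const [name, stream] of [['stdout', child.stdout], ['stderr', child.stderr]]) {
  const rl = readline.createInterface({ input: stream });
  rl.on('line', (line) => process.stdout.write(`${name}: ${line}\n`));
}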

What happens to stdout when a script runs a program?

I have an embedded application that I want a simple-minded logger for.
The system starts from a script file, which in turn runs the application. There could be various reasons that the script fails to run the application, or the application itself could fail to start. To diagnose this remotely, I need to view the stdout from the script and the application.
I tried writing a tee-like logger that would repeat its stdin to stdout, and save the text in a FIFO for later retrieval via the network. Then I naively tried
./script | ./logger
I ended up with only the script stdout going to the logger, and the application stdout disappearing. I had similar results trying tee.
The system is running kernel 2.4.26, and busybox.
What is going on, and how can I accomplish my desired ends?
It turns out it was working exactly as I thought it should, with one minor gotcha: stdout was being buffered, and without any fflush(stdout) calls, I never saw it. Had I been really patient, I would have eventually seen a big gush of output when the stdout buffer filled up. A call to setlinebuf(3) fixed my problem.
Apparently, the application output doesn't end up on stdout. Possible reasons:
The output is actually on stderr (which is usually also connected to the terminal); ./script.sh 2>&1 | ./logger should then work.
The application actively disconnects from stdin/stdout (e.g. by closing/reopening file descriptors 0, 1(, 2), or by using nohup, exec or similar utilities).
The script daemonizes (which also detaches from all standard streams).
