Node.js pipe console error to another program (make it async) - node.js

From Expressjs documentation:
To keep your app purely asynchronous, you’d still want to pipe
console.error() to another program
Questions:
Is it enough to run my Node app with stdout and stderr redirected, like node app 2>&1 | tee logFile, to avoid blocking the event loop?
If the answer to (1) is yes, then how do I achieve non-blocking logging while using Winston or Bunyan? Do they have some built-in mechanism for this, or do they just save data to a specific file, spending CPU time of the current Node.js process? Or, to achieve truly asynchronous logging, should they pipe the data to a child process that performs the "save to file" step (and is that still a performance win)? Can anyone explain, or correct me if my way of thinking is wrong?
Edited part: I assume that piping data from processes A, B, ... to process L is cheaper for those specific processes (A, B, ...) than writing it to a file (or sending it over the network).
To the point:
I am designing a logger for an application that uses the Node.js cluster module.
Briefly: one of the processes (L) will handle data streams from the others (A, B, ...).
Process L will queue messages (for example, split line by line or on some other special separator) and log them one by one to a file, a database, or anywhere else.
The advantage of this approach is that it reduces the load on the other processes, which can then spend more time doing their actual job.
One more thing: the assumption is to keep this library simple to use, so the user only includes the logger, without any additional interaction (stream redirection) via the shell.
Do you think this solution makes sense? Maybe you know of a library that already does this?

Let's set up some ground level first...
Writing to a terminal screen (console.log() etc.), writing to a file (fs.writeFile(), fs.writeFileSync() etc.) or sending data to a stream (process.stdout.write(data) etc.) will always "block the event loop". Why? Because some part of those functions is always written in JavaScript. The minimum amount of work needed by these functions would be to take the input and hand it over to some native code, but some JS will always be executed.
And since JS is involved, it will inevitably "block" the event loop because JavaScript code is always executed on a single thread no matter what.
Is this a bad thing...?
No. The amount of time required to process some log data and send it to a file or a stream is quite low and does not have a significant impact on performance.
When would this be a bad thing, then...?
You can hurt your application by doing something generally called a "synchronous" I/O operation - that is, writing to a file and actually not executing any other JavaScript code until that write has finished. When you do this, you hand all the data to the underlying native code and while theoretically being able to continue doing other work in JS space, you intentionally decide to wait until the native code responds back to you with the results. And that will "block" your event loop, because these I/O operations can take much much longer than executing regular code (disks/networks tend to be the slowest part of a computer).
Now, let's get back to writing to stdout/stderr.
From Node.js' docs:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
I am assuming we are working with POSIX systems below.
In practice, this means that when your Node.js' output streams are not piped and are sent directly to the TTY, writing something to the console will block the event loop until the whole chunk of data is sent to the screen. However, if we redirect the output streams to something else (a process, a file etc.) now when we write something to the console Node.js will not wait for the completion of the operation and continue executing other JavaScript code while it writes the data to that output stream.
The upshot: we get to execute more JavaScript in the same period of time.
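You can check at runtime which case you are in: process.stdout.isTTY is true when output goes straight to a terminal, and undefined when stdout is piped or redirected to a file.

```javascript
// isTTY comes from tty.WriteStream; when stdout is a pipe or file the
// stream is a different class and the property is simply absent.
if (process.stdout.isTTY) {
  console.log("TTY: console writes are synchronous on POSIX");
} else {
  console.log("piped/redirected: console writes are asynchronous on POSIX");
}
```

Running node app.js versus node app.js | cat will take the two different branches.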
With this information you should be able to answer all your questions yourself now:
You do not need to redirect the stdout/stderr of your Node.js process if you do not write anything to the console, or you can redirect only one of the streams if you do not write anything to the other one. You may redirect them anyway, but if you do not use them you will not gain any performance benefit.
If you configure your logger to write the log data to a stream then it will not block your event loop too much (unless some heavy processing is involved).
If you care this much about your app's performance, do not use Winston or Bunyan for logging - they are extremely slow. Use pino instead - see the benchmarks in their readme.

To answer (1), we can dive into the Express documentation: it links to the Node.js documentation for Console, which in turn links to the Node documentation on process I/O. There it describes how process.stdout and process.stderr behave:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
With that we can try to understand what will happen with node app 2>&1 | tee logFile:
stdout and stderr are piped to the tee process.
tee writes to both the terminal and the file logFile.
The important part here is that stdout and stderr are piped to a process, which means the writes should be asynchronous.
Regarding (2) it would depend on how you configured Bunyan or Winston:
Winston has the concept of Transports, which essentially lets you configure where the logs go. If you want asynchronous logging, avoid the Console transport. Using the File transport should be fine, since it creates a file stream object internally, and writing to that is asynchronous and won't block the Node process.
Bunyan has a similar configuration option: Streams. According to their docs, it accepts any stream interface. As long as you avoid using the process.stdout and process.stderr streams here, you should be fine.

Related

How to write to stdout without being blocking under linux?

I've written a log-to-stdout program which produces logs, and another executable, read-from-stdin (for example filebeat), collects those logs from stdin. My problem is that my log-to-stdout output rate may burst for a short period, exceeding what read-from-stdin can accept, which will block the log-to-stdout process. I'd like to know whether there is a Linux API that tells whether the stdout file descriptor can be written to (up to N bytes) without blocking.
I've found some comments in nodejs process.stdout
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
Does that mean that under Linux it's impossible to do a non-blocking write to stdout? Some documents describe a non-blocking file-operation mode (https://www.linuxtoday.com/blog/blocking-and-non-blocking-i-0/); does it apply to stdout too? Because I'm using a third-party logging library (which expects stdout to work in blocking mode), can I check whether stdout is writable in non-blocking mode (before calling the logging library) and then switch stdout back to blocking mode, so that from the library's perspective stdout still works as before? (If I can tell that stdout would block, I'll throw the output away, since not blocking is more important than complete logging in my usage.)
(Or, if there is an auto-drop-pipe command that automatically drops lines when the pipeline would block, I could call
log-to-stdout | auto-drop-pipe --max-lines=100 --drop-head-if-full | read-from-stdin)

atomically appending to a log file in nodejs

NodeJS is asynchronous, so for example if running an Express server, one might be in the middle of servicing one request, log it, and then start servicing another request and try to log it before the first has finished.
Since these are log files, it's not a simple write. Even if a write was atomic, maybe another process actually winds up writing at the offset the original process is about to and it winds up overwriting.
There is a synchronous append function (fs.appendFileSync), but using it would require us to delay servicing a request to wait for a log-file write to complete, and I'm still not sure it guarantees an atomic append. What is the best practice for writing to log files in NodeJS while ensuring atomicity?
one might be in the middle of servicing one request, log it, and then start servicing another request and try to log it before the first has finished.
The individual write calls will be atomic, so as long as you make a single log-write call per request, you won't have any corruption of log messages. It is normal, however, if you log multiple messages while processing a request, for those to be interleaved among many different concurrent requests. Each message is intact, but they appear in the log file in chronological order, not grouped by request. That is fine. You can filter on a request UUID if you want to follow a single request in isolation.
Even if a write was atomic, maybe another process actually winds up writing at the offset the original process is about to and it winds up overwriting.
Don't allow multiple processes to write to the same file or log. Use process.stdout and all will be fine. Or if you really want to log directly to the filesystem, use an exclusive lock mechanism.
What is the best practice for writing to log files in NodeJS while ensuring atomicitiy?
process.stdout, one write call per coherent log message. You can let your process supervisor (systemd or upstart) write your logs for you, or use a log manager such as multilog or svlogd: pipe your stdout to them and let them handle writing to disk.

Transparent fork-server on linux

Suppose I have application A that takes some time to load (opens a couple of libraries). A processes stdin into some stdout.
I want to serve A on a network over a socket (instead of stdin and stdout).
The simplest way of doing that efficiently that I can think of is by hacking at the code and adding a forking server loop, replacing stdin and stdout with socket input and output.
The performance improvement compared to having an independent server application that spawns A (fork+exec) on each connection comes at a cost however. The latter is much easier to write and I don't need to have access to the source code of A or know the language it's written in.
I want my cake and eat it too. Is there a mechanism that would extract that forking loop?
What I want is something like fast_spawnp("A", "/tmp/A.pid", stdin_fd, stdout_fd, stderr_fd) (start process A unless it's already running, clone A from outside and make sure the standard streams of the child point to the argument-supplied file descriptors).

Understanding the Event-Loop in node.js

I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile('foo.txt')) and just pass in a callback that will be executed once a particular event indicating completion of the file read fires. However, what I do not understand is where the function that does the work of actually reading the file is executed. JavaScript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However, for some standard-library calls, the Node C++ side and libuv decide to do expensive work outside of the event loop entirely, so it will not block the main event loop. Instead they make use of something called a thread pool: a series of (by default) four threads that can be used for running computationally intensive tasks. Only four kinds of things use this thread pool: DNS lookups, fs, crypto, and zlib. Everything else executes on the main thread.
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
It's like using setTimeout(function(){ /* file reading code here */ }, 1000);. JavaScript can run multiple things side by side, like having three setInterval(function(){ /* code to execute */ }, 1000); timers, so in a way it can appear multi-threaded. And for actually reading from or writing to the hard drive in NodeJS, you can shell out with child_process:
var child = require("child_process");
function put_text(file, text) {
    child.exec("echo " + text + " > " + file);
}
function get_text(file, callback) {
    // exec is asynchronous, so the contents arrive via a callback
    // (there is no jQuery in Node; fs.readFile is the idiomatic way)
    child.exec("cat " + file, function (err, stdout) {
        callback(stdout);
    });
}
These child processes can also be used for reading from and writing to the hard drive in NodeJS.

How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?

I'm trying to send a huge json string from a child process to the parent process. My initial approach was the following:
child:
process.stdout.write(myHugeJsonString);
parent:
child.stdout.on('data', function(data) { ...
But now I read that process.stdout is blocking:
process.stderr and process.stdout are unlike other streams in Node in
that writes to them are usually blocking.
They are blocking in the case that they refer to regular files or TTY file descriptors.
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
Ok, what about the Stream object option? Hmmmm, it seems I can share a readable or writable stream that refers to a socket with the child process. Would this be non-blocking? How would I implement that?
So the question stands: How do I send huge amounts of data from a child process to the parent process in a non-blocking way in Node.js? A cross-platform solution would be really neat, examples with explanation very appreciated.
One neat trick I used on *nix for this is FIFO pipes (http://linux.about.com/library/cmd/blcmdl4_fifo.htm). They allow the child to write to a file-like thing and the parent to read from the same. The file is not really on the fs, so you don't get any I/O problems; all access is handled by the kernel itself. But... if you want it cross-platform, that won't work. There's no such thing on Windows (as far as I know).
Just note that the pipe has a fixed size, and if what you write to it (from the child) is not read by something else (from the parent), then the child will block when the pipe is full. This does not block the node processes; they see the pipe as a normal file stream.
I had a similar problem, and I think I have a good solution: setting up a pipe when spawning the child process and using the resulting file descriptor to duplex data to the client's end.
How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?
Apparently you can use fs to stream to/from file descriptors:
How to stream to/from a file descriptor in node?
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
No. The docs above say stdout/stderr, and in no way do they say "all pipes".
It won't matter that stdout/stderr are blocking. In order for a pipe to block, it needs to fill up, which takes a lot of data. In order to fill up, the reader at the other end has to be reading slower than you are writing. But... you are the other end, you wrote the parent process. So, as long as your parent process is functioning, it should be reading from the pipes.
Generally, blocking the child is a good thing. If it's producing data faster than the parent can handle, there are ultimately only two possibilities:
1. it blocks, and so stops producing data until the parent catches up
2. it produces more data than the parent can consume, buffering that data in local memory until it hits the V8 memory limit, at which point the process aborts
You can use stdout to send your JSON if you want (1).
You can use a new 'pipe' to send your JSON if you want (2).
