Flush/drain stdout/stderr in Node.js process before exiting

Is there a foolproof way to guarantee that all of the stdout and stderr in a Node.js process has made its way to the destination (insofar as that is possible) before the process is allowed to completely terminate?
In some scenarios I have consistently seen stdout and stderr fail to make it out of the Node.js process before it terminates.
What I am thinking is:
process.once('exit', function(){
// use some synchronous call to flush or drain stdout and stderr
});
Has anyone found a surefire way to do this? Mostly I haven't seen this problem occur, but I am looking for any possible way to add some extra insurance here.
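One pattern that is sometimes suggested (not something the Node.js docs guarantee, so treat this as a sketch) is to avoid exiting directly and instead wait for the write callbacks on both streams, since write() invokes its callback once the chunk has been handed off:

// Sketch: call this instead of process.exit(); exitCode is whatever code you want to exit with.
function flushAndExit(exitCode) {
  let pending = 2;
  const done = () => { if (--pending === 0) process.exit(exitCode); };
  // Write a final newline on each stream; its callback fires once that chunk
  // (and everything buffered before it) has been handed to the OS.
  process.stdout.write('\n', done);
  process.stderr.write('\n', done);
}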

Related

How can a forked node process send data to a terminal or to the parent on exit?

I am dealing with an odd problem which I couldn't find the answer to online, nor through a lot of trial and error.
In a multi-process cluster, forked worker processes can run arbitrarily long commands, but the parent process listens for keepalive messages sent by the workers and kills any worker that is stuck for longer than X seconds.
Worker processes can communicate asynchronously with the rest of the world (using HTTP, or process.send IPC), but on exit I'd like to be able to communicate a few things (typically queued logs or error details).
Most online documentation for process.on('exit', handler) shows console.log being used there; however, it seems that forked processes don't inherit a normal stdout, and their console.log isn't a direct TTY but a stream (the IPC stream, I presume?).
Because of this, the process exit handler doesn't let me use console.log to log extra lines (or if it does, I'm not sure where those lines end up).
I tried various combinations of fork options (silent/not silent, non-default stdio options like inherit), using fs.write to write to the TTY or a real file, and using process.send, but in no case was I able to get the on-exit handler to log anywhere visible.
How can I get the forked process to successfully log on exit?
A small additional point: all of this testing is on Unix-like systems (macOS, Amazon Linux, ...), and both parent and child processes are started with --trace-sigint so that we get at least the top 10 stack frames of the interrupted process on exit. Those frames do make it out to the terminal successfully.
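For what it's worth, one workaround that helps for normal (non-blocked) exits is to avoid console.log in the exit handler altogether and write synchronously to a file descriptor; as the answer below explains, nothing helps once the native SIGINT handler kills a blocked process. A sketch (queuedLogs and the log path are made-up placeholders):

const fs = require('fs');
const queuedLogs = [];   // placeholder: your code pushes log lines here during normal operation

process.on('exit', (code) => {
  // Only synchronous calls run reliably here, so bypass the stream machinery entirely.
  fs.writeSync(process.stderr.fd, queuedLogs.join('\n') + '\n');
  // Or append to a real file instead of relying on whatever fd 2 happens to be:
  fs.appendFileSync('/tmp/worker-exit.log', 'exit code ' + code + '\n');
});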
This was a bit of a misunderstanding about how SIGINT is handled, and I believe that it's impossible to accomplish what I want here, but I'd love to hear if someone else found a solution.
Node has its own SIGINT handler which is "more powerful" than custom SIGINT handlers - typically it interrupts infinite loops, which is extremely useful in the case where code is blocked by long-running operations.
Node allows one-upping its own SIGINT debugging capabilities by attaching a --trace-sigint flag which captures the last frames of execution.
If I understood this correctly, there are 4 cases with different behavior:
1. No custom handler, event loop blocked: the process is terminated without any further code execution (--trace-sigint can still give a few stack frames).
2. No custom handler, event loop not blocked: normal exit flow; the process.on('exit') event fires.
3. Custom handler, event loop blocked: nothing happens until the event loop unblocks (if it ever does), then the normal exit flow runs.
4. Custom handler, event loop not blocked: normal exit flow.
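For illustration (a minimal sketch, not from the original post), a custom handler for cases 3 and 4 looks like this:

process.on('SIGINT', () => {
  // This only ever runs once the event loop is free: immediately in case 4,
  // or after the loop unblocks (if it ever does) in case 3.
  console.error('SIGINT received, shutting down');
  process.exit(130);
});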
This happens regardless of the way the process is started, and it isn't a problem with pipes or exit events: in the case where the event loop is blocked and the native signal handler is in place, the process terminates without any further execution.
It would seem there is no way to both get a forced process exit during a blocked event loop AND still get Node code to run in the same process after the native interruption to recover more information.
Given this, I believe the best way to recover information from the stuck process is to stream data out of it before it freezes (sounds obvious, but brings a lot of extra considerations in production environments).
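A minimal sketch of that idea, in the worker (the message shape and the interval are arbitrary placeholders):

const logQueue = [];   // placeholder: your code pushes log lines / error details here

setInterval(() => {
  if (!process.send) return;   // process.send is only defined when forked with an IPC channel
  process.send({ type: 'keepalive', ts: Date.now() });
  if (logQueue.length) {
    process.send({ type: 'logs', lines: logQueue.splice(0) });   // ship logs out while we still can
  }
}, 1000);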

Node.js pipe console error to another program (make it async)

From Expressjs documentation:
To keep your app purely asynchronous, you’d still want to pipe
console.err() to another program
Questions:
Is it enough to run my Node app with stdout and stderr redirected so that it doesn't block the event loop? Like this: node app 2>&1 | tee logFile ?
If the answer to (1) is yes, then how do I achieve non-blocking logging while using Winston or Bunyan? Do they have some built-in mechanism for this, or do they just save data to a specific file, spending CPU time of the current Node.js process? Or, to achieve truly async logging, should they pipe data to a child process that performs the "save to file" step (and is that still a performance win)? Can anyone explain, or correct me if my way of thinking is wrong?
Edited part: I can assume that piping data from processes A, B, etc. to process L is cheaper for those specific processes (A, B, ...) than writing it to a file (or sending it over the network).
To the point:
I am designing a logger for an application that uses the Node.js cluster module.
Briefly: one of the processes (L) will handle data streams from the others (A, B, ...).
Process L will queue messages (for example line by line, or split on some other special separator) and log them one by one to a file, a DB, or anywhere else.
The advantage of this approach is that it reduces the load on the processes, which can then spend more time doing their actual job.
One more thing: the assumption is to keep usage of this library simple, so the user only includes the logger, without any additional interaction (stream redirection) via the shell.
Do you think this solution makes sense? Maybe you know of a library that already does this?
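For concreteness, a rough sketch of the topology being described, with all file names made up (this is just the wiring, not a finished library):

const { fork, spawn } = require('child_process');

// Process L: a single log sink reading from its stdin.
const loggerL = spawn('node', ['logger.js'], { stdio: ['pipe', 'inherit', 'inherit'] });

// Processes A, B, ...: forked with silent:true so their stdout/stderr become pipes,
// which are then funneled into L's stdin.
for (const script of ['workerA.js', 'workerB.js']) {
  const worker = fork(script, [], { silent: true });
  worker.stdout.pipe(loggerL.stdin, { end: false });   // end:false so one worker exiting
  worker.stderr.pipe(loggerL.stdin, { end: false });   // doesn't close the shared sink
}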
Let's lay some groundwork first...
Writing to a terminal screen (console.log() etc.), writing to a file (fs.writeFile(), fs.writeFileSync() etc.), or sending data to a stream (process.stdout.write(data) etc.) will always "block the event loop". Why? Because some part of those functions is always written in JavaScript. The minimum amount of work needed by these functions would be to take the input and hand it over to some native code, but some JS will always be executed.
And since JS is involved, it will inevitably "block" the event loop because JavaScript code is always executed on a single thread no matter what.
Is this a bad thing...?
No. The amount of time required to process some log data and send it over to a file or a stream is quite low and does not have significant impact on performance.
When would this be a bad thing, then...?
You can hurt your application by doing something generally called a "synchronous" I/O operation - that is, writing to a file and actually not executing any other JavaScript code until that write has finished. When you do this, you hand all the data to the underlying native code and while theoretically being able to continue doing other work in JS space, you intentionally decide to wait until the native code responds back to you with the results. And that will "block" your event loop, because these I/O operations can take much much longer than executing regular code (disks/networks tend to be the slowest part of a computer).
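A quick illustration of the difference (the file names and data are arbitrary):

const fs = require('fs');
const bigString = 'x'.repeat(10 * 1024 * 1024);   // ~10 MB of data

fs.writeFileSync('sync.log', bigString);          // no other JS runs until this write finishes

fs.writeFile('async.log', bigString, (err) => {   // returns immediately; libuv does the work
  if (err) console.error(err);
});
console.log('this line runs while async.log is still being written');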
Now, let's get back to writing to stdout/stderr.
From Node.js' docs:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
I am assuming we are working with POSIX systems below.
In practice, this means that when your Node.js process's output streams are not piped and are sent directly to the TTY, writing something to the console will block the event loop until the whole chunk of data is sent to the screen. However, if we redirect the output streams to something else (a pipe to another process, for example), then when we write something to the console Node.js will not wait for the operation to complete and will continue executing other JavaScript code while the data is written to that output stream.
In practice, we get to execute more JavaScript in the same time period.
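If you want to check at runtime which situation you are in, process.stdout.isTTY is the usual signal (a tiny sketch):

// isTTY is true when stdout is attached to a terminal and undefined when piped or redirected.
if (process.stdout.isTTY) {
  console.log('stdout is a terminal: console writes are synchronous on POSIX');
} else {
  console.log('stdout is piped or redirected');
}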
With this information you should be able to answer all your questions yourself now:
You do not need to redirect the stdout/stderr of your Node.js process if you do not write anything to the console, or you can redirect only one of the streams if you do not write anything to the other one. You may redirect them anyway, but if you do not use them you will not gain any performance benefit.
If you configure your logger to write the log data to a stream then it will not block your event loop too much (unless some heavy processing is involved).
If you care this much about your app's performance, do not use Winston or Bunyan for logging - they are extremely slow. Use pino instead - see the benchmarks in their readme.
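For reference, a minimal pino sketch (by default it writes newline-delimited JSON to stdout, which you can then pipe or redirect wherever you like; the log fields here are made up):

const pino = require('pino');
const logger = pino();

logger.info({ requestId: 123 }, 'request handled');
logger.error({ err: new Error('something broke') }, 'request failed');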
To answer (1), we can dive into the Express documentation: it links to the Node.js documentation for Console, which in turn links to the Node documentation on process I/O. There it describes how process.stdout and process.stderr behave:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
With that we can try to understand what will happen with node app 2>&1 | tee logFile:
Stdout and stderr are piped to a process, tee.
tee writes to both the terminal and the file logFile.
The important part here is that stdout and stderr are piped to another process, which means the writes should be asynchronous.
Regarding (2), it depends on how you configure Bunyan or Winston:
Winston has the concept of Transports, which essentially let you configure where the log will go. If you want asynchronous logs, you should avoid the Console transport. Using the File transport should be OK, as it creates a file stream object for this, which is asynchronous and won't block the Node process.
Bunyan has a similar configuration option: Streams. According to their docs, it can accept any stream interface. As long as you avoid using the process.stdout and process.stderr streams here, you should be OK.
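A minimal configuration sketch of both options described above, assuming Winston 3.x and Bunyan's documented stream options (the file names are made up):

const winston = require('winston');
const winstonLogger = winston.createLogger({
  transports: [new winston.transports.File({ filename: 'app.log' })]   // stream-backed file transport
});

const bunyan = require('bunyan');
const bunyanLogger = bunyan.createLogger({
  name: 'app',
  streams: [{ path: 'app.log' }]   // or { stream: someWritableStream } for any writable stream
});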

How to send huge amounts of data from child process to parent process in a non-blocking way in Node.js?

I'm trying to send a huge json string from a child process to the parent process. My initial approach was the following:
child:
process.stdout.write(myHugeJsonString);
parent:
child.stdout.on('data', function(data) { ...
But now I read that process.stdout is blocking:
process.stderr and process.stdout are unlike other streams in Node in
that writes to them are usually blocking.
They are blocking in the case that they refer to regular files or TTY file descriptors.
In the case they refer to pipes:
They are blocking in Linux/Unix.
They are non-blocking like other streams in Windows.
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
OK, what about the Stream object option? Hmmm, it seems I can share a readable or writable stream that refers to a socket with the child process. Would this be non-blocking? How would I implement that?
So the question stands: How do I send huge amounts of data from a child process to the parent process in a non-blocking way in Node.js? A cross-platform solution would be really neat, examples with explanation very appreciated.
One neat trick I used on *nix for this is FIFO pipes (http://linux.about.com/library/cmd/blcmdl4_fifo.htm). They allow the child to write to a file-like thing and the parent to read from the same. The file is not really on the filesystem, so you don't get any I/O problems; all access is handled by the kernel itself. But... if you want it cross-platform, that won't work. There's no such thing on Windows (as far as I know).
Just note that the pipe has a limited buffer size, and if what you write to it (from the child) is not read by something else (from the parent), then the child will block when the pipe is full. This does not block the Node processes themselves, which see the pipe as a normal file stream.
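A rough sketch of the FIFO approach (*nix only; the path is made up and the FIFO must exist first, e.g. created once with mkfifo /tmp/myfifo):

const fs = require('fs');

// Parent side: read from the FIFO like any other file stream.
const reader = fs.createReadStream('/tmp/myfifo');
reader.on('data', (chunk) => process.stdout.write(chunk));

// Child side: write to the same path.
const writer = fs.createWriteStream('/tmp/myfifo');
writer.write(JSON.stringify({ hello: 'world' }) + '\n');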
I had a similar problem, and I think I found a good solution: set up a pipe when spawning the child process and use the resulting file descriptor to duplex data to the client's end.
How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?
Apparently you can use fs to stream to/from file descriptors:
How to stream to/from a file descriptor in node?
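A sketch of that idea: ask spawn for an extra 'pipe' entry so the child gets fd 3, and keep the huge payload off stdout entirely (child.js is a hypothetical file):

const { spawn } = require('child_process');

const child = spawn('node', ['child.js'], {
  stdio: ['inherit', 'inherit', 'inherit', 'pipe']   // fd 3 is an extra pipe to the parent
});

let json = '';
child.stdio[3].on('data', (chunk) => { json += chunk; });   // or stream it straight into a parser
child.stdio[3].on('end', () => console.log('received %d bytes', json.length));

// Inside child.js, write to fd 3 instead of stdout:
//   const out = require('fs').createWriteStream(null, { fd: 3 });
//   out.end(myHugeJsonString);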
The documentation for child_process.spawn says I can create a pipe between the child process and the parent process using the pipe option. But isn't piping my stdout blocking in Linux/Unix (according to cited docs above)?
No. The docs above say stdout/stderr, and in no way do they say "all pipes".
It won't matter that stdout/stderr are blocking. In order for a pipe to block, it needs to fill up, which takes a lot of data. In order to fill up, the reader at the other end has to be reading slower than you are writing. But... you are the other end, you wrote the parent process. So, as long as your parent process is functioning, it should be reading from the pipes.
Generally, blocking of the child is a good thing. If it's producing data faster than the parent can handle, there are ultimately only two possibilities:
1. it blocks, so stops producing data until the parent catches up
2. it produces more data than the parent can consume, and buffers that data in local memory until it hits the v8 memory limit, and the process aborts
You can use stdout to send your JSON if you want (1).
You can use a new 'pipe' to send your JSON if you want (2).

What's the right way to write to file from Node.js to avoid bottlenecking?

I'm curious what the correct methodology is to write to a log file from a process that might be called dozens (or maybe even thousands) of times simultaneously.
I have a node process which is called via http and I wish to log from it, but I don't want it to bottleneck as it attempts to open/write/close the same file from all the various simultaneous requests.
I've read that stderr might be the answer to this problem, but am curious what makes that approach any less bottlenecky. At the end of the day, if stderr is going to some central location, isn't it going to have the exact same problem?
Best practice for node (e.g. http://12factor.net/) is to write to stdout or stderr. The expectation is that the OS will handle the file management / throughput that you want, or else you can have a custom-written log collector that can do it the way you want and redirect stdout or stderr to it.

What's the most efficient way to prevent a node.js script from terminating?

If I'm writing something simple and want it to run until explicitly terminated, is there a best practice to prevent script termination without causing blocking, using CPU time or preventing callbacks from working?
I'm assuming at that point I'd need some kind of event loop implementation or a way to unblock the execution of events that come in from other async handlers (network io, message queues)?
A specific example might be something along the lines of "I want my node script to sleep until a job is available via Beanstalkd".
I think the relevant counter-question is "How are you checking for the exit condition?".
If you're polling a web service, then the underlying setInterval() for the poll will keep it alive until cancelled. If you're taking in input from a stream, that should keep it alive until the stream closes, etc.
Basically, you must be monitoring something in order to know whether or not you should exit. That monitoring should be the thing keeping the script alive.
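In other words, the monitoring code itself is what keeps Node alive; for example (checkBeanstalkForJob is a made-up placeholder for your own polling/monitoring logic):

function checkBeanstalkForJob() {
  // placeholder: reserve a job from the queue here
}

// The active timer is an open handle, so the process will not exit while it exists.
const timer = setInterval(checkBeanstalkForJob, 5000);

// When the exit condition is met, clearing it lets Node exit naturally once nothing else is pending:
// clearInterval(timer);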
Node.js exits when it has nothing else to do.
If you listen on a port, it has something to do and a way to receive Beanstalkd commands, so it will wait.
Create a function that closes the port and you'll have your explicit exit, but it will wait for all current jobs to finish before closing.
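A small sketch of that approach (the port number is arbitrary):

const net = require('net');

const server = net.createServer((socket) => {
  // handle incoming commands here
  socket.end();
});
server.listen(3000);   // the listening server keeps the process alive

function shutdown() {
  server.close(() => console.log('no open handles left; Node will now exit'));
}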
