node.js does not flush buffers on a crash

I am running node as a Windows service. The service was crashing on startup, so I implemented a logging system, only to discover that messages do not get written to a file when the application is forced to exit. I have been able to duplicate the problem with the code below:
var fs = require('fs');
var logStream = fs.createWriteStream('./nx3.log');
logStream.end('Goodbye world');
process.exit(0);
Nothing is written into nx3.log because the buffers don't flush. I have been able to work around the problem by using fs.appendFileSync but I would prefer to be using a mature logging module rather than rolling my own.
Is it possible to open up a write stream that is unbuffered? Or some other way around this?
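For reference, this is a minimal sketch of the fs.appendFileSync workaround mentioned above (the log helper function is just illustrative):
var fs = require('fs');

function log(message) {
  // appendFileSync blocks until the data has been handed off to the OS,
  // so the line survives even if the process exits right afterwards.
  fs.appendFileSync('./nx3.log', message + '\n');
}

log('Goodbye world');
process.exit(0);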

The issue here is not related to the buffer. The fs WriteStream performs its writes asynchronously, so process.exit is not waiting for logStream.end to perform the write; it exits immediately.
What you can do, is listen on the uncaughtException event, and perform your logging there. If a listener is added for this exception, the default action (which is to print a stack trace and exit) will not occur.
process.on('uncaughtException', function (err) {
  // Write the error (stack trace if available), and exit only once the write has completed.
  logStream.end(err.stack || String(err), function () {
    process.exit(0);
  });
});

Watch out: the callback of end() listens for the 'finish' event, which does NOT imply that the buffer has been flushed to the output (e.g. disk or network).
So, end() will
emit 'finish'
flush the data
emit 'end'
So, listen for the 'end' event from the stream to be sure, as in this example:
process.on('uncaughtException', function (err) {
  // Use logStream to log what you want
  logStream.write(err.stack || String(err));
  // Exit the process only after the data has been flushed
  logStream.once('end', () => process.exit(1));
  logStream.end();
});
Source: https://nodejs.org/api/stream.html#stream_events_finish_and_end

Related

When does a Node-spawned child process actually start?

In the documentation for Node's Child Process spawn() function, and in examples I've seen elsewhere, the pattern is to call the spawn() function, and then to set up a bunch of handlers on the returned ChildProcess object. For instance, here is the first example of spawn() given on that documentation page:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);
ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});
ls.stderr.on('data', (data) => {
  console.error(`stderr: ${data}`);
});
ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});
The spawn() function itself is called on the second line. My understanding is that spawn() starts a child process asynchronously. From the documentation:
The child_process.spawn() method spawns a new process using the given
command, with command line arguments in args.
However, the following lines of the script above go on to set up various handlers for the process, so it's assuming that the process hasn't actually started (and potentially finished) between the time spawn() is called on line 2 and the other stuff happens on the subsequent lines. I know JavaScript/Node is single threaded. However, the operating system is not single threaded, and naively one would read that spawn() call to be telling the operating system to spawn the process right now (at which point, with unfortunate timing, the OS could suspend the parent Node process and run/complete the child process before the next line of the Node code is executed).
But it must be that the process doesn't actually get spawned until the current JavaScript function completes (or more generally the current JavaScript event handler that called the current function completes), right?
That seems like a pretty important thing to say. Why doesn't it say that in the Child Process documentation page? Is there some overriding Node principle that makes it unnecessary to say that explicitly?
The spawning of the new process starts immediately (it's handed over to the OS to actually fire up the process and get it going). Starting the new process with .spawn() is asynchronous and non-blocking. So, it will initiate the operation with the OS and immediately return. You might think that that's why it's OK to set up event handlers after it returns (because the process hasn't yet finished starting). Well, yes and no. It likely hasn't yet finished starting the new process, but that isn't the main reason why it's OK.
It's OK, because node.js runs all its events through a single threaded event queue. Thus no events from the newly spawned process can be processed until after your code finishes executing and returns control back to the system. Only then can it process the next event in the event queue and trigger one of the events you are registering handlers for.
Or, said another way, none of the events from the other process are pre-emptive. They won't/can't interrupt your existing Javascript code. So, since you're still running your Javascript code, those events can't get run yet. Instead, they sit in the event queue until your Javascript code finishes and then the interpreter can go get the next event from the event queue and run the callback associated with it. Likewise, that callback runs until it returns back to the interpreter and then the interpreter can get the next event and run its callback and so on...
That's why node.js is called an event-driven system.
As such, it's perfectly fine to do this type of structure:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);
ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});
ls.stderr.on('data', (data) => {
  console.error(`stderr: ${data}`);
});
ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});
None of those data or close events can execute their callbacks until after your code is done and returns control back to the system. So, it's perfectly safe to set up those event handlers like you are. Even if the newly spawned process was running and generating events right away, those events will just sit in the event queue until your Javascript finishes what it is doing (which includes setting up your event handlers).
Now, if you delayed setting up the event handlers until some future tick of the event loop (as shown below) with something like a setTimeout(), then you could miss some events:
const { spawn } = require('child_process');
const ls = spawn('ls', ['-lh', '/usr']);
setTimeout(() => {
  ls.stdout.on('data', (data) => {
    console.log(`stdout: ${data}`);
  });
  ls.stderr.on('data', (data) => {
    console.error(`stderr: ${data}`);
  });
  ls.on('close', (code) => {
    console.log(`child process exited with code ${code}`);
  });
}, 10);
Here you are not setting up the event handlers immediately as part of the same tick of the event loop, but after a short delay. Therefore some events could get processed from the event loop before you install your event handlers and you could miss some of these events. Obviously, you would never do it this way (on purpose), but I just wanted to show that code running on the same tick of the event loop does not have a problem, but code running on some future tick of the event loop could have a problem missing events.
This is to follow up on jfriend00's answer, to explain what it helped me understand, in case it helps someone else. I knew about the event-driven nature of JavaScript/Node. What jfriend00's explanation made clear to me is the idea that an event can happen and Node can be aware that it happened, but it doesn't actually decide which handlers to tell about that event until the next tick. For instance, if the spawn() call fails outright (e.g., command does not exist), Node obviously knows that immediately. My thought was that it would then immediately queue the appropriate handlers to run on the next tick. But what I now understand is that it puts the "raw event" (i.e., the fact that the spawn failed, with whatever details about that) in its queue, and then on the next tick it determines and calls the appropriate handlers. And the same is true for other events like receiving output from the process, etc. The event is saved but the appropriate handlers for the event are only determined when the next tick runs, so handlers assigned on the previous tick, after spawn(), will get called.
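As a small sketch of that behaviour (the command name is deliberately nonexistent, so spawn() fails straight away, yet the handler attached afterwards still receives the 'error' event on a later tick):
const { spawn } = require('child_process');
// Spawning a nonexistent command fails immediately at the OS level,
// but the failure is delivered as an 'error' event on a later tick,
// so a handler attached right after spawn() still catches it.
const child = spawn('definitely-not-a-real-command');
child.on('error', (err) => {
  console.error(`spawn failed: ${err.message}`); // typically ENOENT
});
console.log('this line runs before the error handler fires');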

Handling multiple post requests with locking

So I have to write some NodeJS code that does the following: whenever a post request is made, I attempt to execute some program; if the program is already executing (because of a previous request), I ignore the request. If not, I execute the program. I'm using NodeJS child_process.exec to accomplish this; however, there's no way for me to know when exec(program) terminates; I thought of using execSync, but this simply blocks any requests until the program is done executing, instead of ignoring them completely. Here is the code I have right now:
const { execFile } = require('child_process');

function fun() {
  execFile('C:\\Windows\\System32\\notepad.exe', ['package.json']);
}
execFile returns a ChildProcess, which is an EventEmitter, so you can listen for events that occur while execFile operates, including the exit event, which tells you the process has completed.
ignoreNextRequest = true;
execFile('C:\\Windows\\System32\\notepad.exe', ['package.json']).once('exit', (code, signal) => {
  // Your code to handle the end of the process here.
  ignoreNextRequest = false;
});
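Putting it all together, here is a minimal sketch of the locking pattern with Node's built-in http module; the flag name, port, and status codes are illustrative rather than part of the original code:
const http = require('http');
const { execFile } = require('child_process');

let running = false; // simple lock flag (illustrative name)

http.createServer((req, res) => {
  if (req.method !== 'POST') {
    res.statusCode = 404;
    res.end();
    return;
  }
  if (running) {
    // A previous request already started the program; ignore this one.
    res.statusCode = 409;
    res.end('already running');
    return;
  }
  running = true;
  execFile('C:\\Windows\\System32\\notepad.exe', ['package.json']).once('exit', () => {
    running = false; // release the lock once the program has finished
  });
  res.statusCode = 202;
  res.end('started');
}).listen(3000);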

How to ensure we listen to a child process' events before they occur?

Here is some node.js code that spawns a Linux ls command and prints its result:
const spawn = require('child_process').spawn;
const ls = spawn('ls', ['-l']);
let content = "";
ls.stdout.on('data', function (chunk) {
  content += chunk.toString();
});
ls.stdout.on('end', function () {
  console.log(content);
});
This works well. However, the ls command is launched asynchronously, completely separate from the main Node.js thread. My concern is that the data and end events on the process' stdout may have occurred before I attached the event listeners.
Is there a way to attach event listeners before starting that sub-process?
Note: I don't think I can wrap a Promise around the spawn function to make this work, as it would rely on the events being properly caught to trigger success/failure (leading back to the problem).
There is no problem here.
Readable streams (since node v0.10) have a (limited) internal buffer that stores data until you read from the stream. If the internal buffer fills up, the backpressure mechanism will kick in, causing the stream to stop reading data from its source.
Once you call .read() or add a data event handler, the internal buffer will start to drain and will then start reading from its source again.
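As a rough sketch of what that means in practice (the 100 ms delay is only illustrative, and this assumes the child's output fits in the internal buffer):
const spawn = require('child_process').spawn;
const ls = spawn('ls', ['-l']);
// Attach the handlers a little later on purpose; any output the child has
// already produced sits in the readable stream's internal buffer and is
// only delivered once we start reading.
setTimeout(function () {
  let content = "";
  ls.stdout.on('data', function (chunk) {
    content += chunk.toString();
  });
  ls.stdout.on('end', function () {
    console.log(content);
  });
}, 100);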

Node.js: Will node always wait for setTimeout() to complete before exiting?

Consider:
node -e "setTimeout(function() {console.log('abc'); }, 2000);"
This will actually wait for the timeout to fire before the program exits.
I am basically wondering if this means that node is intended to wait for all timeouts to complete before quitting.
Here is my situation. My client has a node.js server he's gonna run from Windows with a Shortcut icon. If the node app encounters an exceptional condition, it will typically instantly exit, not leaving enough time to see in the console what the error was, and this is bad.
My approach is to wrap the entire program with a try catch, so now it looks like this: try { (function () { ... })(); } catch (e) { console.log("EXCEPTION CAUGHT:", e); }, but of course this will also cause the program to immediately exit.
So at this point I want to leave about 10 seconds for the user to take a peek or screenshot of the exception before it quits.
I figure I should just use blocking sleep() through the npm module, but I discovered in testing that setting a timeout also seems to work. (i.e. why bother with a module if something builtin works?) I guess the significance of this isn't big, but I'm just curious about whether it is specified somewhere that node will actually wait for all timeouts to complete before quitting, so that I can feel safe doing this.
In general, node will wait for all timeouts to fire before quitting normally. Calling process.exit() will exit before the timeouts.
The details are part of libuv, but the documentation makes a vague comment about it:
http://nodejs.org/api/all.html#all_ref
you can call ref() to explicitly request the timer hold the program open
Putting all of the facts together, setTimeout by default is designed to hold the event loop open (so if that's the only thing pending, the program will wait). You can programmatically disable or re-enable the behavior.
Late answer, but a definite yes: Node.js will wait around for setTimeout to finish; see this documentation. Incidentally, there is also a way to not wait around for setTimeout, and that is by calling unref on the object returned from setTimeout or setInterval.
To summarize: if you want Nodejs to wait until the timeout has been called, there's nothing you need to do. If you want Nodejs to not wait for a particular timeout, call unref on it.
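A minimal sketch of the difference (the delay and callback are just placeholders):
// By default this timer keeps the process alive for 10 seconds.
const t = setTimeout(function () {
  console.log('probably never printed once unref() is called');
}, 10000);
// unref() tells node not to wait for this timer; if nothing else is
// pending, the process exits immediately instead of waiting.
t.unref();
// Calling t.ref() would restore the default behaviour.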
If node didn't wait for all setTimeout or setInterval calls to complete, you wouldn't be able to use them in simple scripts.
Once you tell node to listen for an event, as with the setTimeout or some async I/O call, the event loop will loop until it is told to exit.
Rather than wrap everything in a try/catch you can bind an event listener to process just as the example in the docs:
process.on('uncaughtException', function (err) {
  console.log('Caught exception: ' + err);
});
setTimeout(function () {
  console.log('This will still run.');
}, 500);
// Intentionally cause an exception, but don't catch it.
nonexistentFunc();
console.log('This will not run.');
In the uncaughtException event, you can then add a setTimeout to exit after 10 seconds:
process.on('uncaughtException', function (err) {
  console.log('Caught exception: ' + err);
  setTimeout(function () { process.exit(1); }, 10000);
});
If this exception is something you can recover from, you may want to look at domains: http://nodejs.org/api/domain.html
edit:
There may actually be another issue at hand: your client application doesn't do enough (or any?) logging. You can use log4js-node to write to a temp file or some application-specific location.
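For instance, a minimal sketch with log4js (this assumes a recent version of the log4js package; the file name is illustrative):
const log4js = require('log4js');

log4js.configure({
  appenders: { file: { type: 'file', filename: 'app.log' } },
  categories: { default: { appenders: ['file'], level: 'info' } }
});

const logger = log4js.getLogger();
logger.error('something went wrong'); // written to app.log instead of being lost with the console window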
Easy way solution:
Make a batch (.bat) file that starts node.js
Make a shortcut out of it
Why this is best: this way your client runs node.js from a command-line window, and even if the node.js program exits, nothing happens to the command-line window (it stays open).
Making the bat file:
Make a text file
Put START cmd.exe /k "node abc.js" in it
Save it
Rename it to abc.bat
Make a shortcut to it or whatever.
Opening it will open a command-line window and run the node.js file.
Using setTimeout for this is a bad idea.
The odd ones out are when you call process.exit() or there's an uncaught exception, as pointed out by Jim Schubert. Other than that, node will wait for the timeout to complete.
Node does remember timers, but only if it can keep track of them. At least that is my experience.
If you use setTimeout in an arrow / anonymous function, I would recommend keeping track of your timers in an array, like:
() => {
  timers.push(setTimeout(doThisLater, 2000));
}
and make sure let timers = []; isn't declared in a scope that will vanish, i.e. declare it globally.

Nodejs event handling

Following is my nodejs code
var emitter = require('events'),
    eventEmitter = new emitter.EventEmitter();
eventEmitter.on('data', function (result) { console.log('Im From Data'); });
eventEmitter.on('error', function (result) { console.log('Im Error'); });
require('http').createServer(function (req, res) {
  res.end('Response');
  var start = new Date().getTime();
  eventEmitter.emit('data', true);
  eventEmitter.emit('error', false);
  while (new Date().getTime() - start < 5000) {
    // Let me sleep
  }
  process.nextTick(function () {
    console.log('This is event loop');
  });
}).listen(8090);
Node.js is single threaded; it runs an event loop, and that same thread serves the events.
So, in the above code, on a request to my localhost:8090 the node thread should be kept busy serving the request [there is a sleep for 5s].
At the same time there are two events being emitted by eventEmitter. So, both these events must be queued in the event loop for processing once the request is served.
But that is not happening: I can see the events being served synchronously as they are emitted.
Is that expected? I understand that if it worked as I expect, there would be no use in extending the events module. But how are the events emitted by eventEmitter handled?
Only things that require asynchronous processing are pushed onto the event loop. The standard event emitter in node dispatches an event immediately, calling its listeners synchronously. Only code that uses things like process.nextTick, setTimeout, or setInterval, or native code that explicitly schedules work on the loop (as node's own libraries do), goes through the event loop.
For example, when you use node's fs library for something like createReadStream, it returns a stream but opens the file in the background. When the file is open, node queues a task on the event loop, and when that task runs it triggers the 'open' event on the stream object. Then node loads blocks from the file in the background and queues further tasks on the event loop to trigger 'data' events on the stream.
If you wanted those events to be emitted after 5 seconds, you'd want to use setTimeout or put the emit calls after your busy loop.
I'd also like to be clear: you should never have a busy loop like that in Node code. I can't tell if you were just doing it to test the event loop, or if it is part of some real code. If you need more info, please expand on the functionality you are looking to achieve.
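To make the earlier point about immediate dispatch concrete, here is a tiny sketch (event and listener names are illustrative):
var EventEmitter = require('events').EventEmitter;
var em = new EventEmitter();
em.on('data', function () { console.log('listener ran'); });
console.log('before emit');
em.emit('data'); // listeners are invoked synchronously, right here
console.log('after emit');
// Output order: before emit, listener ran, after emit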
