Related
Assume an Express route that calls Mongoose and has to be async so it can await mongoose.find(). Also assume we receive XML that has to be converted to JSON, and that conversion also needs to be async so we can call await inside it.
If I do this:
app.post('/ams', async (req, res) => {
    try {
        xml2js.parseString(xml, async (err, json) => {
            if (err) {
                throw new XMLException();
            }
            // assume many more clauses here that can throw exceptions
            res.status(200);
            res.send("Data saved");
        });
    } catch (err) {
        if (err instanceof XMLException) {
            res.status(400);
            const message = "Malformed XML error: " + err;
            res.send(message);
        }
    }
});
The server hangs forever. I'm assuming the async/await means that the server hits a timeout before something concludes.
If I put this:
res.status(200);
res.send("Data saved")
on the line before the catch(), then that is returned, but it is the only thing ever returned. The client gets a 200, even if an XMLException is thrown.
I can see the XMLException thrown in the console, but I cannot get a 400 to send back. I cannot get anything in that catch block to execute in a way that communicates the response to the client.
Is there a way to do this?
In a nutshell, there is no way to propagate an error from the xml2js.parseString() callback up to the higher code because that parent function has already exited and returned. This is how plain callbacks work with asynchronous code.
To understand the problem here, you have to follow the code flow for xml2js.parseString() in your function. If you instrumented it like this:
app.post('/ams', async (req, res) => {
    try {
        console.log("1");
        xml2js.parseString(xml, async (err, json) => {
            console.log("2");
            if (err) {
                throw new XMLException();
            }
            // assume many more clauses here that can throw exceptions
            res.status(200);
            res.send("Data saved");
        });
        console.log("3");
    } catch (err) {
        if (err instanceof XMLException) {
            res.status(400);
            const message = "Malformed XML error: " + err;
            res.send(message);
        }
    }
    console.log("4");
});
Then, you would see this in the logs:
1 // about to call xml2js.parseString()
3 // after the call to xml2js.parseString()
4 // function about to exit
2 // callback called last after function returned
The outer function has finished and returned BEFORE your callback has been called. This is because xml2js.parseString() is asynchronous and non-blocking. That means that calling it just initiates the operation, and then it immediately returns and the rest of your function continues to execute. It works in the background and, some time later, it posts an event to the JavaScript event queue; when the interpreter is done with whatever else it was doing, it will pick up that event and call the callback.
The callback will get called with an almost empty call stack. So, you can't use traditional try/catch exceptions with these plain, asynchronous callbacks. Instead, you must either handle the error inside the callback or call some function from within the callback to handle the error for you.
When you try to throw inside that plain, asynchronous callback, the exception just goes back into the event handler that triggered the completion of the asynchronous operation and no further, because there's nothing else on the call stack. The try/catch you show in your code cannot catch that exception. In fact, no other code can catch that exception, only code within the callback itself.
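For illustration, here is a minimal sketch (reusing the names from the question) of handling the error directly inside the callback, which is the only place it can be handled with the plain callback interface:

xml2js.parseString(xml, (err, json) => {
    if (err) {
        // handle the error right here; a throw would be lost
        res.status(400).send("Malformed XML error: " + err.message);
        return;
    }
    res.status(200).send("Data saved");
});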
This is not a great way to write code, but nodejs survived with it for many years (by not using throw in these circumstances). However, this is why promises were invented, and when used with the newer async/await language features, they provide a cleaner way to do things.
And, fortunately, in this circumstance xml2js already ships a promise-based interface.
So, you can do this:
app.post('/ams', async (req, res) => {
    try {
        // get the xml data from somewhere
        const json = await xml2js.parseStringPromise(xml);
        // do something with json here
        res.send("Data saved");
    } catch (err) {
        console.log(err);
        res.status(400).send("Malformed XML error: " + err.message);
    }
});
xml2js provides a promise-based variant, parseStringPromise(), which returns a promise that resolves to the parsed value or rejects with an error. Not every asynchronous interface offers this, but it is fairly common these days when an interface originally had the older callback style and promise support was added later. Newer interfaces are generally built with only promise-based interfaces. Anyway, per the doc, parseStringPromise() returns a promise you can use.
You can then use await with that promise. If the promise resolves, await retrieves the resolved value. If the promise rejects, then because you are awaiting it, the rejection is thrown and caught by the try/catch. FYI, you can also use .then() and .catch() with the promise, but in many cases async/await is simpler, so that's what I've shown here.
So, in this code, if there is invalid XML, then the promise that xml2js.parseStringPromise() returns will reject, and control flow will go to the catch block where you can handle the error.
If you want to capture the xml2js.parseStringPromise() error separately from other exceptions that could occur elsewhere in your code, you can put a try/catch around just that call (though this code didn't show anything else likely to throw, so I didn't add another try/catch). In fact, this form of try/catch can be used pretty much like you would normally use it with synchronous code. You can throw up to a higher level of try/catch too.
A few other notes: many people who first start programming with asynchronous operations try to just put await in front of anything asynchronous and hope that it solves their problem. await only does something useful when you await a promise, so your asynchronous function must return a promise that resolves/rejects when the asynchronous operation is complete for the await to do anything useful.
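For illustration, a minimal sketch of the difference (delay is a hypothetical helper, not part of any library):

// useful: delay() returns a promise, so await actually waits
function delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

async function demo() {
    await delay(100);                 // pauses this function for 100 ms
    await setTimeout(() => {}, 100);  // does NOT wait: setTimeout returns a timer object, not a promise
}

demo();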
It is also possible to take a plain callback asynchronous function that does not have a promise interface and wrap a promise interface around it. You pretty much never want to mix promise interface functions with plain callback asynchronous operations because error handling and propagation is a nightmare with a mixed model. So, sometimes you have to "promisify" an older interface so you can use promises with it. In most cases, you can do that with util.promisify() built into the util library in nodejs. Fortunately, since promises and async/await are the modern and easier way to do asynchronous things, most newer asynchronous interfaces in the nodejs world come with promise interfaces already.
You are throwing exceptions inside the callback function, so you can't expect the catch block of the router to receive them.
One way to handle this is by using util.promisify.
const util = require('util');
const parseString = util.promisify(xml2js.parseString);

try {
    const json = await parseString(xml);
    // use json here
} catch (err) {
    // ...
}
What's the correct way to handle errors with streams? I already know there's an 'error' event you can listen on, but I want to know some more details about arbitrarily complicated situations.
For starters, what do you do when you want to do a simple pipe chain:
input.pipe(transformA).pipe(transformB).pipe(transformC)...
And how do you properly create one of those transforms so that errors are handled correctly?
More related questions:
when an error happens, what happens to the 'end' event? Does it never get fired? Does it sometimes get fired? Does it depend on the transform/stream? What are the standards here?
are there any mechanisms for propagating errors through the pipes?
do domains solve this problem effectively? Examples would be nice.
do errors that come out of 'error' events have stack traces? Sometimes? Never? is there a way to get one from them?
transform
Transform streams are both readable and writeable, and thus are really good 'middle' streams. For this reason, they are sometimes referred to as through streams. They are similar to a duplex stream in this way, except they provide a nice interface to manipulate the data rather than just sending it through. The purpose of a transform stream is to manipulate the data as it is piped through the stream. You may want to do some async calls, for example, or derive a couple of fields, remap some things, etc.
For how to create a transform stream see here and here. All you have to do is:
include the stream module
instantiate (or inherit from) the Transform class
implement a _transform method which takes (chunk, encoding, callback)
The chunk is your data. Most of the time you won't need to worry about encoding if you are working in objectMode = true. The callback is called when you are done processing the chunk. This chunk is then pushed on to the next stream.
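For example, a minimal sketch of such a transform (the UpperCaseTransform name is just for illustration):

const { Transform } = require('stream');

class UpperCaseTransform extends Transform {
    _transform(chunk, encoding, callback) {
        // manipulate the chunk and push it to the readable side
        this.push(chunk.toString().toUpperCase());
        // signal that this chunk has been processed (pass an Error here to fail)
        callback();
    }
}

process.stdin.pipe(new UpperCaseTransform()).pipe(process.stdout);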
If you want a nice helper module that makes writing through streams really easy, I suggest through2.
For error handling, keep reading.
pipe
In a pipe chain, handling errors is indeed non-trivial. According to this thread .pipe() is not built to forward errors. So something like ...
var a = createStream();
a.pipe(b).pipe(c).on('error', function(e){handleError(e)});
... would only listen for errors on the stream c. If an error event was emitted on a, that would not be passed down and, in fact, would throw. To do this correctly:
var a = createStream();
a.on('error', function(e){handleError(e)})
    .pipe(b)
    .on('error', function(e){handleError(e)})
    .pipe(c)
    .on('error', function(e){handleError(e)});
Now, though the second way is more verbose, you can at least keep the context of where your errors happen. This is usually a good thing.
One library I find helpful though if you have a case where you only want to capture the errors at the destination and you don't care so much about where it happened is event-stream.
end
When an error event is fired, the end event will not be fired (explicitly). The emitting of an error event will end the stream.
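You can see this for yourself with a small sketch (Node 8+, where destroy() is available): the stream below fails mid-read, so 'error' fires and 'end' never does.

const { Readable } = require('stream');

const r = new Readable({
    read() {
        // simulate a failure while producing data
        this.destroy(new Error('boom'));
    }
});

r.on('end', () => console.log('end'));                   // never fires
r.on('error', (e) => console.log('error:', e.message)); // fires instead
r.resume(); // switch to flowing mode so reading starts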
domains
In my experience, domains work really well most of the time. If you have an unhandled error event (i.e. emitting an error on a stream without a listener), the server can crash. Now, as the above article points out, you can wrap the stream in a domain which should properly catch all errors.
var domain = require('domain');

var d = domain.create();
d.on('error', handleAllErrors);
d.run(function () {
    fs.createReadStream(tarball)
        .pipe(gzip.Gunzip())
        .pipe(tar.Extract({ path: targetPath }))
        .on('close', cb);
});
the above code sample is from this post
The beauty of domains is that they will preserve the stack traces. Though event-stream does a good job of this as well.
For further reading, check out the stream-handbook. Pretty in depth, but super useful and gives some great links to lots of helpful modules.
If you are using node >= v10.0.0 you can use stream.pipeline and stream.finished.
For example:
const { pipeline, finished } = require('stream');

pipeline(
    input,
    transformA,
    transformB,
    transformC,
    (err) => {
        if (err) {
            console.error('Pipeline failed', err);
        } else {
            console.log('Pipeline succeeded');
        }
    }
);

finished(input, (err) => {
    if (err) {
        console.error('Stream failed', err);
    } else {
        console.log('Stream is done reading');
    }
});
See this github PR for more discussion.
Domains are deprecated. You don't need them.
For this question, the distinction between transform and writable streams is not so important.
mshell_lauren's answer is great, but as an alternative you can also explicitly listen for the error event on each stream you think might error, and reuse the handler function if you prefer:
var a = createReadableStream()
var b = anotherTypeOfStream()
var c = createWriteStream()
a.on('error', handler)
b.on('error', handler)
c.on('error', handler)
a.pipe(b).pipe(c)
function handler (err) { console.log(err) }
Doing so prevents the infamous uncaught exception should one of those streams fire its error event.
Errors from the whole chain can be propagated to the rightmost stream using a simple function:
function safePipe(readable, transforms) {
    while (transforms.length > 0) {
        var new_readable = transforms.shift();
        readable.on("error", function (e) { new_readable.emit("error", e); });
        readable.pipe(new_readable);
        readable = new_readable;
    }
    return readable;
}
which can be used like:
safePipe(readable, [ transform1, transform2, ... ]);
.on("error", handler) only takes care of stream errors, but if you are using custom Transform streams, .on("error", handler) doesn't catch the errors that happen inside the _transform function. So one can do something like this for controlling application flow:
The this keyword inside the _transform function refers to the stream itself, which is an EventEmitter. So you can use try/catch as below to catch the errors and later pass them on to the custom event handlers.
// CustomTransform.js
CustomTransformStream.prototype._transform = function (data, enc, done) {
    var stream = this;
    try {
        // Do your transform code
    } catch (e) {
        // Based on the error type (an if or switch statement would go here),
        // emit the matching custom event
        stream.emit("CTError1", e);
        // ... or: stream.emit("CTError2", e);
    }
    done();
};
// StreamImplementation.js
someReadStream
    .pipe(new CustomTransformStream())
    .on("CTError1", function (e) { console.log(e); })
    .on("CTError2", function (e) { /* let's do something else */ })
    .pipe(someWriteStream);
This way, you can keep your logic and error handlers separate. Also, you can opt to handle only some errors and ignore others.
Update: as an alternative, consider an RxJS Observable.
Use the multipipe package to combine several streams into one duplex stream, and handle errors in one place.
const pipe = require('multipipe')
// pipe streams
const stream = pipe(streamA, streamB, streamC)
// centralized error handling
stream.on('error', fn)
Use the standard Node.js pattern: create a Transform stream and call its done callback with an argument in order to propagate the error:
var stream = require('stream');

var transformStream1 = new stream.Transform(/*{objectMode: true}*/);

// assign _transform on the instance (not on a prototype)
transformStream1._transform = function (chunk, encoding, done) {
    try {
        // Do your transform code
        /* ... */
    } catch (error) {
        // nodejs style for propagating an error
        return done(error);
    }
    // Here, everything went well
    done();
};
// Let's use the transform stream, assuming `someReadStream`
// and `someWriteStream` have been defined before
someReadStream
    .pipe(transformStream1)
    .on('error', function (error) {
        console.error('Error in transformStream1:');
        console.error(error);
        process.exit(-1);
    })
    .pipe(someWriteStream)
    .on('close', function () {
        console.log('OK.');
        process.exit();
    })
    .on('error', function (error) {
        console.error(error);
        process.exit(-1);
    });
const http = require('http');
const fs = require('fs');

const server = http.createServer();

server.on('request', (req, res) => {
    const readableStream = fs.createReadStream(__dirname + '/README.md');
    const writeableStream = fs.createWriteStream(__dirname + '/assets/test.txt');

    readableStream
        .on('error', () => {
            res.end("File not found");
        })
        .pipe(writeableStream)
        .on('error', (error) => {
            console.log(error);
            res.end("Something went wrong!");
        })
        .on('finish', () => {
            res.end("Done!");
        });
});

server.listen(8000, () => {
    console.log("Server is running on port 8000");
});
try/catch won't capture errors that occur in the stream, because they are thrown after the calling code has already exited. You can refer to the documentation:
https://nodejs.org/dist/latest-v10.x/docs/api/errors.html
I'm writing a JavaScript function that makes an HTTP request and returns a promise for the result (but this question applies equally for a callback-based implementation).
If I know immediately that the arguments supplied to the function are invalid, should the function throw synchronously, or should it return a rejected promise (or, if you prefer, invoke the callback with an Error instance)?
How important is it that an async function should always behave in an async manner, particularly for error conditions? Is it OK to throw if you know that the program is not in a suitable state for the async operation to proceed?
e.g.:
function getUserById(userId, cb) {
    if (userId !== parseInt(userId)) {
        throw new Error('userId is not valid');
    }
    // make async call
}

// OR...

function getUserById(userId, cb) {
    if (userId !== parseInt(userId)) {
        return cb(new Error('userId is not valid'));
    }
    // make async call
}
Ultimately the decision to synchronously throw or not is up to you, and you will likely find people who argue either side. The important thing is to document the behavior and maintain consistency in the behavior.
My opinion on the matter is that your second option - passing the error into the callback - seems more elegant. Otherwise you end up with code that looks like this:
try {
    getUserById(7, function (response) {
        if (response.isSuccess) {
            // Success case
        } else {
            // Failure case
        }
    });
} catch (error) {
    // Other failure case
}
The control flow here is slightly confusing.
It seems like it would be better to have a single if / else if / else structure in the callback and forgo the surrounding try / catch.
This is largely a matter of opinion. Whatever you do, do it consistently, and document it clearly.
One objective piece of information I can give you is that this was the subject of much discussion in the design of JavaScript's async functions, which as you may know implicitly return promises for their work. You may also know that the part of an async function prior to the first await or return is synchronous; it only becomes asynchronous at the point it awaits or returns.
TC39 decided in the end that even errors thrown in the synchronous part of an async function should reject its promise rather than raising a synchronous error. For example:
async function someAsyncStuff() {
    return 21;
}

async function example() {
    console.log("synchronous part of function");
    throw new Error("failed");
    const x = await someAsyncStuff();
    return x * 2;
}

try {
    console.log("before call");
    example().catch(e => { console.log("asynchronous:", e.message); });
    console.log("after call");
} catch (e) {
    console.log("synchronous:", e.message);
}
There you can see that even though throw new Error("failed") is in the synchronous part of the function, it rejects the promise rather than raising a synchronous error.
That's true even for things that happen before the first statement in the function body, such as determining the default value for a missing function parameter:
async function someAsyncStuff() {
    return 21;
}

async function example(p = blah()) {
    console.log("synchronous part of function");
    throw new Error("failed");
    const x = await Promise.resolve(42);
    return x;
}

try {
    console.log("before call");
    example().catch(e => { console.log("asynchronous:", e.message); });
    console.log("after call");
} catch (e) {
    console.log("synchronous:", e.message);
}
That fails because it tries to call blah, which doesn't exist, when it runs the code to get the default value for the p parameter I didn't supply in the call. As you can see, even that rejects the promise rather than throwing a synchronous error.
TC39 could have gone the other way, and had the synchronous part raise a synchronous error, like this non-async function does:
async function someAsyncStuff() {
    return 21;
}

function example() {
    console.log("synchronous part of function");
    throw new Error("failed");
    return someAsyncStuff().then(x => x * 2);
}

try {
    console.log("before call");
    example().catch(e => { console.log("asynchronous:", e.message); });
    console.log("after call");
} catch (e) {
    console.log("synchronous:", e.message);
}
But they decided, after discussion, on consistent promise rejection instead.
So that's one concrete piece of information to consider in your decision about how you should handle this in your own non-async functions that do asynchronous work.
How important is it that an async function should always behave in an async manner, particularly for error conditions?
Very important.
Is it OK to throw if you know that the program is not in a suitable state for the async operation to proceed?
Yes, I personally think it is OK when that is a very different error from any asynchronously produced ones, and needs to be handled separately anyway.
If some user IDs are known to be invalid because they're not numeric, and some will be rejected on the server (e.g. because they're already taken), you should consistently make an (async!) callback for both cases. If the async errors would only arise from network problems etc., you might signal them differently.
You may always throw when an "unexpected" error arises. If you demand valid user IDs, you might throw on invalid ones. If you want to anticipate invalid ones and expect the caller to handle them, you should use a "unified" error route, which would be the callback/rejected promise for an async function.
And to repeat @Timothy: you should always document the behavior and maintain consistency in the behavior.
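For illustration, a sketch of keeping even the immediately-known error on the async callback route (process.nextTick defers the callback so the function never calls back synchronously):

function getUserById(userId, cb) {
    if (userId !== parseInt(userId)) {
        // defer, so the callback is always invoked asynchronously
        return process.nextTick(() => cb(new Error('userId is not valid')));
    }
    // make async call, then cb(null, user)
}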
Callback APIs ideally shouldn't throw, but they do, because it's very hard to avoid: you would have to have try/catch literally everywhere. Remember that an explicit throw statement is not required for a function to throw. Another thing that adds to this is that the user callback can easily throw too, for example by calling JSON.parse without try/catch.
So this is what the code would look like that behaves according to these ideals:
readFile("file.json", function (err, val) {
    if (err) {
        console.error("unable to read file");
    } else {
        try {
            val = JSON.parse(val);
            console.log(val.success);
        } catch (e) {
            console.error("invalid json in file");
        }
    }
});
Having to use 2 different error handling mechanisms is really inconvenient, so if you don't want your program to be a fragile house of cards (built by never writing any try/catch at all), you should use promises, which unify all exception handling under a single mechanism:
const { readFile } = require('fs').promises;

readFile("file.json", "utf8")
    .then(JSON.parse)
    .then(function (val) {
        console.log(val.success);
    })
    .catch(function (e) {
        if (e instanceof SyntaxError) {
            console.error("invalid json in file");
        } else {
            console.error("unable to read file");
        }
    });
Ideally you would have a multi-layer architecture: controllers, services, etc. If you do validations in services, throw immediately, and have a catch block in your controller to catch the error, format it, and send an appropriate HTTP error code. This way you can centralize all bad-request handling logic. If you handle each case individually, you'll end up writing more code. But that's just how I would do it; it depends on your use case. A sketch of that split follows.
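A hedged sketch of that controller/service split (Express-style; the route and names are hypothetical):

// service layer: validate and throw immediately
function getUserById(userId) {
    if (!Number.isInteger(userId)) {
        throw new Error('userId is not valid');
    }
    // ... the real async lookup would go here
    return Promise.resolve({ id: userId });
}

// controller layer: centralize bad-request handling
app.get('/users/:id', async (req, res) => {
    try {
        const user = await getUserById(Number(req.params.id));
        res.json(user);
    } catch (err) {
        res.status(400).send(err.message);
    }
});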
How to close a readable stream in Node.js?
var input = fs.createReadStream('lines.txt');

input.on('data', function (data) {
    // after closing the stream, this will not
    // be called again
    if (gotFirstLine) {
        // close this stream and continue the
        // instructions from this if
        console.log("Closed.");
    }
});
This would be better than:
input.on('data', function (data) {
    if (isEnded) { return; }
    if (gotFirstLine) {
        isEnded = true;
        console.log("Closed.");
    }
});
But this would not stop the reading process...
Edit: Good news! Starting with Node.js 8.0.0, readable.destroy() is officially available: https://nodejs.org/api/stream.html#stream_readable_destroy_error
ReadStream.destroy
You can call the ReadStream.destroy function at any time.
var fs = require("fs");
var readStream = fs.createReadStream("lines.txt");

readStream
    .on("data", function (chunk) {
        console.log(chunk);
        readStream.destroy();
    })
    .on("end", function () {
        // This may not be called, since we are destroying the stream
        // the first time the "data" event is received
        console.log("All the data in the file has been read");
    })
    .on("close", function (err) {
        console.log("Stream has been destroyed and file has been closed");
    });
The public function ReadStream.destroy was not documented at the time (Node.js v0.12.2), but you can have a look at the source code on GitHub (Oct 5, 2012 commit).
The destroy function internally marks the ReadStream instance as destroyed and calls the close function to release the file.
You can listen to the close event to know exactly when the file is closed. The end event will not fire unless the data is completely consumed.
Note that the destroy (and close) functions are specific to fs.ReadStream. They are not part of the generic stream.Readable interface.
Invoke input.close(). It's not in the docs, but
https://github.com/joyent/node/blob/cfcb1de130867197cbc9c6012b7e84e08e53d032/lib/fs.js#L1597-L1620
clearly does the job :) It actually does something similar to your isEnded.
EDIT 2015-Apr-19: Based on comments below, and to clarify and update:
This suggestion is a hack, and is not documented.
Though, looking at the current lib/fs.js, it still works more than 1.5 years later.
I agree with the comment below that calling destroy() is preferable.
As correctly stated below, this works for fs ReadStreams, not for a generic Readable.
As for a generic solution: it doesn't appear as if there is one, at least from my understanding of the documentation and from a quick look at _stream_readable.js.
My proposal would be to put your readable stream in paused mode, at least preventing further processing in your upstream data source. Don't forget to unpipe() and remove all data event listeners so that pause() actually pauses, as mentioned in the docs. A rough sketch follows.
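A sketch of that pause-based workaround (assuming readable is piped into some downstream writable; both names are placeholders):

readable.unpipe(writable);            // stop feeding the downstream consumer
readable.removeAllListeners('data');  // so that pause() actually pauses
readable.pause();                     // stop pulling from the underlying source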
Today, in Node 10
readableStream.destroy()
is the official way to close a readable stream
see https://nodejs.org/api/stream.html#stream_readable_destroy_error
You can't. There is no documented way to close/shutdown/abort/destroy a generic Readable stream as of Node 5.3.0. This is a limitation of the Node stream architecture.
As other answers here have explained, there are undocumented hacks for specific implementations of Readable provided by Node, such as fs.ReadStream. These are not generic solutions for any Readable though.
If someone can prove me wrong here, please do. I would like to be able to do what I'm saying is impossible, and would be delighted to be corrected.
EDIT: Here was my workaround: implementing .destroy() for my pipeline through a complex series of unpipe() calls. And after all that complexity, it doesn't work properly in all cases.
EDIT: Node v8.0.0 added a destroy() api for Readable streams.
As of version 4.*.*, pushing a null value into the stream will trigger an EOF signal.
From the nodejs docs
If a value other than null is passed, the push() method adds a chunk of data into the queue for subsequent stream processors to consume. If null is passed, it signals the end of the stream (EOF), after which no more data can be written.
This worked for me after trying numerous other options on this page.
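For example, a minimal sketch with a hypothetical custom readable:

const { Readable } = require('stream');

const readable = new Readable({ read() {} });
readable.push('some data');
readable.push(null); // signals EOF: no more data can be written

readable.on('end', () => console.log('stream ended'));
readable.resume(); // consume the buffered data so that 'end' can fire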
This destroy module is meant to ensure a stream gets destroyed, handling different APIs and Node.js bugs. Right now it is one of the best choices.
NB: from Node 10 you can use the .destroy() method without further dependencies.
You can clear and close the stream with yourstream.resume(), which will dump everything in the stream and eventually close it.
From the official docs:
readable.resume():
Return: this
This method will cause the readable stream to resume emitting 'data' events.
This method will switch the stream into flowing mode. If you do not want to consume the data from a stream, but you do want to get to its 'end' event, you can call stream.resume() to open the flow of data.
var readable = getReadableStreamSomehow();
readable.resume();
readable.on('end', () => {
    console.log('got to the end, but did not read anything');
});
It's an old question, but I too was looking for the answer and found the best one for my implementation. Both the end and close events get emitted, so I think this is the cleanest solution.
This will do the trick in node 4.4.* (stable version at the time of writing):
var input = fs.createReadStream('lines.txt');

input.on('data', function (data) {
    if (gotFirstLine) {
        this.end(); // Simple, isn't it?
        console.log("Closed.");
    }
});
For a very detailed explanation see:
http://www.bennadel.com/blog/2692-you-have-to-explicitly-end-streams-after-pipes-break-in-node-js.htm
This code here will do the trick nicely:
function closeReadStream(stream) {
    if (!stream) return;
    if (stream.close) stream.close();
    else if (stream.destroy) stream.destroy();
}
writeStream.end() is the go-to way to close a writeStream...
To stop callback execution after some call, you have to use process.kill with the particular process ID:
const csv = require('csv-parser');
const fs = require('fs');

const filepath = "./demo.csv";

let readStream = fs.createReadStream(filepath, {
    autoClose: true,
});

let MAX_LINE = 0;

readStream
    .on('error', (e) => {
        console.log(e);
        console.log("error");
    })
    .pipe(csv())
    .on('data', (row) => {
        if (MAX_LINE == 2) {
            process.kill(process.pid, 'SIGTERM');
        }
        MAX_LINE++;
        console.log(row);
    })
    .on('end', () => {
        // handle end of CSV
        console.log("read done");
    })
    .on("close", function () {
        console.log("closed");
    });