Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close in Node Pipeline stream - node.js

I am using the stream.pipeline functionality from Node to upload some data to S3. The basic idea I'm implementing is pulling files from a request and writing them to S3. I have one pipeline that pulls zip files and writes them to S3 successfully. However, I want my second pipeline to make the same request, but unzip and write the unzipped files to S3. The pipeline code looks like the following:
pipeline(request.get(...), s3Stream(zipFileWritePath)),
pipeline(request.get(...), new unzipper.Parse(), etl.map(entry => entry.pipe(s3Stream(createWritePath(writePath, entry)))))
The s3Stream function looks like so:
function s3Stream(file) {
const pass = new stream.PassThrough()
s3Store.upload(file, pass)
return pass
}
The first pipeline works well and is currently running in production. However, when I add the second pipeline, I get the following error:
Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
at Parse.onclose (internal/streams/end-of-stream.js:56:36)
at Parse.emit (events.js:187:15)
at Parse.EventEmitter.emit (domain.js:442:20)
at Parse.<anonymous> (/node_modules/unzipper/lib/parse.js:28:10)
at Parse.emit (events.js:187:15)
at Parse.EventEmitter.emit (domain.js:442:20)
at finishMaybe (_stream_writable.js:641:14)
at afterWrite (_stream_writable.js:481:3)
at onwrite (_stream_writable.js:471:7)
at /node_modules/unzipper/lib/PullStream.js:70:11
at afterWrite (_stream_writable.js:480:3)
at process._tickCallback (internal/process/next_tick.js:63:19)
Any idea what could be causing this or solutions to resolve this would be greatly appreciated!

TL;DR
When you use pipeline, you commit to consuming the readable stream fully; nothing in the chain should stop before the readable ends.
Deep dive
After some time working with these shenanigans, here is some more useful information.
import stream from 'stream'
const s1 = new stream.PassThrough()
const s2 = new stream.PassThrough()
const s3 = new stream.PassThrough()
s1.on('end', () => console.log('end 1'))
s2.on('end', () => console.log('end 2'))
s3.on('end', () => console.log('end 3'))
s1.on('close', () => console.log('close 1'))
s2.on('close', () => console.log('close 2'))
s3.on('close', () => console.log('close 3'))
stream.pipeline(
s1,
s2,
s3,
async s => { for await (const _ of s) { } },
err => console.log('end', err)
)
Now, if I call s2.end(), s2 and every stream downstream of it end and close:
end 2
close 2
end 3
close 3
pipeline is the equivalent of s3(s2(s1)).
But if I call s2.destroy(), it prints the following and destroys everything. This is your problem here: a stream is destroyed before it ends normally, either because of an error or because of a return/break/throw in an async generator or async function.
close 2
end Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
at PassThrough.onclose (internal/streams/end-of-stream.js:117:38)
at PassThrough.emit (events.js:327:22)
at emitCloseNT (internal/streams/destroy.js:81:10)
at processTicksAndRejections (internal/process/task_queues.js:83:21) {
code: 'ERR_STREAM_PREMATURE_CLOSE'
}
close 1
close 3
You must not leave any of the streams without a way to catch its errors.
stream.pipeline() leaves dangling event listeners on the streams after the callback has been invoked. In the case of reuse of streams after failure, this can cause event listener leaks and swallowed errors.
node source (14.4)
const onclose = () => {
if (readable && !readableEnded) {
if (!isReadableEnded(stream))
return callback.call(stream, new ERR_STREAM_PREMATURE_CLOSE());
}
if (writable && !writableFinished) {
if (!isWritableFinished(stream))
return callback.call(stream, new ERR_STREAM_PREMATURE_CLOSE());
}
callback.call(stream);
};
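To make the difference between ending and destroying concrete, here is a minimal sketch using only the built-in stream module. The helper name runPipeline and the no-op sink are assumptions made for illustration; they are not part of the original question's code.

```javascript
const { PassThrough, Writable, pipeline } = require('stream');

// Hypothetical helper: builds a small pipeline, applies `action` to its
// streams, and resolves with the error (or null) given to the callback.
function runPipeline(action) {
  return new Promise((resolve) => {
    const source = new PassThrough();
    const middle = new PassThrough();
    const sink = new Writable({
      write(chunk, encoding, callback) { callback(); } // discard the data
    });
    pipeline(source, middle, sink, (err) => resolve(err || null));
    source.write('some data');
    action(source, middle);
  });
}

// Ending the source lets every stream finish normally.
const ended = runPipeline((source) => source.end());
// Destroying a stream in the middle closes it before it has ended.
const destroyed = runPipeline((source, middle) => middle.destroy());

ended.then((err) => console.log('after end():', err));
destroyed.then((err) => console.log('after destroy():', err && err.code));
```

In the destroyed case the callback should receive an error whose code is ERR_STREAM_PREMATURE_CLOSE, which is the same situation as a stream in the chain being closed before the readable has been consumed.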

Related

How to properly close a writable stream in Node js?

I'm quite new to JavaScript. I'm using a Node.js writable stream to write a .txt file. It works well, but I cannot understand how to properly close the file, as its content is blank as long as the program is running. In more detail, I need to read from that .txt file after it has been written, but doing it this way returns an empty buffer.
let myWriteStream = fs.createWriteStream("./filepath.txt");
myWriteStream.write(stringBuffer + "\n");
myWriteStream.on('close', () => {
console.log('close event emitted');
});
myWriteStream.end();
// do things..
let data = fs.readFileSync("./filepath.txt").toString().split("\n");
Seems like the event emitted by the .end() method is triggered after the file reading, causing it to be read as empty. If I put a while() to wait for the event to be triggered, so that I know for sure the stream is closed before the reading, the program waits forever.
Do you have any clue of what I'm doing wrong?
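You're missing two things. First, check the return value of write() so that you respect backpressure.
Second, wait for the stream's 'finish' event before you read the file.
const { readFileSync, createWriteStream } = require('fs')
const stringBuffer = readFileSync('index.js')
const filePath = "./filepath.txt"
const myWriteStream = createWriteStream(filePath)
// write() returns false once the internal buffer is full. The chunk is still
// queued, so don't write it again; wait for 'drain' before writing more data.
const canWriteMore = myWriteStream.write(stringBuffer + "\n");
if (!canWriteMore) {
  myWriteStream.once('drain', () => console.log('drain event emitted'));
}
myWriteStream.on('close', () => {
  console.log('close event emitted');
});
// 'finish' fires after end() once all data has been flushed, so read the file here
myWriteStream.on('finish', () => {
  console.log('finish event emitted');
  let data = readFileSync(filePath).toString().split("\n");
  console.log(data);
});
myWriteStream.end();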
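The same idea can also be written with promises, which keeps the "write, wait, then read" order visible in one function. This is only a sketch: the helper name writeThenRead and the temporary file path are assumptions, and events.once is used to wait for the 'drain' and 'finish' events.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { once } = require('events');

// Hypothetical helper: writes each line, waits until the stream has
// finished, and only then reads the file back.
async function writeThenRead(filePath, lines) {
  const out = fs.createWriteStream(filePath);
  for (const line of lines) {
    // Respect backpressure: wait for 'drain' when write() returns false.
    if (!out.write(line + '\n')) {
      await once(out, 'drain');
    }
  }
  out.end();
  await once(out, 'finish'); // all data has been flushed at this point
  return fs.readFileSync(filePath, 'utf8').split('\n');
}

// Usage with an illustrative temporary file.
const target = path.join(os.tmpdir(), 'filepath-example.txt');
writeThenRead(target, ['hello', 'world']).then((lines) => console.log(lines));
```

Because the file ends with a newline, the array returned by split("\n") has an empty string as its last element.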

Avoid re-fetching data while streaming in Expressjs

I've just started playing with streaming data in Expressjs.
Not entirely sure, but I think the request will start to execute the handler again. For example, here is my handler:
import getDataAsync from "./somewhere";
function handler(req, res) {
console.log('requesting', req.path);
getDataAsync()
.then(data => {
let stream = renderContent(data);
stream.pipe(res);
})
.catch(err => {
res.end();
})
}
What I found was that it continues to print console.log('requesting', req.path), which I think means getDataAsync is executed again.
My question is:
Is it true it will re-execute getDataAsync?
If it does, what's your approach?
Thanks heaps!
Node JS is non-blocking, so if you were to make a request to an endpoint with this handler again then it will execute. The handler will call getDataAsync() and then the handler gets removed from call stack. The process is repeated for each request.
If you want the handler to wait out the stream before it calls it again you could do:
import getDataAsync from "./somewhere";
let streamComplete = true;
function handler(req, res) {
if(!streamComplete) {
res.end();
return; // stop here; otherwise the handler would go on and fetch the data again
}
console.log('requesting', req.path);
getDataAsync()
.then(data => {
streamComplete = false;
let stream = renderContent(data);
stream.pipe(res);
stream.on('end', () => streamComplete = true);
})
.catch(err => {
res.end();
})
}
I did need to sort this problem out in one of my projects. Node or in fact any other environment/language will have the same issue, that once you start streaming the data to one client, it's rather hard to stream it to another. This is due to the fact that once you do this:
inputStream.pipe(outputStream);
...the input data will be pushed out to output and will be removed from memory. So if you just pipe the inputStream again, you'll have some initial part of the data missing.
The solution I came up with was to write a Transform stream that keeps the data in memory so that you can reuse it afterwards. Such a stream holds all the original chunks and, once a later reader catches up with the first request, it just keeps pushing new chunks directly. I packaged the solution as an npm module and published it, so now you can use it.
This is how you use it:
const {ReReadable} = require("rereadable-stream");
// We'll use this for caching - you can use a Map if you have more streams
let cachedStream;
// This function will get the stream and
const getCachedStream = () =>
(cachedStream || (cachedStream =
getDataAsync()
.then(
data => renderContent(data).pipe(new ReReadable())
))
)
.then(readable => readable.rewind())
Such a function will call your getDataAsync once and then push the data to the rewindable stream, but every time the function is executed the stream will be rewound to the beginning.
You can read a bit more about the rereadable-stream module here.
A word of warning though - remember, that you will keep all that data in memory now, so be careful to clean it up if there's more chunks there and control your memory usage.
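For illustration, the underlying caching idea can be sketched with the built-in stream module alone. This is not the rereadable-stream API: cacheStream is a hypothetical helper, and unlike the module described above it waits for the source to finish before any replay can start.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of rereadable-stream): reads the source once,
// keeps every chunk in memory, and returns a factory for fresh readables.
async function cacheStream(source) {
  const chunks = [];
  for await (const chunk of source) {
    chunks.push(chunk);
  }
  // Each call replays the cached chunks from the beginning.
  return () => Readable.from(chunks);
}

// Usage: the source is consumed once, then replayed for two consumers.
cacheStream(Readable.from(['<h1>', 'hello', '</h1>'])).then((replay) => {
  replay().on('data', (chunk) => console.log('first reader:', chunk));
  replay().on('data', (chunk) => console.log('second reader:', chunk));
});
```

The memory warning applies here as well, since every chunk stays referenced for as long as the factory function is kept around.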

_read() is not implemented on Readable stream

This question is how to really implement the read method of a readable stream.
I have this implementation of a Readable stream:
import {Readable} from "stream";
this.readableStream = new Readable();
I am getting this error
events.js:136
throw er; // Unhandled 'error' event
^
Error [ERR_STREAM_READ_NOT_IMPLEMENTED]: _read() is not implemented
at Readable._read (_stream_readable.js:554:22)
at Readable.read (_stream_readable.js:445:10)
at resume_ (_stream_readable.js:825:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
at Function.Module.runMain (module.js:684:11)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:613:3
The reason the error occurs is obvious, we need to do this:
this.readableStream = new Readable({
read(size) {
return true;
}
});
I don't really understand how to implement the read method though.
The only thing that works is just calling
this.readableStream.push('some string or buffer');
if I try to do something like this:
this.readableStream = new Readable({
read(size) {
this.push('foo'); // call push here!
return true;
}
});
then nothing happens - nothing comes out of the readable!
Furthermore, these articles say you don't need to implement the read method:
https://github.com/substack/stream-handbook#creating-a-readable-stream
https://medium.freecodecamp.org/node-js-streams-everything-you-need-to-know-c9141306be93
My question is - why does calling push inside the read method do nothing? The only thing that works for me is just calling readable.push() elsewhere.
why does calling push inside the read method do nothing? The only thing that works for me is just calling readable.push() elsewhere.
I think it's because you are not consuming it. You need to pipe it to a writable stream (e.g. stdout) or just consume it through a data event:
const { Readable } = require("stream");
let count = 0;
const readableStream = new Readable({
read(size) {
this.push('foo');
if (count === 5) this.push(null);
count++;
}
});
// piping
readableStream.pipe(process.stdout)
// through the data event
readableStream.on('data', (chunk) => {
console.log(chunk.toString());
});
Both of them should print foo 6 times, because count runs from 0 to 5 before null is pushed (they are slightly different though). Which one you should use depends on what you are trying to accomplish.
Furthermore, these articles say you don't need to implement the read method:
You might not need it, this should work:
const { Readable } = require("stream");
const readableStream = new Readable();
for (let i = 0; i <= 5; i++) {
readableStream.push('foo');
}
readableStream.push(null);
readableStream.pipe(process.stdout)
In this case you can't capture it through the data event. Also, this way is not very useful and not efficient I'd say, we are just pushing all the data in the stream at once (if it's large everything is going to be in memory), and then consuming it.
From documentation:
readable._read:
"When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. link"
readable.push:
"The readable.push() method is intended be called only by Readable implementers, and only from within the readable._read() method. link"
Implement the _read method after your ReadableStream's initialization:
import {Readable} from "stream";
this.readableStream = new Readable();
this.readableStream._read = function () {};
A readable stream is like a pool:
.push(data) is like pumping water into the pool.
.pipe(destination) is like connecting the pool to a pipe and pumping the water somewhere else.
_read(size) acts as the pump: it controls how much water flows and when the data ends.
fs.createReadStream() creates a read stream whose _read() function is already implemented to push the file's data and to end at the end of the file.
_read(size) fires automatically once the pool is attached to a pipe. So if you call it yourself without a destination connected, the data has nowhere to go, and the call can upset the state kept inside _read() (for example, a cursor may move to the wrong place).
The read() function must be passed in the options object given to new Stream.Readable(). It is a function inside an object, not readableStream.read(), and assigning readableStream.read = function (size) {...} will not work.
An easy way to understand the implementation:
var Reader=new Object();
Reader.read=function(size){
if (this.i==null){this.i=1;}else{this.i++;}
this.push("abc");
if (this.i>7){ this.push(null); }
}
const Stream = require('stream');
const renderStream = new Stream.Readable(Reader);
renderStream.pipe(process.stdout)
You can use it to render whatever stream data you want to POST to another server.
POST stream data with Axios:
require('axios')({
method: 'POST',
url: 'http://127.0.0.1:3000',
headers: {'Content-Length': 1000000000000},
data: renderStream
});
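The read()/_read() contract discussed in this thread can also be sketched as a small subclass. The class name CounterStream and its limit parameter are made up for illustration; the point is only that data is pushed from inside _read(), which Node calls once something consumes the stream.

```javascript
const { Readable } = require('stream');

// Hypothetical example class: emits the numbers 1..limit as text lines.
class CounterStream extends Readable {
  constructor(limit, options) {
    super(options);
    this.limit = limit;
    this.current = 0;
  }

  // Called by Node whenever the consumer wants more data.
  _read() {
    this.current += 1;
    if (this.current > this.limit) {
      this.push(null); // signal the end of the stream
    } else {
      this.push(`${this.current}\n`);
    }
  }
}

// Nothing is produced until the stream is consumed, here by piping it.
new CounterStream(3).pipe(process.stdout);
```

Pushing exactly one chunk per _read() call keeps production in step with demand, so the whole data set never has to sit in memory at once.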

NodeJS Stream splitting

I have an infinite data stream from a forked process. I want this stream to be processed by a module and sometimes I want to duplicate the data from this stream to be processed by a different module (e.g. monitoring a data stream but if anything interesting happens I want to log the next n bytes to file for further investigation).
So let's suppose the following scenario:
I start the program and start consuming the readable stream
2 secs later I want to process the same data for 1 sec by a different stream reader
Once the time is up I want to close the second consumer but the original consumer must stay untouched.
Here is a code snippet for this:
var stream = process.stdout;
stream.pipe(detector); // Using the first consumer
function startAnotherConsumer() {
stream2 = new PassThrough();
stream.pipe(stream2);
// use stream2 somewhere else
}
function stopAnotherConsumer() {
stream.unpipe(stream2);
}
My problem here is that unpiping the stream2 doesn't get it closed. If I call stream.end() after the unpipe command, then it crashes with the error:
events.js:160
throw er; // Unhandled 'error' event
^
Error: write after end
at writeAfterEnd (_stream_writable.js:192:12)
at PassThrough.Writable.write (_stream_writable.js:243:5)
at Socket.ondata (_stream_readable.js:555:20)
at emitOne (events.js:101:20)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:176:18)
at Socket.Readable.push (_stream_readable.js:134:10)
at Pipe.onread (net.js:548:20)
I even tried to pause the source stream to help the buffer to be flushed from the second stream but it didn't work either:
function stopAnotherConsumer() {
stream.pause();
stream2.once('unpipe', function () {
stream.resume();
stream2.end();
});
stream.unpipe(stream2);
}
Same error as before here (write after end).
How to solve the problem? My original intent is to duplicate the streamed data from one point, then close the second stream after a while.
Note: I tried to use this answer to make it work.
As there were no answers, I post my (patchwork) solution. In case anyone'd have a better one, don't hold it back.
A new Stream:
const Writable = require('stream').Writable;
const Transform = require('stream').Transform;
class DuplicatorStream extends Transform {
constructor(options) {
super(options);
this.otherStream = null;
}
attachStream(stream) {
if (!(stream instanceof Writable)) { // parentheses needed: `!stream instanceof Writable` is always false
throw new Error('DuplicatorStream argument is not a writeable stream!');
}
if (this.otherStream) {
throw new Error('A stream is already attached!');
}
this.otherStream = stream;
this.emit('attach', stream);
}
detachStream() {
if (!this.otherStream) {
throw new Error('No stream to detach!');
}
let stream = this.otherStream;
this.otherStream = null;
this.emit('detach', stream);
}
_transform(chunk, encoding, callback) {
if (this.otherStream) {
this.otherStream.write(chunk);
}
callback(null, chunk);
}
}
module.exports = DuplicatorStream;
And the usage:
const { PassThrough } = require('stream');
var stream = process.stdout;
var stream2;
var duplicatorStream = new DuplicatorStream();
stream.pipe(duplicatorStream); // Inserting my duplicator stream in the chain
duplicatorStream.pipe(detector); // Using the first consumer
function startAnotherConsumer() {
// `stream` is the source stream here, not the stream module, so use the imported PassThrough
stream2 = new PassThrough();
duplicatorStream.attachStream(stream2);
// use stream2 somewhere else
}
function stopAnotherConsumer() {
duplicatorStream.once('detach', function () {
stream2.end();
});
duplicatorStream.detachStream();
}
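A lighter-weight variation on the same idea, offered only as a sketch: instead of inserting a Transform into the chain, a temporary 'data' listener copies chunks into a PassThrough. The helper name tap is hypothetical. It assumes the source is already flowing (for example because it is piped to the first consumer), and like the class above it ignores backpressure on the copy.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: copies chunks from `source` into a new PassThrough
// until stop() is called. Backpressure on the copy is ignored.
function tap(source) {
  const copy = new PassThrough();
  const onData = (chunk) => copy.write(chunk);
  source.on('data', onData);
  return {
    stream: copy,
    stop() {
      // Remove the listener first so nothing writes to `copy` after end().
      source.removeListener('data', onData);
      copy.end();
    }
  };
}

// Usage: the first consumer stays piped while a second one comes and goes.
const source = new PassThrough();
source.pipe(new PassThrough()).resume(); // stand-in for the first consumer
const second = tap(source);
second.stream.on('data', (chunk) => console.log('copied:', chunk.toString()));
source.write('interesting bytes');
setImmediate(() => second.stop());
```

Removing the listener before calling end() is what avoids the "write after end" error from the question, because no further chunk can reach the copy once it has been ended.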

Error handling with node.js streams

What's the correct way to handle errors with streams? I already know there's an 'error' event you can listen on, but I want to know some more details about arbitrarily complicated situations.
For starters, what do you do when you want to do a simple pipe chain:
input.pipe(transformA).pipe(transformB).pipe(transformC)...
And how do you properly create one of those transforms so that errors are handled correctly?
More related questions:
when an error happens, what happens to the 'end' event? Does it never get fired? Does it sometimes get fired? Does it depend on the transform/stream? What are the standards here?
are there any mechanisms for propagating errors through the pipes?
do domains solve this problem effectively? Examples would be nice.
do errors that come out of 'error' events have stack traces? Sometimes? Never? is there a way to get one from them?
transform
Transform streams are both readable and writeable, and thus are really good 'middle' streams. For this reason, they are sometimes referred to as through streams. They are similar to a duplex stream in this way, except they provide a nice interface to manipulate the data rather than just sending it through. The purpose of a transform stream is to manipulate the data as it is piped through the stream. You may want to do some async calls, for example, or derive a couple of fields, remap some things, etc.
For how to create a transform stream see here and here. All you have to do is :
include the stream module
instantiate ( or inherit from) the Transform class
implement a _transform method which takes a (chunk, encoding, callback).
The chunk is your data. Most of the time you won't need to worry about encoding if you are working in objectMode = true. The callback is called when you are done processing the chunk. This chunk is then pushed on to the next stream.
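As a rough sketch of those steps, here is a transform in object mode that derives one extra field per chunk. The order objects and the createTotalTransform name are invented for the example.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical transform: adds a `total` field to each order object.
function createTotalTransform() {
  return new Transform({
    objectMode: true,
    transform(order, encoding, callback) {
      // The second argument to callback is pushed on to the next stream.
      callback(null, { ...order, total: order.price * order.quantity });
    }
  });
}

// Usage with an in-memory source.
Readable.from([{ price: 2, quantity: 3 }, { price: 5, quantity: 1 }])
  .pipe(createTotalTransform())
  .on('data', (order) => console.log(order));
```

Passing the transform function in the options object is equivalent to subclassing Transform and defining _transform; which form to use is mostly a matter of taste.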
If you want a nice helper module that lets you create through streams really easily, I suggest through2.
For error handling, keep reading.
pipe
In a pipe chain, handling errors is indeed non-trivial. According to this thread .pipe() is not built to forward errors. So something like ...
var a = createStream();
a.pipe(b).pipe(c).on('error', function(e){handleError(e)});
... would only listen for errors on the stream c. If an error event was emitted on a, that would not be passed down and, in fact, would throw. To do this correctly:
var a = createStream();
a.on('error', function(e){handleError(e)})
.pipe(b)
.on('error', function(e){handleError(e)})
.pipe(c)
.on('error', function(e){handleError(e)});
Now, though the second way is more verbose, you can at least keep the context of where your errors happen. This is usually a good thing.
If you only want to capture errors at the destination and don't care much about where they happened, one library I find helpful is event-stream.
end
When an error event is fired, the end event will not be fired (explicitly). The emitting of an error event will end the stream.
domains
In my experience, domains work really well most of the time. If you have an unhandled error event (i.e. emitting an error on a stream without a listener), the server can crash. Now, as the above article points out, you can wrap the stream in a domain which should properly catch all errors.
var d = domain.create();
d.on('error', handleAllErrors);
d.run(function() {
fs.createReadStream(tarball)
.pipe(gzip.Gunzip())
.pipe(tar.Extract({ path: targetPath }))
.on('close', cb);
});
the above code sample is from this post
The beauty of domains is that they will preserve the stack traces. Though event-stream does a good job of this as well.
For further reading, check out the stream-handbook. Pretty in depth, but super useful and gives some great links to lots of helpful modules.
If you are using node >= v10.0.0 you can use stream.pipeline and stream.finished.
For example:
const { pipeline, finished } = require('stream');
pipeline(
input,
transformA,
transformB,
transformC,
(err) => {
if (err) {
console.error('Pipeline failed', err);
} else {
console.log('Pipeline succeeded');
}
});
finished(input, (err) => {
if (err) {
console.error('Stream failed', err);
} else {
console.log('Stream is done reading');
}
});
See this github PR for more discussion.
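On newer Node versions (roughly v15 and later, where the stream/promises module is available) the same pipeline can be awaited, so an error from any stream in the chain surfaces as a rejected promise. The sketch below uses made-up names (upperCaseAll, the "bad" input) purely for illustration.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical example: upper-cases each word, failing on the word "bad".
async function upperCaseAll(words) {
  const seen = [];
  const upper = new Transform({
    transform(chunk, encoding, callback) {
      const text = chunk.toString();
      if (text === 'bad') {
        // Passing an error to the callback fails the whole pipeline.
        callback(new Error('refusing to transform "bad"'));
      } else {
        callback(null, text.toUpperCase());
      }
    }
  });
  const sink = new Writable({
    write(chunk, encoding, callback) {
      seen.push(chunk.toString());
      callback();
    }
  });
  await pipeline(Readable.from(words), upper, sink);
  return seen;
}

upperCaseAll(['a', 'b']).then((result) => console.log('ok:', result));
upperCaseAll(['a', 'bad']).catch((err) => console.error('failed:', err.message));
```

With this form a single try/catch (or .catch) around the awaited call covers every stream in the chain.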
Domains are deprecated; you don't need them.
For this question, the distinction between transform and writable streams is not so important.
mshell_lauren's answer is great, but as an alternative you can also explicitly listen for the error event on each stream you think might error, and reuse the handler function if you prefer.
var a = createReadableStream()
var b = anotherTypeOfStream()
var c = createWriteStream()
a.on('error', handler)
b.on('error', handler)
c.on('error', handler)
a.pipe(b).pipe(c)
function handler (err) { console.log(err) }
Doing so prevents the infamous uncaught exception should one of those streams fire its error event.
Errors from the whole chain can be propagated to the rightmost stream using a simple function:
function safePipe (readable, transforms) {
while (transforms.length > 0) {
// const (not var) so each error handler below captures its own downstream stream
const new_readable = transforms.shift();
readable.on("error", function(e) { new_readable.emit("error", e); });
readable.pipe(new_readable);
readable = new_readable;
}
return readable;
}
which can be used like:
safePipe(readable, [ transform1, transform2, ... ]);
Attaching .on("error", handler) only takes care of stream 'error' events. If you are using custom Transform streams, it doesn't catch exceptions thrown inside the _transform function. To control the application flow, you can do something like this:
The this keyword in the _transform function refers to the stream itself, which is an EventEmitter. So you can use try/catch as below to catch the errors and then pass them to custom event handlers.
// CustomTransform.js
CustomTransformStream.prototype._transform = function (data, enc, done) {
var stream = this
try {
// Do your transform code
} catch (e) {
// Now based on the error type, with an if or switch statement
stream.emit("CTError1", e)
stream.emit("CTError2", e)
}
done()
}
// StreamImplementation.js
someReadStream
.pipe(CustomTransformStream)
.on("CTError1", function (e) { console.log(e) })
.on("CTError2", function (e) { /*Lets do something else*/ })
.pipe(someWriteStream)
This way, you can keep your logic and error handlers separate. Also, you can opt to handle only some errors and ignore others.
UPDATE
Alternative: RXJS Observable
Use the multipipe package to combine several streams into one duplex stream and handle errors in one place.
const pipe = require('multipipe')
// pipe streams
const stream = pipe(streamA, streamB, streamC)
// centralized error handling
stream.on('error', fn)
Use the Node.js pattern: create a Transform stream and call its callback done with an argument in order to propagate the error:
var transformStream1 = new stream.Transform(/*{objectMode: true}*/);
// Assign _transform on the instance itself; an instance has no `prototype` property
transformStream1._transform = function (chunk, encoding, done) {
//var stream = this;
try {
// Do your transform code
/* ... */
} catch (error) {
// nodejs style for propagating an error
return done(error);
}
// Here, everything went well
done();
}
// Let's use the transform stream, assuming `someReadStream`
// and `someWriteStream` have been defined before
someReadStream
.pipe(transformStream1)
.on('error', function (error) {
console.error('Error in transformStream1:');
console.error(error);
process.exit(-1);
})
.pipe(someWriteStream)
.on('close', function () {
console.log('OK.');
process.exit();
})
.on('error', function (error) {
console.error(error);
process.exit(-1);
});
const http = require('http');
const fs = require('fs');
const server = http.createServer();
server.on('request',(req,res)=>{
const readableStream = fs.createReadStream(__dirname+'/README.md');
const writeableStream = fs.createWriteStream(__dirname+'/assets/test.txt');
readableStream
.on('error',()=>{
res.end("File not found")
})
.pipe(writeableStream)
.on('error',(error)=>{
console.log(error)
res.end("Something went to wrong!")
})
.on('finish',()=>{
res.end("Done!")
})
})
server.listen(8000,()=>{
console.log("Server is running in 8000 port")
})
try/catch won't capture errors that occur in a stream, because they are thrown after the calling code has already exited. You can refer to the documentation:
https://nodejs.org/dist/latest-v10.x/docs/api/errors.html
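One way to get try/catch semantics back, sketched here under the assumption that stream/promises is available (roughly Node v15 and later), is to await the stream's completion. The helper name readOutcome is hypothetical.

```javascript
const fs = require('fs');
const { finished } = require('stream/promises');

// Hypothetical helper: resolves with 'ok' when the file was read to the end,
// or with the error code when the stream failed.
async function readOutcome(filePath) {
  const stream = fs.createReadStream(filePath);
  stream.resume(); // discard the data; only the outcome matters here
  try {
    // finished() rejects if the stream emits 'error', so the error is thrown
    // at this await and the surrounding try/catch can handle it.
    await finished(stream);
    return 'ok';
  } catch (err) {
    return err.code;
  }
}

readOutcome('./does-not-exist.txt').then((outcome) => console.log(outcome));
```

The try/catch works here only because the error is delivered through the awaited promise rather than thrown from an event callback.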

Resources