Proper way to consume a NodeJS stream into a buffer and a write stream

I have a need to pipe a readable stream into both a buffer (to be converted into a string) and a file. The stream is coming from node-fetch.
NodeJS readable streams have two modes: paused and flowing. From what I understand, as soon as a 'data' listener is attached, the stream switches to flowing mode. I want to make sure the way I am reading the stream will not lose any bytes.
Method 1: piping and reading from 'data':
fetch(url).then(
  response =>
    new Promise(resolve => {
      const buffers = []
      const dest = fs.createWriteStream(filename)
      response.body.pipe(dest)
      response.body.on('data', chunk => buffers.push(chunk))
      dest.on('close', () => resolve(Buffer.concat(buffers).toString()))
    })
)
Method 2: using passthrough streams:
const { PassThrough } = require('stream')
fetch(url).then(
  response =>
    new Promise(resolve => {
      const buffers = []
      const dest = fs.createWriteStream(filename)
      const forFile = new PassThrough()
      const forBuffer = new PassThrough()
      response.body.pipe(forFile).pipe(dest)
      response.body.pipe(forBuffer)
      forBuffer.on('data', chunk => buffers.push(chunk))
      dest.on('close', () => resolve(Buffer.concat(buffers).toString()))
    })
)
Is the second method required so that no data is lost? Is the second method wasteful, since two more streams could be buffered? Or is there another way to fill a buffer and a write stream simultaneously?

You won't miss any data, since .pipe internally attaches a src.on('data') listener and writes each chunk to the target stream.
So any chunk written to your dest stream will also be emitted to your response.body.on('data') listener, where you're buffering the chunks.
In any case, you should listen for 'error' events and reject if any error occurs.
And while your second method will work, you don't need it.
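For example, a minimal sketch of Method 1 with error handling added; fetch, url, filename, and fs are assumed from the question's context:
fetch(url).then(
  response =>
    new Promise((resolve, reject) => {
      const buffers = []
      const dest = fs.createWriteStream(filename)
      // Reject on errors from either the source or the destination
      response.body.on('error', reject)
      dest.on('error', reject)
      // Buffer each chunk while .pipe forwards the same chunks to the file
      response.body.on('data', chunk => buffers.push(chunk))
      response.body.pipe(dest)
      dest.on('close', () => resolve(Buffer.concat(buffers).toString()))
    })
)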
This is a chunk of code from the .pipe function
src.on('data', ondata);
function ondata(chunk) {
  debug('ondata');
  var ret = dest.write(chunk);
  debug('dest.write', ret);
  if (ret === false) {
    // If the user unpiped during `dest.write()`, it is possible
    // to get stuck in a permanently paused state if that write
    // also returned false.
    // => Check whether `dest` is still a piping destination.
    if (((state.pipesCount === 1 && state.pipes === dest) ||
         (state.pipesCount > 1 && state.pipes.indexOf(dest) !== -1)) &&
        !cleanedUp) {
      debug('false write response, pause', state.awaitDrain);
      state.awaitDrain++;
    }
    src.pause();
  }
}

Related

Create an ongoing stream from buffer and append to the stream

I am receiving a base64-encoded string in my Node.js server in chunks, and I want to convert it into a stream that can be read by another process, but I cannot figure out how to do it. Currently, I have this code:
const stream = Readable.from(Buffer.from(data, 'base64'));
But this creates a new stream instance each time, whereas what I would like to do is keep appending to the open stream until no more data is received from my front end. How do I create an appending stream that I can add to and that can be read by another process?
--- Additional information ---
Clients connect to the Node.js server via a websocket. I read the "data" from the payload when a websocket message is received.
socket.on('message', async function(res) {
  try {
    let payload = JSON.parse(res);
    let payloadType = payload['type'];
    let data = payload['data'];
--- Edit ---
I am getting this error message after pushing to the stream.
Error [ERR_METHOD_NOT_IMPLEMENTED]: The _read() method is not implemented
at Readable._read (internal/streams/readable.js:642:9)
at Readable.read (internal/streams/readable.js:481:10)
at maybeReadMore_ (internal/streams/readable.js:629:12)
at processTicksAndRejections (internal/process/task_queues.js:82:21) {
code: 'ERR_METHOD_NOT_IMPLEMENTED'
}
This is the code that is connected to the stream and reads from it:
const getAudioStream = async function* () {
  for await (const chunk of micStream) {
    if (chunk.length <= SAMPLE_RATE) {
      yield {
        AudioEvent: {
          AudioChunk: encodePCMChunk(chunk),
        },
      };
    }
  }
};
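The ERR_METHOD_NOT_IMPLEMENTED error above is what Node raises when it tries to pull data from a Readable that has no _read() implementation. One common workaround is to keep a single long-lived PassThrough stream open and write each decoded chunk into it; the sketch below assumes the websocket handler shown in the question and is only an illustration, not a drop-in fix:
const { PassThrough } = require('stream');

// One long-lived stream that stays open while chunks keep arriving
const audioStream = new PassThrough();

socket.on('message', (res) => {
  let payload = JSON.parse(res);
  // Append each decoded chunk to the open stream
  audioStream.write(Buffer.from(payload['data'], 'base64'));
});

// Signal end-of-stream when the client stops sending
socket.on('close', () => audioStream.end());

// A consumer (e.g. a for await...of loop like getAudioStream above) can now read from audioStream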

How do I stream a chunked file using Node.js Readable?

I have a 400Mb file split into chunks that are ~1Mb each.
Each chunk is a MongoDB document:
{
  name: 'stuff.zip',
  index: 15,
  buffer: Binary('......'),
  totalChunks: 400
}
I am fetching each chunk from my database and then streaming it to the client.
Every time I get a chunk from the DB, I push it to the readableStream, which is piped to the client.
Here is the code:
import { Readable } from 'stream'
const name = 'stuff.zip'
const contentType = 'application/zip'
app.get('/api/download-stuff', async (req, res) => {
  res.set('Content-Type', contentType)
  res.set('Content-Disposition', `attachment; filename=${name}`)
  res.attachment(name)
  // get `totalChunks` from random chunk
  let { totalChunks } = await ChunkModel.findOne({ name }).select('totalChunks')
  let index = 0
  const readableStream = new Readable({
    async read() {
      if (index < totalChunks) {
        let { buffer } = await ChunkModel.findOne({ name, index }).select('buffer')
        let canContinue = readableStream.push(buffer)
        console.log(`pushed chunk ${index}/${totalChunks}`)
        index++
        // sometimes it logs false
        // which means I should be waiting before pushing more
        // but I don't know how
        console.log('canContinue = ', canContinue)
      } else {
        readableStream.push(null)
        readableStream.destroy()
        console.log(`all ${totalChunks} chunks streamed to the client`)
      }
    }
  })
  readableStream.pipe(res)
})
The code works.
But I'm wondering whether I risk having memory overflows on my local server memory, especially when the requests for the same file are too many or the chunks are too many.
Question: my code is not waiting for readableStream to finish reading the chunk that was just pushed to it before pushing the next one. I thought it was, and that is why I'm using read(){..} in this probably wrong way. So how should I wait for each chunk to be pushed, read, streamed to the client and cleared from my server's local memory before I push the next one in?
I have created this sandbox in case it helps anyone
In general, when the readable interface is implemented correctly (i.e., the backpressure signal is respected), it will prevent the code from overflowing memory regardless of source size.
When implemented according to the API spec, the readable itself does not keep references to data that has finished passing through the stream. The memory requirement of a readable's buffer is adjusted by specifying a highWaterMark.
In this case, the snippet does not conform to the readable interface. It violates the following two concepts:
No data should be pushed to the readable's buffer unless read() has been called. Currently, this implementation pushes data from the DB immediately, so the readable buffer starts to fill before the sink has begun to consume the data.
The readable's push() method returns a boolean flag. When the flag is false, the implementation must wait for read() to be called again before pushing additional data. If the flag is ignored, the buffer will overflow with respect to the highWaterMark.
Note that ignoring these core criteria of Readables circumvents the backpressure logic.
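For illustration, here is a minimal sketch of a pull-based read() that respects both points; it reuses the question's ChunkModel, name, index, and totalChunks, so treat it as a sketch under those assumptions rather than a drop-in fix:
const readableStream = new Readable({
  async read() {
    // Fetch and push only when the stream asks for more data
    if (index >= totalChunks) {
      this.push(null) // signal end of stream
      return
    }
    const { buffer } = await ChunkModel.findOne({ name, index }).select('buffer')
    index++
    // Push exactly one chunk per read() call; if push() returns false,
    // simply stop here and wait for the next read() call
    this.push(buffer)
  }
})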
An alternative implementation, if this is a Mongoose query:
const stream = require('stream'); // PassThrough, Transform, pipeline

app.get('/api/download-stuff', async (req, res) => {
  // ... truncated handler

  // A helper stream to relay data from the pipeline to the response body
  const passThrough = new stream.PassThrough({ objectMode: false });

  // Pipe data using pipeline() to simplify handling stream errors
  stream.pipeline(
    // Create a cursor that fetches all relevant documents using a single query
    // (`chunksLength` is assumed to come from the truncated part of the handler)
    ChunkModel.find().limit(chunksLength).select("buffer").sort({ index: 1 }).lean().cursor(),
    // Cherry-pick the `buffer` property
    new stream.Transform({
      objectMode: true,
      transform: ({ buffer }, encoding, next) => {
        next(null, buffer);
      }
    }),
    // Write the retrieved documents to the helper stream
    passThrough,
    error => {
      if (error) {
        // Log and handle the error. At this point the HTTP headers are probably
        // already sent, and it is therefore too late to return HTTP 500
      }
    }
  );

  // Stream the relayed data to the client
  passThrough.pipe(res);
});

How to properly close a writable stream in Node js?

I'm quite new to JavaScript. I'm using a Node.js writable stream to write a .txt file; it works well, but I cannot understand how to properly close the file, as its content is blank as long as the program is running. More specifically, I need to read from that .txt file after it has been written, but doing it this way returns an empty buffer.
let myWriteStream = fs.createWriteStream("./filepath.txt");
myWriteStream.write(stringBuffer + "\n");
myWriteStream.on('close', () => {
  console.log('close event emitted');
});
myWriteStream.end();
// do things..
let data = fs.readFileSync("./filepath.txt").toString().split("\n");
It seems like the event emitted by the .end() method is triggered after the file is read, causing it to be read as empty. If I use a while() loop to wait for the event, so that I know for sure the stream is closed before reading, the program waits forever.
Do you have any clue what I'm doing wrong?
You're missing two things: first, test whether the write succeeded, and then you need to wait for the stream's 'finish' event before reading the file.
const { readFileSync, createWriteStream } = require('fs')

const stringBuffer = Buffer.from(readFileSync('index.js'))
const filePath = "./filepath.txt"
const myWriteStream = createWriteStream(filePath)

let backPressureTest = false;
while (!backPressureTest) {
  backPressureTest = myWriteStream.write(stringBuffer + "\n");
}

myWriteStream.on('close', () => {
  console.log('close event emitted');
});

myWriteStream.on('finish', () => {
  console.log('finish event emitted');
  let data = readFileSync(filePath).toString().split("\n");
  console.log(data);
});

myWriteStream.end();
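If you prefer promises over event listeners, here is a minimal sketch of the same idea using the finished() helper from 'stream/promises' (available in Node 15+); the file path and buffer are carried over from the snippet above as assumptions:
const { createWriteStream, readFileSync } = require('fs')
const { finished } = require('stream/promises')

async function writeThenRead(filePath, stringBuffer) {
  const myWriteStream = createWriteStream(filePath)
  myWriteStream.write(stringBuffer + "\n")
  myWriteStream.end()
  // Resolves once the stream has fully finished (and rejects on error)
  await finished(myWriteStream)
  return readFileSync(filePath).toString().split("\n")
}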

NodeJS Stream flushed during the Event Loop iteration

I'm trying to pipe one Axios response stream into multiple files. It's not working, and I can reproduce it with the simple code below.
Will work:
const { PassThrough } = require('stream')
const inputStream = new PassThrough()
inputStream.write('foo')
// Now I have a stream with content
inputStream.pipe(process.stdout)
inputStream.pipe(process.stderr)
// will print 'foo' to both stdout and stderr ('foofoo' in the terminal)
Will not work:
const { PassThrough } = require('stream')
const inputStream = new PassThrough()
inputStream.write('foo')
inputStream.pipe(process.stdout)
setImmediate(() => {
  inputStream.pipe(process.stderr)
})
// Will print only 'foo'
The question is: can I say that the existing content in the stream will be piped only if the two pipe calls execute in the same event-loop iteration?
Doesn't that make the situation non-deterministic?
By the time the callback scheduled with setImmediate is executed, the stream data has already been flushed. This can be checked via the .readableLength stream property.
You can use cork() and uncork() to control when the buffered stream data is flushed.
const { PassThrough } = require('stream')
const inputStream = new PassThrough()
inputStream.cork()
inputStream.write('foo')
inputStream.pipe(process.stdout)
setImmediate(() => {
  inputStream.pipe(process.stderr)
  inputStream.uncork()
})
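To see the flushing that the answer describes, here is a small sketch checking the .readableLength property mentioned above (illustrative only):
const { PassThrough } = require('stream')
const inputStream = new PassThrough()

inputStream.write('foo')
console.log(inputStream.readableLength) // 3: 'foo' is still buffered
inputStream.pipe(process.stdout)

setImmediate(() => {
  // The first pipe has already drained the buffer by now,
  // so a pipe attached here never sees the earlier 'foo'
  console.log(inputStream.readableLength) // 0
})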

Can I avoid creating a file with writeFileStream?

I create a write file stream and pipe a readable stream into it.
On 'data', I check the length of the data; if it is too short, I don't want to create a file with the write stream.
Can I abort creating the file, or unlink the file after it has been created?
Thanks for your help.
const fs = require('fs')
const { ReadableMock } = require('stream-mock')
const { assert } = require('chai')

describe.only('fs', () => {
  const expectedPath = './file.txt'
  const input = 'abc'
  const reader = new ReadableMock(input)
  const writer = fs.createWriteStream(expectedPath)

  before((done) => {
    let index = 0
    reader.pipe(writer)
    reader.on('data', () => {
      index++
      if (index === 1) {
        reader.unpipe(writer)
        done()
      }
    })
  })

  after(() => {
    fs.unlinkSync('./file.txt')
  })

  it('should not create file', () => {
    assert.isFalse(fs.existsSync(expectedPath)) // expected true to be false.
  })
})
To achieve what you're trying to achieve, I'd create a PassThrough stream and use highWaterMark to tell me when the stream has been filled; you won't need much code, and the streams add so little overhead you won't notice it (not compared with writing to disk or reading from HTTP). ;)
Here's what I'd do:
const { PassThrough } = require('stream') // needed for the checker stream below

const reader = new ReadableMock(input)
const checker = new PassThrough({
  highWaterMark: 4096 // or how many bytes you need to gather first
});

reader
  .once('pause', () => checker.pipe(fs.createWriteStream(expectedPath)))
  .pipe(checker);
What happens here is:
reader is piped to checker, which is not connected to anything yet, but allows its highWaterMark level of bytes (you may add an encoding there to count chars instead of bytes)
checker is paused, but piping will unpause reader, which then tries to write as much as it can
checker will accept some data before returning false on write, which will emit a 'pause' event on reader
only now does the listener create the writer and its underlying file, and pipe checker into it
checker is then unpaused, and so is reader
If the number of bytes is lower than highWaterMark, pause will not be emitted on reader and so the file won't get created.
Mind you, you may need to close connections and clean up if this is not a mock; otherwise you may leave them hanging, waiting to be read, and soon you'll exhaust incoming connection limits or available memory.
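The question also mentions unlinking after the fact; here is a minimal sketch of that simpler alternative (the minimum length, path, and helper name are illustrative assumptions, not part of the original code):
const fs = require('fs')

// Write everything, then remove the file if too little data arrived
function writeOrDiscard(reader, path, minBytes) {
  let total = 0
  const writer = fs.createWriteStream(path)
  reader.on('data', chunk => { total += chunk.length })
  reader.pipe(writer)
  writer.on('close', () => {
    if (total < minBytes) {
      fs.unlink(path, () => {}) // too short: delete the file that was created
    }
  })
}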
