ReadStream: Internal buffer does not fill up anymore - node.js

I have a fs.ReadStream object that points to a pretty big file. Now I would like to read 8000 bytes from the ReadStream, but the internal buffer is only 6000 bytes. So my approach would be to read those 6000 bytes and wait for the internal buffer to fill up again by using a while-loop that checks whether the internal buffer length is not 0 anymore.
Something like this:
BinaryObject.prototype.read = function(length) {
var value;
// Check whether we have enough data in the internal buffer
if (this.stream._readableState.length < length) {
// Not enough data - read the full internal buffer to
// force the ReadStream to fill it again.
value = this.read(this.stream._readableState.length);
while (this.stream._readableState.length === 0) {
// Wait...?
}
// We should have some more data in the internal buffer
// here... Read the rest and add it to our `value` buffer
// ... something like this:
//
// value.push(this.stream.read(length - value.length))
// return value
} else {
value = this.stream.read(length);
this.stream.position += length;
return value;
}
};
The problem is that the buffer never fills up again: the script just idles in the while loop.
What is the best approach to do this?

It's quite simple. You don't need to do any buffering on your side:
var fs = require('fs'),
rs = fs.createReadStream('/path/to/file');
var CHUNK_SIZE = 8192;
rs.on('readable', function () {
var chunk;
while (null !== (chunk = rs.read(CHUNK_SIZE))) {
console.log('got %d bytes of data', chunk.length);
}
});
rs.on('end', function () {
console.log('end');
});
If CHUNK_SIZE is larger than the internal buffer, node will return null and buffer some more before emitting readable again. You can even configure the initial size of the buffer by passing:
var rs = fs.createReadStream('/path/to/file', {highWaterMark: CHUNK_SIZE});
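If the goal is to ask for exactly N bytes per call, as in the question, the readable/read(size) pattern can be wrapped in a promise. This is only a minimal sketch: `readExactly` and `demoStream` are hypothetical names, not Node APIs. It relies on the documented behavior that `read(size)` returns null until `size` bytes are buffered, and returns whatever is left once the stream has ended.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not a Node API): resolves with a Buffer of exactly
// `size` bytes, with the shorter remainder if the stream ends first, or with
// null if the stream has ended and nothing is left.
function readExactly(stream, size) {
  return new Promise((resolve, reject) => {
    if (stream.readableEnded) return resolve(null);

    function cleanup() {
      stream.removeListener('readable', tryRead);
      stream.removeListener('end', onEnd);
      stream.removeListener('error', onError);
    }
    function tryRead() {
      // Returns null while fewer than `size` bytes are buffered; Node then
      // keeps filling the internal buffer and emits 'readable' again.
      const chunk = stream.read(size);
      if (chunk !== null) {
        cleanup();
        resolve(chunk);
      }
    }
    function onEnd() { cleanup(); resolve(null); }
    function onError(err) { cleanup(); reject(err); }

    stream.on('readable', tryRead);
    stream.on('end', onEnd);
    stream.on('error', onError);
    tryRead(); // enough data may already be buffered
  });
}

// In-memory byte stream standing in for fs.createReadStream('/path/to/file').
function demoStream() {
  return Readable.from([Buffer.alloc(6000), Buffer.alloc(6000)], { objectMode: false });
}
```

Calling `readExactly(rs, 8000)` repeatedly then yields 8000-byte buffers until the file runs out, without any busy-waiting on `_readableState`.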

Below is a sample for reading a file as a stream.
var fs = require('fs'),
readStream = fs.createReadStream(srcPath);
readStream.on('data', function (chunk) {
console.log('got %d bytes of data', chunk.length);
});
readStream.on('readable', function () {
var chunk;
while (null !== (chunk = readStream.read())) {
console.log('got %d bytes of data', chunk.length);
}
});
readStream.on('end', function () {
console.log('got all bytes of data');
});

Related

Nodejs: What's the proper way to pipe to a buffer / string [duplicate]

I'm hacking on a Node program that uses smtp-protocol to capture SMTP emails and act on the mail data. The library provides the mail data as a stream, and I don't know how to get that into a string.
I'm currently writing it to stdout with stream.pipe(process.stdout, { end: false }), but as I said, I need the stream data in a string instead, which I can use once the stream has ended.
How do I collect all the data from a Node.js stream into a string?
Another way would be to convert the stream to a promise (refer to the example below) and use then (or await) to assign the resolved value to a variable.
function streamToString (stream) {
const chunks = [];
return new Promise((resolve, reject) => {
stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
stream.on('error', (err) => reject(err));
stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
})
}
const result = await streamToString(stream)
What do you think about this?
async function streamToString(stream) {
// lets have a ReadableStream as a stream variable
const chunks = [];
for await (const chunk of stream) {
chunks.push(Buffer.from(chunk));
}
return Buffer.concat(chunks).toString("utf-8");
}
None of the above worked for me. I needed to use the Buffer object:
const chunks = [];
readStream.on("data", function (chunk) {
chunks.push(chunk);
});
// Send the buffer or you can put it into a var
readStream.on("end", function () {
res.send(Buffer.concat(chunks));
});
Hope this is more useful than the above answer:
var string = '';
stream.on('data',function(data){
string += data.toString();
console.log('stream data ' + data);
});
stream.on('end',function(){
console.log('final output ' + string);
});
Note that string concatenation is not the most efficient way to collect the string parts, but it is used for simplicity (and perhaps your code does not care about efficiency).
Also, this code may produce unpredictable failures for non-ASCII text (it assumes that every character fits in a byte), but perhaps you do not care about that, either.
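The non-ASCII problem mentioned above comes from decoding each chunk on its own: a multi-byte character can be split across two chunks. A minimal sketch of one way around it uses Node's `StringDecoder`, which holds back incomplete byte sequences until the next chunk arrives. The names `streamToStringSafe` and `euroInTwoChunks` are made up for this illustration.

```javascript
const { Readable } = require('stream');
const { StringDecoder } = require('string_decoder');

// Collects a stream into a string without corrupting multi-byte characters
// that happen to be split across two chunks.
function streamToStringSafe(stream, encoding = 'utf8') {
  return new Promise((resolve, reject) => {
    const decoder = new StringDecoder(encoding);
    let text = '';
    stream.on('data', (chunk) => {
      // Strings (e.g. after setEncoding) pass through unchanged;
      // Buffers go through the decoder.
      text += typeof chunk === 'string' ? chunk : decoder.write(chunk);
    });
    // decoder.end() flushes any bytes still held back.
    stream.on('end', () => resolve(text + decoder.end()));
    stream.on('error', reject);
  });
}

// The euro sign is the three bytes E2 82 AC; here they arrive in two chunks.
const euroInTwoChunks = () =>
  Readable.from([Buffer.from([0xe2]), Buffer.from([0x82, 0xac])]);

streamToStringSafe(euroInTwoChunks()).then((s) => console.log(s));
```

Calling `stream.setEncoding('utf8')` before attaching the 'data' handler has a similar effect, since the stream then does the decoding itself.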
(This answer is from years ago, when it was the best answer. There is now a better answer below this. I haven't kept up with node.js, and I cannot delete this answer because it is marked "correct on this question". If you are thinking of down clicking, what do you want me to do?)
The key is to use the data and end events of a Readable Stream. Listen to these events:
stream.on('data', (chunk) => { ... });
stream.on('end', () => { ... });
When you receive the data event, add the new chunk of data to a Buffer created to collect the data.
When you receive the end event, convert the completed Buffer into a string, if necessary. Then do what you need to do with it.
I usually use this simple function to transform a stream into a string:
function streamToString(stream, cb) {
const chunks = [];
stream.on('data', (chunk) => {
chunks.push(chunk.toString());
});
stream.on('end', () => {
cb(chunks.join(''));
});
}
Usage example:
let stream = fs.createReadStream('./myFile.foo');
streamToString(stream, (data) => {
console.log(data); // data is now my string variable
});
And yet another one for strings using promises:
function getStream(stream) {
return new Promise(resolve => {
const chunks = [];
// Buffer.from is required if chunk is a String, see comments
stream.on("data", chunk => chunks.push(Buffer.from(chunk)));
stream.on("end", () => resolve(Buffer.concat(chunks).toString()));
});
}
Usage:
const stream = fs.createReadStream(__filename);
getStream(stream).then(r=>console.log(r));
Remove the .toString() to use this with binary data if required.
Update: @AndreiLED correctly pointed out this has problems with strings. I couldn't get a stream returning strings with the version of node I have, but the API notes that this is possible.
From the nodejs documentation you should do this - always remember a string without knowing the encoding is just a bunch of bytes:
var readable = getReadableStreamSomehow();
readable.setEncoding('utf8');
readable.on('data', function(chunk) {
assert.equal(typeof chunk, 'string');
console.log('got %d characters of string data', chunk.length);
})
Easy way with the popular (over 5m weekly downloads) and lightweight get-stream library:
https://www.npmjs.com/package/get-stream
const fs = require('fs');
const getStream = require('get-stream');
(async () => {
const stream = fs.createReadStream('unicorn.txt');
console.log(await getStream(stream)); //output is string
})();
Streams don't have a simple .toString() function (which I understand) nor something like a .toStringAsync(cb) function (which I don't understand).
So I created my own helper function:
var streamToString = function(stream, callback) {
var str = '';
stream.on('data', function(chunk) {
str += chunk;
});
stream.on('end', function() {
callback(str);
});
}
// how to use:
streamToString(myStream, function(myStr) {
console.log(myStr);
});
I had more luck doing it like this:
let string = '';
readstream
.on('data', (buf) => string += buf.toString())
.on('end', () => console.log(string));
I use node v9.11.1 and the readstream is the response from a http.get callback.
The cleanest solution may be to use the "stream-string" package, which converts a stream to a string and returns a promise.
const streamString = require('stream-string')
streamString(myStream).then(string_variable => {
// myStream was converted to a string, and that string is stored in string_variable
console.log(string_variable)
}).catch(err => {
// myStream emitted an error event (err), so the promise from stream-string was rejected
throw err
})
What about something like a stream reducer?
Here is an example, using ES6 classes, of how to use one.
var stream = require('stream')
class StreamReducer extends stream.Writable {
constructor(chunkReducer, initialvalue, cb) {
super();
this.reducer = chunkReducer;
this.accumulator = initialvalue;
this.cb = cb;
}
_write(chunk, enc, next) {
this.accumulator = this.reducer(this.accumulator, chunk);
next();
}
end() {
this.cb(null, this.accumulator)
}
}
// just a test stream
class EmitterStream extends stream.Readable {
constructor(chunks) {
super();
this.chunks = chunks;
}
_read() {
this.chunks.forEach(function (chunk) {
this.push(chunk);
}.bind(this));
this.push(null);
}
}
// just transform the strings into buffer as we would get from fs stream or http request stream
(new EmitterStream(
["hello ", "world !"]
.map(function(str) {
return Buffer.from(str, 'utf8');
})
)).pipe(new StreamReducer(
function (acc, v) {
acc.push(v);
return acc;
},
[],
function(err, chunks) {
console.log(Buffer.concat(chunks).toString('utf8'));
})
);
All the answers listed appear to open the Readable Stream in flowing mode which is not the default in NodeJS and can have limitations since it lacks backpressure support that NodeJS provides in Paused Readable Stream Mode.
Here is an implementation using Just Buffers, Native Stream and Native Stream Transforms and support for Object Mode
import {Transform} from 'stream';
let buffer =null;
function objectifyStream() {
return new Transform({
objectMode: true,
transform: function(chunk, encoding, next) {
if (!buffer) {
buffer = Buffer.from([...chunk]);
} else {
buffer = Buffer.from([...buffer, ...chunk]);
}
next(null, buffer);
}
});
}
process.stdin.pipe(objectifyStream()).pipe(process.stdout);
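If backpressure is the concern, another option is to let `pipeline` drive the stream into a small collecting `Writable`. The sketch below assumes a Node version that ships `stream/promises` (15 or later); `collect` is a hypothetical helper name, and it uses a plain Writable rather than the Transform shown above.

```javascript
const { Readable, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Pipes `source` into an in-memory sink and resolves with one Buffer.
// pipeline() handles backpressure, error propagation and cleanup.
async function collect(source) {
  const chunks = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk);
      callback(); // signal that the sink is ready for the next chunk
    },
  });
  await pipeline(source, sink);
  return Buffer.concat(chunks);
}

// Example with an in-memory source; any readable byte stream would do.
collect(Readable.from([Buffer.from('hello '), Buffer.from('world')]))
  .then((buf) => console.log(buf.toString('utf8')));
```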
This worked for me and is based on Node v6.7.0 docs:
let output = '';
stream.on('readable', function() {
let read = stream.read();
if (read !== null) {
// New stream data is available
output += read.toString();
} else {
// Stream is now finished when read is null.
// You can callback here e.g.:
callback(null, output);
}
});
stream.on('error', function(err) {
callback(err, null);
})
Using the quite popular stream-buffers package which you probably already have in your project dependencies, this is pretty straightforward:
// imports
const { WritableStreamBuffer } = require('stream-buffers');
const { promisify } = require('util');
const { createReadStream } = require('fs');
const pipeline = promisify(require('stream').pipeline);
// sample stream
let stream = createReadStream('/etc/hosts');
// pipeline the stream into a buffer, and print the contents when done
let buf = new WritableStreamBuffer();
pipeline(stream, buf).then(() => console.log(buf.getContents().toString()));
setEncoding('utf8');
Well done Sebastian J above.
I had the "buffer problem" with a few lines of test code I had, and added the encoding information and it solved it, see below.
Demonstrate the problem
software
// process.stdin.setEncoding('utf8');
process.stdin.on('data', (data) => {
console.log(typeof(data), data);
});
input
hello world
output
object <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 0d 0a>
Demonstrate the solution
software
process.stdin.setEncoding('utf8'); // <- Activate!
process.stdin.on('data', (data) => {
console.log(typeof(data), data);
});
input
hello world
output
string hello world
In my case, the response's content type header was Content-Type: text/plain, so I read the data from the Buffer like this:
let data = [];
stream.on('data', (chunk) => {
console.log(Buffer.from(chunk).toString())
data.push(Buffer.from(chunk).toString())
});

NodeJS - Read Buffer line by line synchronously => toString() failed

I have been struggling and searching for a long time. I know there are answers about this, but none of them works.
I used fs.createReadStream and readline for this, but that relies on fs.close() to stop reading the file, so it doesn't work at all when used on a buffer. The reading of the whole file goes on with no possibility to interrupt it...
Then I used this :
const stream = require('stream');
let bufferStream = new stream.PassThrough();
bufferStream.end(hexaviaFile.buffer);
bufferStream
.pipe(require('split')())
.pipe(es.mapSync(function(line){
// pause the readstream
bufferStream.pause();
// DO WHATEVER WITH YOUR LINE
console.log('line content = ' + line);
// resume the readstream, possibly from a callback
bufferStream.resume();
}).on('error', function(err){
console.log('Error while reading file.' + err);
}).on('end', function(){
console.log('end event !');
}).on('close', function(){
console.log('close event !');
})
);
// toString() Failed
I get the [toString() Failed] error and searched about it; apparently it appears when the buffer is larger than node's maximum buffer size.
So I checked :
var buffer = require('buffer');
console.log('buffer.kMaxLength = ', buffer.kMaxLength); // 2147483647
console.log('hexaviaFile.buffer.byteLength = ', hexaviaFile.buffer.byteLength); // => 413567671
That is not the case, as the numbers show:
* maximum buffer size = 2 GB
* my buffer = 0.4 GB
I also tried some different libraries, but:
1. I want to keep memory usage as low as possible
2. I need this reading to be perfectly SYNC. In other words, I have some processings after the file reading and I need to complete all the reading before going to next steps.
I don't know what to do :) Any kind (of) help appreciated
Regards.
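A note on the numbers above: as far as I can tell, the "toString() failed" error is tied to the maximum string length rather than to buffer.kMaxLength. The two are separate limits, and on some Node versions the string limit is well below 413567671. The sketch below prints both constants and shows one way to decode a large buffer in pieces. `decodeInPieces` is a hypothetical helper and the piece size is an arbitrary illustrative value.

```javascript
const { constants } = require('buffer');
const { StringDecoder } = require('string_decoder');

// The Buffer size limit and the string length limit are separate constants.
console.log('max buffer bytes :', constants.MAX_LENGTH);
console.log('max string length:', constants.MAX_STRING_LENGTH);

// Decodes a large buffer piece by piece instead of calling toString() on all
// of it at once. StringDecoder holds back incomplete multi-byte sequences
// that fall on a piece boundary.
function* decodeInPieces(buf, pieceSize = 64 * 1024, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  for (let offset = 0; offset < buf.length; offset += pieceSize) {
    const text = decoder.write(buf.subarray(offset, offset + pieceSize));
    if (text) yield text;
  }
  const tail = decoder.end();
  if (tail) yield tail;
}

// Usage: process each decoded piece without building one giant string.
for (const piece of decodeInPieces(Buffer.from('some file contents'), 8)) {
  console.log(JSON.stringify(piece));
}
```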
I forgot about this post. I found a way to achieve this without errors.
It's given here : https://github.com/request/request/issues/2826
1st create a splitter to read string chunks
const { Transform } = require('stream');
class Splitter extends Transform {
constructor(options){
super(options);
this.splitSize = options.splitSize;
this.buffer = Buffer.alloc(0);
this.continueThis = true;
}
stopIt() {
this.continueThis = false;
}
_transform(chunk, encoding, cb){
this.buffer = Buffer.concat([this.buffer, chunk]);
while ((this.buffer.length > this.splitSize || this.buffer.length === 1) && this.continueThis){
try {
let chunk = this.buffer.slice(0, this.splitSize);
this.push(chunk);
this.buffer = this.buffer.slice(this.splitSize);
if (this.buffer[0] === 26){
console.log('EOF : ' + this.buffer[0]);
}
} catch (err) {
console.log('ERR OCCURED => ', err);
break;
}
}
console.log('WHILE FINISHED');
cb();
}
}
Then pipe it to your stream :
let bufferStream = new stream.PassThrough();
bufferStream.end(hugeBuffer);
let splitter = new Splitter({splitSize : 170}); // In my case I have 170 length lines, so I want to process them line by line
let lineNr = 0;
bufferStream
.pipe(splitter)
.on('data', async function(line){
line = line.toString().trim();
splitter.pause(); // pause stream so you can perform long time processing with await
lineNr++;
if (lineNr === 1){
// DO stuff with 1st line
} else {
splitter.stopIt(); // Break the stream and stop reading so we just read 1st line
}
splitter.resume(); // resume the stream so you can process the next chunk
}).on('error', function(err){
console.log('Error while reading file.' + err);
// whatever
}).on('end', async function(){
console.log('end event');
// Stream has ended, do whatever...
});
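When the lines are newline-terminated rather than fixed-width, a different technique from the splitter above is to feed the buffer to Node's `readline` module and iterate the lines with `for await`. The loop only moves on once the `await` in its body settles, which gives the sequential behavior the question asks for. This is a sketch under assumptions: `forEachLine` and its callback contract (returning `false` stops reading) are invented for this example.

```javascript
const { Readable } = require('stream');
const readline = require('readline');

// Calls `onLine(line, lineNr)` for each line of `buffer`, one at a time.
// If the callback (or the promise it returns) yields false, reading stops.
// Resolves with the number of lines that were handed to the callback.
async function forEachLine(buffer, onLine) {
  const input = Readable.from([buffer]);
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let lineNr = 0;
  for await (const line of rl) {
    lineNr++;
    const keepGoing = await onLine(line, lineNr);
    if (keepGoing === false) break; // leaving the loop closes the interface
  }
  rl.close();
  return lineNr;
}

// Usage: look at the first line only, then stop.
forEachLine(Buffer.from('first\nsecond\nthird\n'), async (line) => {
  console.log('line content = ' + line);
  return false;
});
```

Note that this still needs the whole buffer in memory, as in the original code; only the line handling is sequential.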

node.js - reading child process stdout 100 bytes at a time

I'm spawning a child that produces lots of data (I'm using 'ls -lR /' here as an example). I want to asynchronously read the child's stdout 100 bytes at a time.
So I want to do: get100().then(process100).then(get100).then(process100).then(...
For some reason, this code only loops 3 times and I stop getting Readable events. I can't figure out why?
var Promise = require('bluebird');
var spawn = require("child_process").spawn;
var exec = spawn( "ls", [ "-lR", "/"] );
var get100 = function () {
return new Promise(function(resolve, reject) {
var tryTransfer = function() {
var block = exec.stdout.read(100);
if (block) {
console.log("Got 100 Bytes");
exec.stdout.removeAllListeners('readable');
resolve();
} else console.log("Read Failed - not enough bytes?");
};
exec.stdout.on('readable', tryTransfer);
});
};
var forEver = Promise.method(function(action) {
return action().then(forEver.bind(null, action));
});
forEver(
function() { return get100(); }
)
Using event-stream, you can emit 100 bytes data from the spawned process as long as there is data to read (streams are async):
var es = require('event-stream');
var spawn = require("child_process").spawn;
var exec = spawn("ls", ["-lR", "/"]);
var stream = es.readable(function (count, next) {
// read 100 bytes
var block;
while ((block = exec.stdout.read(100))) {
// if you have tons of data, it's not a good idea to log here
// console.log("Got 100 Bytes");
// emit the block
this.emit('data', block.toString()); // block is a buffer (bytes array), you may need toString() or not
}
// no more data left to read
this.emit('end');
next();
}).on('data', function(data) {
// data is the 100 bytes block, do what you want here
// the stream is pausable and resumable at will
stream.pause();
doStuff(data, function() {
stream.resume();
});
});
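A dependency-free alternative, sketched here under the assumption that async iteration of readable streams is available, is to stop asking the stream for 100 bytes at a time. Instead, take whatever chunks arrive and re-slice them into fixed-size blocks. This is a different technique from `read(100)`; `fixedBlocks` is a hypothetical name. Backpressure still applies, because the stream is only read as fast as the consumer pulls blocks from the generator.

```javascript
const { Readable } = require('stream');

// Re-slices the chunks of `source` into blocks of exactly `size` bytes.
// The final block may be shorter if the total length is not a multiple.
async function* fixedBlocks(source, size) {
  let pending = Buffer.alloc(0);
  for await (const chunk of source) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= size) {
      yield pending.subarray(0, size);
      pending = pending.subarray(size);
    }
  }
  if (pending.length > 0) yield pending;
}

// Usage with a child process would look like this (not run here):
//   const exec = require('child_process').spawn('ls', ['-lR', '/']);
//   for await (const block of fixedBlocks(exec.stdout, 100)) {
//     await process100(block); // process100 is the asker's own function
//   }
async function demo() {
  const sizes = [];
  for await (const block of fixedBlocks(Readable.from([Buffer.alloc(250)]), 100)) {
    sizes.push(block.length);
  }
  return sizes;
}
```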

Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resize it and upload to S3 (using 'gm' and 'knox'), I have no idea if I'm doing the reading of a stream to a buffer correctly. (everything is working, but is it the correct way?)
also, I want to understand something about the event loop, how do I know that one invocation of the function won't leak anything or change the 'buf' variable to another already running invocation (or this scenario is impossible because the callbacks are anonymous functions?)
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');
module.exports.processImageUrl = function(imageUrl, filename, callback) {
var client = http;
if (imageUrl.substr(0, 5) == 'https') { client = https; }
client.get(imageUrl, function(res) {
if (res.statusCode != 200) {
return callback(new Error('HTTP Response code ' + res.statusCode));
}
gm(res)
.geometry(1024, 768, '>')
.stream('jpg', function(err, stdout, stderr) {
if (!err) {
var buf = Buffer.alloc(0);
stdout.on('data', function(d) {
buf = Buffer.concat([buf, d]);
});
stdout.on('end', function() {
var headers = {
'Content-Length': buf.length
, 'Content-Type': 'Image/jpeg'
, 'x-amz-acl': 'public-read'
};
s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
if(err) {
return callback(err);
} else {
return callback(null, res.client._httpMessage.url);
}
});
});
} else {
callback(err);
}
});
}).on('error', function(err) {
callback(err);
});
};
Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is suboptimal because it has to copy all the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
})
For performance, I would look into if the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all, and instead just pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
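A small experiment can make the point about private context concrete. In this sketch, `collect` is a made-up stand-in for the image function: each call declares its own `bufs`, and `setImmediate` plays the role of asynchronous 'data' events, so two calls are in flight at the same time and their steps interleave.

```javascript
// Each call creates its own `bufs`; the inner function created during a call
// closes over that call's array only.
function collect(parts, done) {
  var bufs = [];
  var i = 0;
  (function next() {
    if (i === parts.length) {
      return done(Buffer.concat(bufs).toString());
    }
    bufs.push(Buffer.from(parts[i++]));
    setImmediate(next); // yield to the event loop between chunks
  })();
}

// Two invocations running concurrently; neither sees the other's chunks.
collect(['a1', 'a2'], function (result) { console.log('first call :', result); });
collect(['b1', 'b2'], function (result) { console.log('second call:', result); });
```

Sharing would only happen if `bufs` were declared outside the function, at module level.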
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');
// Your other code.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var contentLength = bufs.reduce(function(sum, buf){
return sum + buf.length;
}, 0);
// Create a stream that will emit your chunks when resumed.
var stream = pause_stream();
stream.pause();
while (bufs.length) stream.write(bufs.shift());
stream.end();
var headers = {
'Content-Length': contentLength,
// ...
};
s3.putStream(stream, ....);
Javascript snippet
function stream2buffer(stream) {
return new Promise((resolve, reject) => {
const _buf = [];
stream.on("data", (chunk) => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", (err) => reject(err));
});
}
Typescript snippet
async function stream2buffer(stream: Stream): Promise<Buffer> {
return new Promise < Buffer > ((resolve, reject) => {
const _buf = Array < any > ();
stream.on("data", chunk => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", err => reject(`error converting stream - ${err}`));
});
}
You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
.then(res => res.buffer())
.then(buffer => console.log(buffer))
Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Readable streams have supported async iteration since Node 10 (experimental at first); the feature has been stable since Node 11.14.0.
const buffers = [];
// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
buffers.push(data);
}
const finalBuffer = Buffer.concat(buffers);
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));
You can convert your readable stream to a buffer and integrate it in your code in an asynchronous way like this.
async function streamToBuffer (stream) {
return new Promise((resolve, reject) => {
const data = [];
stream.on('data', (chunk) => {
data.push(chunk);
});
stream.on('end', () => {
resolve(Buffer.concat(data))
})
stream.on('error', (err) => {
reject(err)
})
})
}
the usage would be as simple as:
// usage
// myStream is your readable stream
const buffer = await streamToBuffer(myStream) // this is a buffer
I suggest loganfsmyth's method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
});
In my current working example, I am working with GridFS and npm's Jimp.
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' } );
var dwnldStream = bucket.openDownloadStream(info[0]._id);// original size
var data = []; // chunks collected from the download stream
dwnldStream.on('data', function(chunk) {
data.push(chunk);
});
dwnldStream.on('end', function() {
var buff =Buffer.concat(data);
console.log("buffer: ", buff);
jimp.read(buff)
.then(image => {
console.log("read the image!");
IMAGE_SIZES.forEach( (size)=>{
resize(image,size);
});
});
});
I also tried a string-based approach, but that did not work, perhaps because I was reading from an image file; the array method did work.
const DISCLAIMER = "DONT DO THIS";
var bufs = "";
stdout.on('data', function(d){
bufs+=d;
});
stdout.on('end', function(){
var buf = Buffer.from(bufs);
//// do work with the buffer here
});
When I used the string method, I got this error from npm's Jimp:
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
Basically, I think the type coercion from binary to string didn't work so well.
I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or one could use node-buffers.
I just want to post my solution. The previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem is that its callback fires near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content length. In case of an error,
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');
var _streamFile = function(res , stream , cb){
var cache = new StreamCache();
var lstream = lengthStream(function(length) {
res.header("Content-Length", length);
cache.pipe(res);
});
stream.on('error', function(err){
return cb(err);
});
stream.on('end', function(){
return cb(null , true);
});
return stream.pipe(lstream).pipe(cache);
}
In TypeScript, [].push(bufferPart) is not type-compatible, so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
if (!stream) {
throw 'FILE_STREAM_EMPTY';
}
return new Promise(
(r, j) => {
let buffer = Buffer.from([]);
stream.on('data', buf => {
buffer = Buffer.concat([buffer, buf]);
});
stream.on('end', () => r(buffer));
stream.on('error', j);
}
);
}
You can do this by:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
const list = []
const reader = stream.getReader()
while (true) {
const { value, done } = await reader.read()
if (value)
list.push(value)
if (done)
break
}
return Buffer.concat(list)
}
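or using the buffer consumer from node:stream/consumers (it returns a promise):
import { buffer } from 'node:stream/consumers'
const buf = await buffer(stream)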
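For reference, here is a slightly fuller sketch of the consumer helpers. It assumes a Node version that ships the `stream/consumers` module (16.7 or later, as far as I know); `demo` is just an illustrative wrapper. Both `buffer` and `text` return promises, so they need `await`.

```javascript
const { Readable } = require('stream');
const { buffer, text } = require('stream/consumers');

// Each consumer drains the whole stream and resolves with the result.
async function demo() {
  const asBuffer = await buffer(Readable.from([Buffer.from('ab'), Buffer.from('cd')]));
  const asText = await text(Readable.from([Buffer.from('ab'), Buffer.from('cd')]));
  return { asBuffer, asText };
}

demo().then(({ asBuffer, asText }) => {
  console.log(asBuffer.length, asText);
});
```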
You can check the "content-length" header at res.headers. It will give you the length of the content you will receive (how many bytes of data it will send)

How can I 'accumulate' a raw stream in Node.js?

At the moment I concatenate everything into a string as follows
var body = '';
res.on('data', function(chunk){
body += chunk;
});
How can I preserve and accumulate the raw stream so I can pass raw bytes to functions that are expecting bytes and not String?
Better use Buffer.concat - much simpler. Available in Node v0.8+.
var chunks = [];
res.on('data', function(chunk) { chunks.push(chunk); });
res.on('end', function() {
var body = Buffer.concat(chunks);
// Do work with body.
});
First off, check that these functions actually need the bytes all in one go. They really should accept 'data' events so that you can just pass on the buffers in the order you receive them.
Anyway, here's a bruteforce way to concatenate all data chunk buffers without decoding them:
var bodyparts = [];
var bodylength = 0;
res.on('data', function(chunk){
bodyparts.push(chunk);
bodylength += chunk.length;
});
res.on('end', function(){
var body = Buffer.alloc(bodylength);
var bodyPos=0;
for (var i=0; i < bodyparts.length; i++) {
bodyparts[i].copy(body, bodyPos, 0, bodyparts[i].length);
bodyPos += bodyparts[i].length;
}
doStuffWith(body); // yay
});
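To illustrate the earlier point that a consumer ideally accepts chunks as they arrive: hashing is one case where nothing needs to be accumulated at all. The sketch below feeds each chunk straight into `crypto.createHash`; the helper name `sha256OfStream` is an assumption made for this example.

```javascript
const crypto = require('crypto');
const { Readable } = require('stream');

// Hashes a stream chunk by chunk; memory use stays constant no matter how
// large the body is, because no chunk is kept after hash.update().
function sha256OfStream(stream) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    stream.on('data', (chunk) => hash.update(chunk));
    stream.on('end', () => resolve(hash.digest('hex')));
    stream.on('error', reject);
  });
}

// In-memory stand-in for an HTTP response stream.
sha256OfStream(Readable.from([Buffer.from('hello '), Buffer.from('world')]))
  .then((hex) => console.log(hex));
```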
Alternately, you can also use a node.js library like bl or concat-stream:
'use strict'
let http = require('http')
let bl = require('bl')
http.get(url, function (response) {
response.pipe(bl(function (err, data) {
if (err)
return console.error(err)
console.log(data)
}))
})
