How can I 'accumulate' a raw stream in Node.js? - node.js

At the moment I concatenate everything into a string as follows
var body = '';
res.on('data', function(chunk){
body += chunk;
});
How can I preserve and accumulate the raw stream so I can pass raw bytes to functions that are expecting bytes and not String?

Better use Buffer.concat - much simpler. Available in Node v0.8+.
var chunks = [];
res.on('data', function(chunk) { chunks.push(chunk); });
res.on('end', function() {
var body = Buffer.concat(chunks);
// Do work with body.
});

First off, check that these functions actually need the bytes all in one go. They really should accept 'data' events so that you can just pass on the buffers in the order you receive them.
Anyway, here's a bruteforce way to concatenate all data chunk buffers without decoding them:
var bodyparts = [];
var bodylength = 0;
res.on('data', function(chunk){
bodyparts.push(chunk);
bodylength += chunk.length;
});
res.on('end', function(){
var body = new Buffer(bodylength);
var bodyPos=0;
for (var i=0; i < bodyparts.length; i++) {
bodyparts[i].copy(body, bodyPos, 0, bodyparts[i].length);
bodyPos += bodyparts[i].length;
}
doStuffWith(body); // yay
});

Alternately, you can also use a node.js library like bl or concat-stream:
'use strict'
let http = require('http')
let bl = require('bl')
http.get(url, function (response) {
response.pipe(bl(function (err, data) {
if (err)
return console.error(err)
console.log(data)
}))
})

Related

Nodejs: What's the proper way to pipe to a buffer / string [duplicate]

I'm hacking on a Node program that uses smtp-protocol to capture SMTP emails and act on the mail data. The library provides the mail data as a stream, and I don't know how to get that into a string.
I'm currently writing it to stdout with stream.pipe(process.stdout, { end: false }), but as I said, I need the stream data in a string instead, which I can use once the stream has ended.
How do I collect all the data from a Node.js stream into a string?
Another way would be to convert the stream to a promise (refer to the example below) and use then (or await) to assign the resolved value to a variable.
function streamToString (stream) {
const chunks = [];
return new Promise((resolve, reject) => {
stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
stream.on('error', (err) => reject(err));
stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
})
}
const result = await streamToString(stream)
What do you think about this ?
async function streamToString(stream) {
// lets have a ReadableStream as a stream variable
const chunks = [];
for await (const chunk of stream) {
chunks.push(Buffer.from(chunk));
}
return Buffer.concat(chunks).toString("utf-8");
}
None of the above worked for me. I needed to use the Buffer object:
const chunks = [];
readStream.on("data", function (chunk) {
chunks.push(chunk);
});
// Send the buffer or you can put it into a var
readStream.on("end", function () {
res.send(Buffer.concat(chunks));
});
Hope this is more useful than the above answer:
var string = '';
stream.on('data',function(data){
string += data.toString();
console.log('stream data ' + part);
});
stream.on('end',function(){
console.log('final output ' + string);
});
Note that string concatenation is not the most efficient way to collect the string parts, but it is used for simplicity (and perhaps your code does not care about efficiency).
Also, this code may produce unpredictable failures for non-ASCII text (it assumes that every character fits in a byte), but perhaps you do not care about that, either.
(This answer is from years ago, when it was the best answer. There is now a better answer below this. I haven't kept up with node.js, and I cannot delete this answer because it is marked "correct on this question". If you are thinking of down clicking, what do you want me to do?)
The key is to use the data and end events of a Readable Stream. Listen to these events:
stream.on('data', (chunk) => { ... });
stream.on('end', () => { ... });
When you receive the data event, add the new chunk of data to a Buffer created to collect the data.
When you receive the end event, convert the completed Buffer into a string, if necessary. Then do what you need to do with it.
I'm using usually this simple function to transform a stream into a string:
function streamToString(stream, cb) {
const chunks = [];
stream.on('data', (chunk) => {
chunks.push(chunk.toString());
});
stream.on('end', () => {
cb(chunks.join(''));
});
}
Usage example:
let stream = fs.createReadStream('./myFile.foo');
streamToString(stream, (data) => {
console.log(data); // data is now my string variable
});
And yet another one for strings using promises:
function getStream(stream) {
return new Promise(resolve => {
const chunks = [];
# Buffer.from is required if chunk is a String, see comments
stream.on("data", chunk => chunks.push(Buffer.from(chunk)));
stream.on("end", () => resolve(Buffer.concat(chunks).toString()));
});
}
Usage:
const stream = fs.createReadStream(__filename);
getStream(stream).then(r=>console.log(r));
remove the .toString() to use with binary Data if required.
update: #AndreiLED correctly pointed out this has problems with strings. I couldn't get a stream returning strings with the version of node I have, but the api notes this is possible.
From the nodejs documentation you should do this - always remember a string without knowing the encoding is just a bunch of bytes:
var readable = getReadableStreamSomehow();
readable.setEncoding('utf8');
readable.on('data', function(chunk) {
assert.equal(typeof chunk, 'string');
console.log('got %d characters of string data', chunk.length);
})
Easy way with the popular (over 5m weekly downloads) and lightweight get-stream library:
https://www.npmjs.com/package/get-stream
const fs = require('fs');
const getStream = require('get-stream');
(async () => {
const stream = fs.createReadStream('unicorn.txt');
console.log(await getStream(stream)); //output is string
})();
Streams don't have a simple .toString() function (which I understand) nor something like a .toStringAsync(cb) function (which I don't understand).
So I created my own helper function:
var streamToString = function(stream, callback) {
var str = '';
stream.on('data', function(chunk) {
str += chunk;
});
stream.on('end', function() {
callback(str);
});
}
// how to use:
streamToString(myStream, function(myStr) {
console.log(myStr);
});
I had more luck using like that :
let string = '';
readstream
.on('data', (buf) => string += buf.toString())
.on('end', () => console.log(string));
I use node v9.11.1 and the readstream is the response from a http.get callback.
The cleanest solution may be to use the "string-stream" package, which converts a stream to a string with a promise.
const streamString = require('stream-string')
streamString(myStream).then(string_variable => {
// myStream was converted to a string, and that string is stored in string_variable
console.log(string_variable)
}).catch(err => {
// myStream emitted an error event (err), so the promise from stream-string was rejected
throw err
})
What about something like a stream reducer ?
Here is an example using ES6 classes how to use one.
var stream = require('stream')
class StreamReducer extends stream.Writable {
constructor(chunkReducer, initialvalue, cb) {
super();
this.reducer = chunkReducer;
this.accumulator = initialvalue;
this.cb = cb;
}
_write(chunk, enc, next) {
this.accumulator = this.reducer(this.accumulator, chunk);
next();
}
end() {
this.cb(null, this.accumulator)
}
}
// just a test stream
class EmitterStream extends stream.Readable {
constructor(chunks) {
super();
this.chunks = chunks;
}
_read() {
this.chunks.forEach(function (chunk) {
this.push(chunk);
}.bind(this));
this.push(null);
}
}
// just transform the strings into buffer as we would get from fs stream or http request stream
(new EmitterStream(
["hello ", "world !"]
.map(function(str) {
return Buffer.from(str, 'utf8');
})
)).pipe(new StreamReducer(
function (acc, v) {
acc.push(v);
return acc;
},
[],
function(err, chunks) {
console.log(Buffer.concat(chunks).toString('utf8'));
})
);
All the answers listed appear to open the Readable Stream in flowing mode which is not the default in NodeJS and can have limitations since it lacks backpressure support that NodeJS provides in Paused Readable Stream Mode.
Here is an implementation using Just Buffers, Native Stream and Native Stream Transforms and support for Object Mode
import {Transform} from 'stream';
let buffer =null;
function objectifyStream() {
return new Transform({
objectMode: true,
transform: function(chunk, encoding, next) {
if (!buffer) {
buffer = Buffer.from([...chunk]);
} else {
buffer = Buffer.from([...buffer, ...chunk]);
}
next(null, buffer);
}
});
}
process.stdin.pipe(objectifyStream()).process.stdout
This worked for me and is based on Node v6.7.0 docs:
let output = '';
stream.on('readable', function() {
let read = stream.read();
if (read !== null) {
// New stream data is available
output += read.toString();
} else {
// Stream is now finished when read is null.
// You can callback here e.g.:
callback(null, output);
}
});
stream.on('error', function(err) {
callback(err, null);
})
Using the quite popular stream-buffers package which you probably already have in your project dependencies, this is pretty straightforward:
// imports
const { WritableStreamBuffer } = require('stream-buffers');
const { promisify } = require('util');
const { createReadStream } = require('fs');
const pipeline = promisify(require('stream').pipeline);
// sample stream
let stream = createReadStream('/etc/hosts');
// pipeline the stream into a buffer, and print the contents when done
let buf = new WritableStreamBuffer();
pipeline(stream, buf).then(() => console.log(buf.getContents().toString()));
setEncoding('utf8');
Well done Sebastian J above.
I had the "buffer problem" with a few lines of test code I had, and added the encoding information and it solved it, see below.
Demonstrate the problem
software
// process.stdin.setEncoding('utf8');
process.stdin.on('data', (data) => {
console.log(typeof(data), data);
});
input
hello world
output
object <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 0d 0a>
Demonstrate the solution
software
process.stdin.setEncoding('utf8'); // <- Activate!
process.stdin.on('data', (data) => {
console.log(typeof(data), data);
});
input
hello world
output
string hello world
In my case, the content type response headers was Content-Type: text/plain. So, I've read the data from Buffer like:
let data = [];
stream.on('data', (chunk) => {
console.log(Buffer.from(chunk).toString())
data.push(Buffer.from(chunk).toString())
});

Why node.js convert POST body?

Want to save JPG binary body data to file system in OpenShift. But somehow received info will get converted. Do you have any idea why? Is it possible that node.js treat data as text and does an encoding / decoding on it?
var myServer = http.createServer(function(request, response)
{
var data = '';
request.on('data', function (chunk){
data += chunk;
});
request.on('end',function(){
var date = new Date();
var url_parts = url.parse(request.url,true);
if(url_parts.pathname == '/setImage') {
if(data != null && data.length > 0) {
fs.writeFile('/var/lib/openshift/555dd1415973ca1660000085/app-root/data/asset/' + url_parts.query.filename, data, 'binary', function(err) {
if (err) throw err
console.log(date + ' File saved. ' + url_parts.query.filename + ' ' + data.length)
response.writeHead(200)
response.end()
})
}
}
You are initializing data with a string, so adding chunk's with += to it will convert the chunks to string as well (which is subject to character encoding).
Instead, you should collect the chunks as an array of Buffer's and use Buffer.concat() to create the final Buffer:
var chunks = [];
request.on('data', function (chunk){
chunks.push(chunk);
});
request.on('end', function() {
var data = Buffer.concat(chunks);
...
});

ReadStream: Internal buffer does not fill up anymore

I have a fs.ReadStream object that points to a pretty big file. Now I would like to read 8000 bytes from the ReadStream, but the internal buffer is only 6000 bytes. So my approach would be to read those 6000 bytes and wait for the internal buffer to fill up again by using a while-loop that checks whether the internal buffer length is not 0 anymore.
Something like this:
BinaryObject.prototype.read = function(length) {
var value;
// Check whether we have enough data in the internal buffer
if (this.stream._readableState.length < length) {
// Not enough data - read the full internal buffer to
// force the ReadStream to fill it again.
value = this.read(this.stream._readableState.length);
while (this.stream._readableState.length === 0) {
// Wait...?
}
// We should have some more data in the internal buffer
// here... Read the rest and add it to our `value` buffer
// ... something like this:
//
// value.push(this.stream.read(length - value.length))
// return value
} else {
value = this.stream.read(length);
this.stream.position += length;
return value;
}
};
The problem is, that the buffer is not filled anymore - the script will just idle in the while loop.
What is the best approach to do this?
It's quite simple. You don't need to do any buffering on your side:
var fs = require('fs'),
rs = fs.createReadStream('/path/to/file');
var CHUNK_SIZE = 8192;
rs.on('readable', function () {
var chunk;
while (null !== (chunk = rs.read(CHUNK_SIZE))) {
console.log('got %d bytes of data', chunk.length);
}
});
rs.on('end', function () {
console.log('end');
});
If CHUNK_SIZE is larger than the internal buffer, node will return null and buffer some more before emitting readable again. You can even configure the initial size of the buffer by passing:
var rs = fs.createReadStream('/path/to/file', {highWatermark: CHUNK_SIZE});
Below is the sample for reading file in streams.
var fs = require('fs'),
readStream = fs.createReadStream(srcPath);
readStream.on('data', function (chunk) {
console.log('got %d bytes of data', chunk.length);
});
readStream.on('readable', function () {
var chunk;
while (null !== (chunk = readStream.read())) {
console.log('got %d bytes of data', chunk.length);
}
});
readStream.on('end', function () {
console.log('got all bytes of data');
});

Combine sequence of asynchronous requests to build final answer

I'm rather new in express framework. I have call flickr api to get albums list and for each album need to get its thumbnail. On the end need to build covers array with list of objects like {title, thumb}. I would like to pass fully created covers array to template and render. I have problem with it because of the way that node.js callbacks works and for loop ends quickly before requests ends. How to do that properly ?
http.get(base_url+'&user_id='+flickr.user_id+'&method=flickr.photosets.getList', function(resp){
var body = '';
resp.on('data', function(chunk) {
body += chunk;
});
resp.on('end', function() {
var json = JSON.parse(body);
var ps = json.photosets.photoset;
// final answer
var covers = {};
for(var i=0; i<ps.length; i++) {
var p = ps[i];
var getSizesUrl = base_url+'&user_id='+flickr.user_id+'&method=flickr.photos.getSizes&photo_id='+p.primary;
http.get(getSizesUrl, function(resp){
var body1 = '';
resp.on('data', function(chunk) {
body1 += chunk;
});
resp.on('end', function() {
var json1 = JSON.parse(body1);
covers += {title: p.title._content, thumb: json1.sizes.size[1].source};
if(i + 1 == ps.length) {
// last call
console.log(covers);
res.render('photosets', {covers: covers});
}
});
});
}
});
});
Update using async and request as #sgwilly said, but something wrong...
request(base_url+'&user_id='+flickr.user_id+'&method=flickr.photosets.getList', function (error, response, body) {
var json = JSON.parse(body);
var ps = json.photosets.photoset;
// functions list to call by `async`
var funcs = {};
for(var i = 0; i < ps.length; i++) {
var p = ps[i];
funcs += function(callback) {
request(base_url+'&user_id='+flickr.user_id+'&method=flickr.photos.getSizes&photo_id='+p.primary, function (error, response, body1){
var json1 = JSON.parse(body1);
var cover = {title: p.title._content, thumb: json1.sizes.size[1].source};
callback(null, cover);
});
};
}
// run requests and produce covers
async.series(funcs,
function(covers){
console.log(covers);
res.render('photosets', {covers: covers});
}
);
});
Node.js callbacks are a little tricky, but they'll make sense after a while.
You want to use request as your outside loop, then async.concat as a callback inside, and have each iterator be a request call of a URL.
Thanks for #dankohn. I got this and work :)
request(base_url+'&user_id='+flickr.user_id+'&method=flickr.photosets.getList', function (error, response, body) {
var json = JSON.parse(body);
var ps = json.photosets.photoset;
async.concat(ps, function(p, callback){
request(base_url+'&user_id='+flickr.user_id+'&method=flickr.photos.getSizes&photo_id='+p.primary, function (error, response, body1){
var json1 = JSON.parse(body1);
var cover = {title: p.title._content, thumb: json1.sizes.size[1].source};
callback(null, cover);
});
},
function(err, covers){
console.log(covers);
res.render('photosets', {covers: covers});
});
});

Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resize it and upload to S3 (using 'gm' and 'knox'), I have no idea if I'm doing the reading of a stream to a buffer correctly. (everything is working, but is it the correct way?)
also, I want to understand something about the event loop, how do I know that one invocation of the function won't leak anything or change the 'buf' variable to another already running invocation (or this scenario is impossible because the callbacks are anonymous functions?)
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');
module.exports.processImageUrl = function(imageUrl, filename, callback) {
var client = http;
if (imageUrl.substr(0, 5) == 'https') { client = https; }
client.get(imageUrl, function(res) {
if (res.statusCode != 200) {
return callback(new Error('HTTP Response code ' + res.statusCode));
}
gm(res)
.geometry(1024, 768, '>')
.stream('jpg', function(err, stdout, stderr) {
if (!err) {
var buf = new Buffer(0);
stdout.on('data', function(d) {
buf = Buffer.concat([buf, d]);
});
stdout.on('end', function() {
var headers = {
'Content-Length': buf.length
, 'Content-Type': 'Image/jpeg'
, 'x-amz-acl': 'public-read'
};
s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
if(err) {
return callback(err);
} else {
return callback(null, res.client._httpMessage.url);
}
});
});
} else {
callback(err);
}
});
}).on('error', function(err) {
callback(err);
});
};
Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is a suboptimal because it has to copy all the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
})
For performance, I would look into if the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all, and instead just pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');
// Your other code.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var contentLength = bufs.reduce(function(sum, buf){
return sum + buf.length;
}, 0);
// Create a stream that will emit your chunks when resumed.
var stream = pause_stream();
stream.pause();
while (bufs.length) stream.write(bufs.shift());
stream.end();
var headers = {
'Content-Length': contentLength,
// ...
};
s3.putStream(stream, ....);
Javascript snippet
function stream2buffer(stream) {
return new Promise((resolve, reject) => {
const _buf = [];
stream.on("data", (chunk) => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", (err) => reject(err));
});
}
Typescript snippet
async function stream2buffer(stream: Stream): Promise<Buffer> {
return new Promise < Buffer > ((resolve, reject) => {
const _buf = Array < any > ();
stream.on("data", chunk => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", err => reject(`error converting stream - ${err}`));
});
}
You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
.then(res => res.buffer())
.then(buffer => console.log)
Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Since Node 11.14.0, readable streams support async iterators.
const buffers = [];
// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
buffers.push(data);
}
const finalBuffer = Buffer.concat(buffers);
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));
You can convert your readable stream to a buffer and integrate it in your code in an asynchronous way like this.
async streamToBuffer (stream) {
return new Promise((resolve, reject) => {
const data = [];
stream.on('data', (chunk) => {
data.push(chunk);
});
stream.on('end', () => {
resolve(Buffer.concat(data))
})
stream.on('error', (err) => {
reject(err)
})
})
}
the usage would be as simple as:
// usage
const myStream // your stream
const buffer = await streamToBuffer(myStream) // this is a buffer
I suggest loganfsmyths method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
}
IN my current working example, i am working with GRIDfs and npm's Jimp.
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' } );
var dwnldStream = bucket.openDownloadStream(info[0]._id);// original size
dwnldStream.on('data', function(chunk) {
data.push(chunk);
});
dwnldStream.on('end', function() {
var buff =Buffer.concat(data);
console.log("buffer: ", buff);
jimp.read(buff)
.then(image => {
console.log("read the image!");
IMAGE_SIZES.forEach( (size)=>{
resize(image,size);
});
});
I did some other research
with a string method but that did not work, per haps because i was reading from an image file, but the array method did work.
const DISCLAIMER = "DONT DO THIS";
var data = "";
stdout.on('data', function(d){
bufs+=d;
});
stdout.on('end', function(){
var buf = Buffer.from(bufs);
//// do work with the buffer here
});
When i did the string method i got this error from npm jimp
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
basically i think the type coersion from binary to string didnt work so well.
I suggest to have array of buffers and concat to resulting buffer only once at the end. Its easy to do manually, or one could use node-buffers
I just want to post my solution. Previous answers was pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that the callback is fired near the end of the stream, so i also use stream-cache to cache the stream and pipe it to res object once i know the content-length. In case on an error,
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');
var _streamFile = function(res , stream , cb){
var cache = new StreamCache();
var lstream = lengthStream(function(length) {
res.header("Content-Length", length);
cache.pipe(res);
});
stream.on('error', function(err){
return cb(err);
});
stream.on('end', function(){
return cb(null , true);
});
return stream.pipe(lstream).pipe(cache);
}
in ts, [].push(bufferPart) is not compatible;
so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
if (!stream) {
throw 'FILE_STREAM_EMPTY';
}
return new Promise(
(r, j) => {
let buffer = Buffer.from([]);
stream.on('data', buf => {
buffer = Buffer.concat([buffer, buf]);
});
stream.on('end', () => r(buffer));
stream.on('error', j);
}
);
}
You can do this by:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
const list = []
const reader = stream.getReader()
while (true) {
const { value, done } = await reader.read()
if (value)
list.push(value)
if (done)
break
}
return Buffer.concat(list)
}
or using buffer consumer
const buf = buffer(stream)
You can check the "content-length" header at res.headers. It will give you the length of the content you will receive (how many bytes of data it will send)

Resources