Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resizes it, and uploads it to S3 (using 'gm' and 'knox'). I have no idea whether I'm reading the stream into a buffer correctly. (Everything works, but is it the correct way?)
Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything or change the 'buf' variable of another invocation that is already running? (Or is this scenario impossible because the callbacks are anonymous functions?)
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');

module.exports.processImageUrl = function(imageUrl, filename, callback) {
  var client = http;
  if (imageUrl.substr(0, 5) == 'https') { client = https; }

  client.get(imageUrl, function(res) {
    if (res.statusCode != 200) {
      return callback(new Error('HTTP Response code ' + res.statusCode));
    }

    gm(res)
      .geometry(1024, 768, '>')
      .stream('jpg', function(err, stdout, stderr) {
        if (!err) {
          var buf = new Buffer(0);
          stdout.on('data', function(d) {
            buf = Buffer.concat([buf, d]);
          });
          stdout.on('end', function() {
            var headers = {
              'Content-Length': buf.length
              , 'Content-Type': 'Image/jpeg'
              , 'x-amz-acl': 'public-read'
            };
            s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
              if (err) {
                return callback(err);
              } else {
                return callback(null, res.client._httpMessage.url);
              }
            });
          });
        } else {
          callback(err);
        }
      });
  }).on('error', function(err) {
    callback(err);
  });
};

Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is suboptimal, because it has to copy all the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d) { bufs.push(d); });
stdout.on('end', function() {
  var buf = Buffer.concat(bufs);
});
For performance, I would look into whether the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all, and instead just pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
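A minimal illustration of that scoping (the stream names here are hypothetical): each call creates its own closure, so a buf declared inside it belongs to that invocation alone.

function collect(stream, done) {
  var buf = Buffer.alloc(0); // private to this invocation
  stream.on('data', function(d) { buf = Buffer.concat([buf, d]); });
  stream.on('end', function() { done(null, buf); });
}

// Two overlapping calls each accumulate into their own buffer.
collect(streamA, function(err, a) { /* a only ever contains streamA's data */ });
collect(streamB, function(err, b) { /* b only ever contains streamB's data */ });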
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');

// Your other code.

var bufs = [];
stdout.on('data', function(d) { bufs.push(d); });
stdout.on('end', function() {
  var contentLength = bufs.reduce(function(sum, buf) {
    return sum + buf.length;
  }, 0);

  // Create a stream that will emit your chunks when resumed.
  var stream = pause_stream();
  stream.pause();
  while (bufs.length) stream.write(bufs.shift());
  stream.end();

  var headers = {
    'Content-Length': contentLength,
    // ...
  };

  s3.putStream(stream, ....);
});

JavaScript snippet
function stream2buffer(stream) {
  return new Promise((resolve, reject) => {
    const _buf = [];
    stream.on("data", (chunk) => _buf.push(chunk));
    stream.on("end", () => resolve(Buffer.concat(_buf)));
    stream.on("error", (err) => reject(err));
  });
}
TypeScript snippet
async function stream2buffer(stream: Stream): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const _buf = Array<any>();
    stream.on("data", (chunk) => _buf.push(chunk));
    stream.on("end", () => resolve(Buffer.concat(_buf)));
    stream.on("error", (err) => reject(`error converting stream - ${err}`));
  });
}

You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
  .then(res => res.buffer())
  .then(buffer => console.log(buffer))

Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Since Node 11.14.0, readable streams support async iterators.
const buffers = [];

// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
  buffers.push(data);
}

const finalBuffer = Buffer.concat(buffers);
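If you want to reuse that pattern outside of top-level await, here is a minimal sketch wrapping it in a function (the name streamToBuffer is just illustrative):

async function streamToBuffer(readableStream) {
  const buffers = [];
  // collect each chunk yielded by the async iterator
  for await (const data of readableStream) {
    buffers.push(data);
  }
  return Buffer.concat(buffers);
}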
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));

You can convert your readable stream to a buffer and integrate it into your code asynchronously like this:
async function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const data = [];

    stream.on('data', (chunk) => {
      data.push(chunk);
    });

    stream.on('end', () => {
      resolve(Buffer.concat(data));
    });

    stream.on('error', (err) => {
      reject(err);
    });
  });
}
the usage would be as simple as:
// usage
const myStream // your stream
const buffer = await streamToBuffer(myStream) // this is a buffer

I suggest loganfsmyth's method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d) { bufs.push(d); });
stdout.on('end', function() {
  var buf = Buffer.concat(bufs);
});
In my current working example, I am working with GridFS and npm's Jimp.
var data = [];
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' });
var dwnldStream = bucket.openDownloadStream(info[0]._id); // original size

dwnldStream.on('data', function(chunk) {
  data.push(chunk);
});

dwnldStream.on('end', function() {
  var buff = Buffer.concat(data);
  console.log("buffer: ", buff);
  jimp.read(buff)
    .then(image => {
      console.log("read the image!");
      IMAGE_SIZES.forEach((size) => {
        resize(image, size);
      });
    });
});
I did some other research with a string method, but that did not work, perhaps because I was reading from an image file; the array method did work, though.
const DISCLAIMER = "DONT DO THIS";
var bufs = "";
stdout.on('data', function(d) {
  bufs += d;
});
stdout.on('end', function() {
  var buf = Buffer.from(bufs);
  // do work with the buffer here
});
When I used the string method I got this error from npm's Jimp:
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
Basically, I think the coercion from binary to string did not work so well.

I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or you could use node-buffers.
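A minimal manual sketch of that approach (stream here stands for whatever readable you are consuming):

const chunks = [];
stream.on('data', (chunk) => chunks.push(chunk));
stream.on('end', () => {
  const result = Buffer.concat(chunks); // single allocation, done once
  // use `result` here
});
stream.on('error', (err) => {
  // handle the error
});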

I just want to post my solution. The previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that the callback is fired near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content length. In case of an error, it is passed to the callback:
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');

var _streamFile = function(res, stream, cb) {
  var cache = new StreamCache();

  var lstream = lengthStream(function(length) {
    res.header("Content-Length", length);
    cache.pipe(res);
  });

  stream.on('error', function(err) {
    return cb(err);
  });

  stream.on('end', function() {
    return cb(null, true);
  });

  return stream.pipe(lstream).pipe(cache);
};

In TypeScript, [].push(bufferPart) does not type-check, so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
  if (!stream) {
    throw 'FILE_STREAM_EMPTY';
  }
  return new Promise(
    (r, j) => {
      let buffer = Buffer.from([]);
      stream.on('data', buf => {
        buffer = Buffer.concat([buffer, buf]);
      });
      stream.on('end', () => r(buffer));
      stream.on('error', j);
    }
  );
}

You can do this by:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
  const list = [];
  const reader = stream.getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (value) {
      list.push(value);
    }
    if (done) {
      break;
    }
  }
  return Buffer.concat(list);
}
or using the buffer consumer from node:stream/consumers (Node 16.7+):
const { buffer } = require('node:stream/consumers');
const buf = await buffer(stream);

You can check the "content-length" header at res.headers. It will give you the length of the content you will receive (that is, how many bytes of data will be sent).
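A small sketch of that idea, assuming res is an http.IncomingMessage from a plain http.get (the URL is a placeholder, and the header can be missing for chunked responses):

const http = require('http');

http.get('http://example.com/file.bin', (res) => {
  // Content-Length, when present, announces how many bytes the body will contain
  const expected = Number(res.headers['content-length']);
  const chunks = [];
  let received = 0;

  res.on('data', (chunk) => {
    received += chunk.length;
    chunks.push(chunk);
  });
  res.on('end', () => {
    console.log('received ' + received + ' of ' + (expected || 'unknown') + ' bytes');
    const body = Buffer.concat(chunks); // the full payload as a single Buffer
  });
});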

Related

Nodejs: What's the proper way to pipe to a buffer / string [duplicate]

I'm hacking on a Node program that uses smtp-protocol to capture SMTP emails and act on the mail data. The library provides the mail data as a stream, and I don't know how to get that into a string.
I'm currently writing it to stdout with stream.pipe(process.stdout, { end: false }), but as I said, I need the stream data in a string instead, which I can use once the stream has ended.
How do I collect all the data from a Node.js stream into a string?
Another way would be to convert the stream to a promise (refer to the example below) and use then (or await) to assign the resolved value to a variable.
function streamToString (stream) {
  const chunks = [];
  return new Promise((resolve, reject) => {
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('error', (err) => reject(err));
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
  });
}

const result = await streamToString(stream)
What do you think about this?
async function streamToString(stream) {
  // lets have a ReadableStream as a stream variable
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString("utf-8");
}
None of the above worked for me. I needed to use the Buffer object:
const chunks = [];

readStream.on("data", function (chunk) {
  chunks.push(chunk);
});

// Send the buffer or you can put it into a var
readStream.on("end", function () {
  res.send(Buffer.concat(chunks));
});
Hope this is more useful than the above answer:
var string = '';
stream.on('data', function(data) {
  string += data.toString();
  console.log('stream data ' + data);
});
stream.on('end', function() {
  console.log('final output ' + string);
});
Note that string concatenation is not the most efficient way to collect the string parts, but it is used for simplicity (and perhaps your code does not care about efficiency).
Also, this code may produce unpredictable failures for non-ASCII text (it assumes that every character fits in a byte), but perhaps you do not care about that, either.
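To make the non-ASCII caveat concrete, here is a small self-contained sketch (the two-write split is artificial) showing how a multi-byte UTF-8 character that straddles a chunk boundary gets mangled by naive string concatenation, while Buffer.concat keeps it intact:

const { PassThrough } = require('stream');

const bytes = Buffer.from('é'); // two bytes in UTF-8: 0xC3 0xA9
const stream = new PassThrough();

let naive = '';
const chunks = [];

stream.on('data', (chunk) => {
  naive += chunk.toString(); // decodes each half separately
  chunks.push(chunk);
});
stream.on('end', () => {
  console.log(naive);                            // two replacement characters
  console.log(Buffer.concat(chunks).toString()); // "é"
});

// Split the character across two writes to simulate an unlucky chunk boundary
stream.write(bytes.subarray(0, 1));
stream.write(bytes.subarray(1));
stream.end();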
(This answer is from years ago, when it was the best answer. There is now a better answer below this. I haven't kept up with Node.js, and I cannot delete this answer because it is marked "correct" on this question. If you are thinking of downvoting, what do you want me to do?)
The key is to use the data and end events of a Readable Stream. Listen to these events:
stream.on('data', (chunk) => { ... });
stream.on('end', () => { ... });
When you receive the data event, add the new chunk of data to a Buffer created to collect the data.
When you receive the end event, convert the completed Buffer into a string, if necessary. Then do what you need to do with it.
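Put together, a minimal sketch of that pattern:

const chunks = [];

stream.on('data', (chunk) => {
  chunks.push(chunk); // accumulate raw Buffer chunks
});

stream.on('end', () => {
  const text = Buffer.concat(chunks).toString('utf8'); // convert once, at the end
  // do what you need to do with `text`
});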
I usually use this simple function to transform a stream into a string:
function streamToString(stream, cb) {
  const chunks = [];
  stream.on('data', (chunk) => {
    chunks.push(chunk.toString());
  });
  stream.on('end', () => {
    cb(chunks.join(''));
  });
}
Usage example:
let stream = fs.createReadStream('./myFile.foo');
streamToString(stream, (data) => {
  console.log(data); // data is now my string variable
});
And yet another one for strings using promises:
function getStream(stream) {
  return new Promise(resolve => {
    const chunks = [];
    // Buffer.from is required if chunk is a String, see comments
    stream.on("data", chunk => chunks.push(Buffer.from(chunk)));
    stream.on("end", () => resolve(Buffer.concat(chunks).toString()));
  });
}
Usage:
const stream = fs.createReadStream(__filename);
getStream(stream).then(r => console.log(r));
Remove the .toString() to use it with binary data, if required.
Update: #AndreiLED correctly pointed out that this has problems with strings. I couldn't get a stream returning strings with the version of Node I have, but the API notes that this is possible.
From the Node.js documentation you should do this; always remember that a string without a known encoding is just a bunch of bytes:
var readable = getReadableStreamSomehow();
readable.setEncoding('utf8');
readable.on('data', function(chunk) {
  assert.equal(typeof chunk, 'string');
  console.log('got %d characters of string data', chunk.length);
});
Easy way with the popular (over 5m weekly downloads) and lightweight get-stream library:
https://www.npmjs.com/package/get-stream
const fs = require('fs');
const getStream = require('get-stream');

(async () => {
  const stream = fs.createReadStream('unicorn.txt');
  console.log(await getStream(stream)); // output is string
})();
Streams don't have a simple .toString() function (which I understand) nor something like a .toStringAsync(cb) function (which I don't understand).
So I created my own helper function:
var streamToString = function(stream, callback) {
  var str = '';
  stream.on('data', function(chunk) {
    str += chunk;
  });
  stream.on('end', function() {
    callback(str);
  });
};

// how to use:
streamToString(myStream, function(myStr) {
  console.log(myStr);
});
I had more luck using it like this:
let string = '';
readstream
  .on('data', (buf) => string += buf.toString())
  .on('end', () => console.log(string));
I use Node v9.11.1, and the readstream is the response from an http.get callback.
The cleanest solution may be to use the "string-stream" package, which converts a stream to a string with a promise.
const streamString = require('stream-string');

streamString(myStream).then(string_variable => {
  // myStream was converted to a string, and that string is stored in string_variable
  console.log(string_variable);
}).catch(err => {
  // myStream emitted an error event (err), so the promise from stream-string was rejected
  throw err;
});
What about something like a stream reducer?
Here is an example using ES6 classes showing how to use one.
var stream = require('stream');

class StreamReducer extends stream.Writable {
  constructor(chunkReducer, initialvalue, cb) {
    super();
    this.reducer = chunkReducer;
    this.accumulator = initialvalue;
    this.cb = cb;
  }
  _write(chunk, enc, next) {
    this.accumulator = this.reducer(this.accumulator, chunk);
    next();
  }
  end() {
    this.cb(null, this.accumulator);
  }
}

// just a test stream
class EmitterStream extends stream.Readable {
  constructor(chunks) {
    super();
    this.chunks = chunks;
  }
  _read() {
    this.chunks.forEach(function (chunk) {
      this.push(chunk);
    }.bind(this));
    this.push(null);
  }
}

// just transform the strings into buffer as we would get from fs stream or http request stream
(new EmitterStream(
  ["hello ", "world !"]
    .map(function(str) {
      return Buffer.from(str, 'utf8');
    })
)).pipe(new StreamReducer(
  function (acc, v) {
    acc.push(v);
    return acc;
  },
  [],
  function(err, chunks) {
    console.log(Buffer.concat(chunks).toString('utf8'));
  })
);
All the answers listed appear to open the Readable stream in flowing mode, which is not the default in Node.js and can have limitations, since it lacks the backpressure support that Node.js provides in paused Readable stream mode.
Here is an implementation using just Buffers, native streams and native stream Transforms, with support for object mode:
import {Transform} from 'stream';

let buffer = null;

function objectifyStream() {
  return new Transform({
    objectMode: true,
    transform: function(chunk, encoding, next) {
      if (!buffer) {
        buffer = Buffer.from([...chunk]);
      } else {
        buffer = Buffer.from([...buffer, ...chunk]);
      }
      next(null, buffer);
    }
  });
}

process.stdin.pipe(objectifyStream()).pipe(process.stdout);
This worked for me and is based on Node v6.7.0 docs:
let output = '';
stream.on('readable', function() {
  let read = stream.read();
  if (read !== null) {
    // New stream data is available
    output += read.toString();
  } else {
    // Stream is now finished when read is null.
    // You can callback here e.g.:
    callback(null, output);
  }
});
stream.on('error', function(err) {
  callback(err, null);
});
Using the quite popular stream-buffers package which you probably already have in your project dependencies, this is pretty straightforward:
// imports
const { WritableStreamBuffer } = require('stream-buffers');
const { promisify } = require('util');
const { createReadStream } = require('fs');
const pipeline = promisify(require('stream').pipeline);
// sample stream
let stream = createReadStream('/etc/hosts');
// pipeline the stream into a buffer, and print the contents when done
let buf = new WritableStreamBuffer();
pipeline(stream, buf).then(() => console.log(buf.getContents().toString()));
setEncoding('utf8');
Well done Sebastian J above.
I had the "buffer problem" with a few lines of test code, and adding the encoding information solved it; see below.
Demonstrate the problem
software
// process.stdin.setEncoding('utf8');
process.stdin.on('data', (data) => {
  console.log(typeof(data), data);
});
input
hello world
output
object <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 0d 0a>
Demonstrate the solution
software
process.stdin.setEncoding('utf8'); // <- Activate!
process.stdin.on('data', (data) => {
  console.log(typeof(data), data);
});
input
hello world
output
string hello world
In my case, the content-type response header was Content-Type: text/plain. So I read the data from the Buffer like this:
let data = [];
stream.on('data', (chunk) => {
  console.log(Buffer.from(chunk).toString());
  data.push(Buffer.from(chunk).toString());
});

How to disconnect a socket after streaming data?

I am making use of "socket.io-client" and "socket.io stream" to make a request and then stream some data. I have the following code that handles this logic
Client Server Logic
router.get('/writeData', function(req, res) {
  var io = req.app.get('socketio');
  var nameNodeSocket = io.connect(NAMENODE_ADDRESS, { reconnect: true });
  var nameNodeData = {};

  async.waterfall([
    checkForDataNodes,
    readFileFromS3
  ], function(err, result) {
    if (err !== null) {
      res.json(err);
    } else {
      res.json("Finished Writing to DN's");
    }
  });

  function checkForDataNodes(cb) {
    nameNodeSocket.on('nameNodeData', function(data) {
      nameNodeData = data;
      console.log(nameNodeData);
      cb(null, nameNodeData);
    });
    if (nameNodeData.numDataNodes === 0) {
      cb("No datanodes found");
    }
  }

  function readFileFromS3(nameNodeData, cb) {
    for (var i in nameNodeData['blockToDataNodes']) {
      var IP = nameNodeData['blockToDataNodes'][i]['ipValue'];
      var dataNodeSocket = io.connect('http://' + IP + ":5000");
      var ss = require("socket.io-stream");
      var stream = ss.createStream();
      var byteStartRange = nameNodeData['blockToDataNodes'][i]['byteStart'];
      var byteStopRange = nameNodeData['blockToDataNodes'][i]['byteStop'];
      paramsWithRange['Range'] = "bytes=" + byteStartRange.toString() + "-" + byteStopRange.toString();

      //var file = require('fs').createWriteStream('testFile' + i + '.txt');
      var getFileName = nameNodeData['blockToDataNodes'][i]['key'].split('/');
      var fileData = {
        'mainFile': paramsWithRange['Key'].split('/')[1],
        'blockName': getFileName[1]
      };

      ss(dataNodeSocket).emit('sendData', stream, fileData);
      s3.getObject(paramsWithRange).createReadStream().pipe(stream);
      //dataNodeSocket.disconnect();
    }
    cb(null);
  }
});
Server Logic (that gets the data)
var dataNodeIO = require('socket.io')(server);
var ss = require("socket.io-stream");

dataNodeIO.on('connection', function(socket) {
  console.log("Succesfully connected!");
  ss(socket).on('sendData', function(stream, data) {
    var IP = data['ipValue'];
    var blockName = data['blockName'];
    var mainFile = data['mainFile'];

    dataNode.makeDir(mainFile);
    dataNode.addToReport(mainFile, blockName);
    stream.pipe(fs.createWriteStream(mainFile + '/' + blockName));
  });
});
How can I properly disconnect the connections in the readFileFromS3 function? I have noticed that using dataNodeSocket.disconnect() at the end does not work, as I cannot verify the data was received on the second server. But if I comment it out, I can see the data being streamed to the second server.
My objective is to close the connections on the client-server side.
It appears that the main problem with closing the socket is that you weren't waiting for the stream to be done writing before trying to close the socket. So, because the writing is all asynchronous and finishes sometime later, you were trying to close the socket before the data had been written.
Also, because you were putting asynchronous operations inside a for loop, you were running all your operations in parallel, which may not be exactly what you want, as it makes error handling more difficult and puts more load on the server.
Here's the code I would suggest that does the following:
Create a function streamFileFromS3() that streams a single file and returns a promise that will notify when it's done.
Use await in a for loop with that streamFileFromS3() to serialize the operations. You don't have to serialize them, but then you would have to change your error handling to figure out what to do if one errors while the others are already running and you'd have to be more careful about concurrency issues.
Use try/catch to catch any errors from streamFileFromS3().
Add error handling on the stream.
Change all occurrences of data['propertyName'] to data.propertyName. The only time you need to use brackets is if the property name contains a character that is not allowed in a Javascript identifier or if the property name is in a variable. Otherwise, the dot notation is preferred.
Add socket.io connection error handling logic for both socket.io connections.
Set returned status to 500 when there's an error processing the request
So, here's the code for that:
const ss = require("socket.io-stream");

router.get('/writeData', function(req, res) {
  const io = req.app.get('socketio');

  function streamFileFromS3(ip, data) {
    return new Promise((resolve, reject) => {
      const dataNodeSocket = io.connect(`http://${ip}:5000`);
      dataNodeSocket.on('connect_error', reject);
      dataNodeSocket.on('connect_timeout', () => {
        reject(new Error(`timeout connecting to http://${ip}:5000`));
      });
      dataNodeSocket.on('connection', () => {
        // dataNodeSocket connected now
        const stream = ss.createStream().on('error', reject);
        paramsWithRange.Range = `bytes=${data.byteStart}-${data.byteStop}`;

        const filename = data.key.split('/')[1];
        const fileData = {
          'mainFile': paramsWithRange.Key.split('/')[1],
          'blockName': filename
        };

        ss(dataNodeSocket).emit('sendData', stream, fileData);

        // get S3 data and pipe it to the socket.io stream
        s3.getObject(paramsWithRange).createReadStream().on('error', reject).pipe(stream);

        stream.on('close', () => {
          dataNodeSocket.disconnect();
          resolve();
        });
      });
    });
  }

  function connectError(msg) {
    res.status(500).send(`Error connecting to ${NAMENODE_ADDRESS}`);
  }

  const nameNodeSocket = io.connect(NAMENODE_ADDRESS, { reconnect: true });
  nameNodeSocket.on('connect_error', connectError).on('connect_timeout', connectError);

  nameNodeSocket.on('nameNodeData', async (nameNodeData) => {
    try {
      for (let item of nameNodeData.blockToDataNodes) {
        await streamFileFromS3(item.ipValue, item);
      }
      res.json("Finished Writing to DN's");
    } catch(e) {
      res.status(500).json(e);
    }
  });
});
Other notes:
I don't know what paramsWithRange is, as it is not declared here, and when you were doing everything in parallel it was being shared among all the connections, which is asking for a concurrency issue. In my serialized implementation it's probably safe to share it, but the way it is now bothers me, as it's a concurrency issue waiting to happen.

Node Async ReadStream from SFTP connection

So I'm creating a class and ultimately want to create a method that takes a file on an SFTP server and produces a readstream that can be piped into other streams / functions. I'm most of the way there, except my readStream is acting strangely. Here's the relevant code:
const Client = require('ssh2').Client,
  Readable = require('stream').Readable,
  async = require('async');

/**
 * Class Definition stuff
 * ......
 */

getStream(get) {
  const self = this;
  const rs = new Readable;
  rs._read = function() {
    const read = this;
    self.conn.on('ready', function() {
      self.conn.sftp(function(err, sftp) {
        if (err) return err;
        sftp.open(get, 'r', function(err, fd) {
          sftp.fstat(fd, function(err, stats) {
            let bufferSize = stats.size,
              chunkSize = 512, // bytes
              buffer = new Buffer(bufferSize),
              bytesRead = 0;
            async.whilst(
              function () {
                return bytesRead < bufferSize;
              },
              function (done) {
                sftp.read(fd, buffer, bytesRead, chunkSize, bytesRead,
                  function (err, bytes, buff) {
                    if (err) return done(err);
                    // console.log(buff.toString('utf8'));
                    read.push(buff);
                    bytesRead += bytes;
                    done();
                  });
              },
              function (err) {
                if (err) console.log(err);
                read.push(null);
                sftp.close(fd);
              }
            );
          });
        });
      });
    }).connect(self.connectionObj);
  };
  return rs;
}
Elsewhere, I would call this method like so:
let sftp = new SFTP(credentials);
sftp.getStream('/path/file.csv')
  .pipe(toStuff)
  .pipe(toOutput);
So, long story short: during the SFTP.read operation, read.push(buff) keeps pushing the same first part of the file over and over. However, when I console.log(buff), it correctly streams the full file.
So I'm scratching my head, wondering what I'm doing wrong with the read stream such that it only pushes the beginning of the file and doesn't continue on to the next part of the buffer.
Here's the docs on SSH2 SFTP client: https://github.com/mscdex/ssh2-streams/blob/master/SFTPStream.md
I used this SO question as inspiration for what I wrote above: node.js fs.read() example
This is similar/related: Reading file from SFTP server using Node.js and SSH2
OK, after a lot of trouble, I realized I was making a couple of mistakes. First, the _read function is called every time the stream is ready to read more data, which means the SFTP connection was being started every time _read was called. This also meant the sftp.read() function was starting over each time, resetting the starting point back to the beginning.
I needed a way to first set up the connection, then read and stream the file data, so I chose the library noms. Here's the final code if anyone is interested:
getStream (get) {
  const self = this;
  let connection,
    fileData,
    buffer,
    totalBytes = 0,
    bytesRead = 0;

  return nom(
    // _read function
    function(size, next) {
      const read = this;
      // Check if we're done reading
      if (bytesRead === totalBytes) {
        connection.close(fileData);
        connection.end();
        self.conn.end();
        console.log('done');
        return read.push(null);
      }
      // Make sure we read the last bit of the file
      if ((bytesRead + size) > totalBytes) {
        size = (totalBytes - bytesRead);
      }
      // Read each chunk of the file
      connection.read(fileData, buffer, bytesRead, size, bytesRead,
        function (err, byteCount, buff, pos) {
          // console.log(buff.toString('utf8'));
          // console.log('reading');
          bytesRead += byteCount;
          read.push(buff);
          next();
        }
      );
    },
    // Before Function
    function(start) {
      // setup the connection BEFORE we start _read
      self.conn.on('ready', function() {
        self.conn.sftp(function(err, sftp) {
          if (err) return err;
          sftp.open(get, 'r', function(err, fd) {
            sftp.fstat(fd, function(err, stats) {
              connection = sftp;
              fileData = fd;
              totalBytes = stats.size;
              buffer = new Buffer(totalBytes);
              console.log('made connection');
              start();
            });
          });
        });
      }).connect(self.connectionObj);
    })
}
Always looking for feedback. This doesn't run quite as fast as I'd hope, so let me know if you have ideas on speeding up the stream.

Nodejs: How to send a readable stream to the browser

If I query the Box REST API and get back a readable stream, what is the best way to handle it? How do you send it to the browser? (DISCLAIMER: I'm new to streams and buffers, so some of this code is pretty theoretical.)
Can you pass the readStream in the response and let the browser handle it? Or do you have to stream the chunks into a buffer and then send the buffer?
export function getFileStream(req, res) {
  const fileId = req.params.fileId;
  console.log('fileId', fileId);

  req.sdk.files.getReadStream(fileId, null, (err, stream) => {
    if (err) {
      console.log('error', err);
      return res.status(500).send(err);
    }
    res.type('application/octet-stream');
    console.log('stream', stream);
    return res.status(200).send(stream);
  });
}
Will ^^ work, or do you need to do something like:
export function downloadFile(req, res) {
  const fileId = req.params.fileId;
  console.log('fileId', fileId);

  req.sdk.files.getReadStream(fileId, null, (err, stream) => {
    if (err) {
      console.log('error', err);
      return res.status(500).send(err);
    }
    const buffers = [];
    const document = new Buffer();
    console.log('stream', stream);

    stream.on('data', (chunk) => {
      buffers.push(buffer);
    })
    .on('end', function() {
      const finalBuffer = Buffer.concat(buffers);
      return res.status(200).send(finalBuffer);
    });
  });
}
The first example would work if you changed your theoretical line to:
- return res.status(200).send(stream);
+ res.writeHead(200, {header: here})
+ stream.pipe(res);
That's the nicest thing about Node streams. The other case would (in essence) work too, but it would accumulate lots of unnecessary memory.
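For reference, here is a minimal sketch of the first handler with that change applied (the Content-Type value is an assumption; set whatever suits your files):

export function getFileStream(req, res) {
  const fileId = req.params.fileId;

  req.sdk.files.getReadStream(fileId, null, (err, stream) => {
    if (err) {
      return res.status(500).send(err);
    }
    // Send headers first, then let pipe() handle backpressure for you
    res.writeHead(200, { 'Content-Type': 'application/octet-stream' });
    stream.pipe(res);
  });
}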
If you'd like to check a working example, here's one I wrote based on scramjet, express and browserify:
https://github.com/MichalCz/scramjet/blob/master/samples/browser/browser.js
Where your streams go from the server to the browser. With minor mods it'll fit your problem.

TDD/ testing with streams in NodeJS

I've been trying to find a reasonable way to test code that uses streams. Has anyone found a reasonable way/framework to help test code that uses streams in Node.js?
For example:
var fs = require('fs'),
  request = require('request');

module.exports = function (url, path, callback) {
  request(url)
    .pipe(fs.createWriteStream(path))
    .on('finish', function () {
      callback();
    });
};
My current way of testing this type of code involves either simplifying the stream code so much that I can abstract it out into an untested chunk, or writing something like this:
var rewire = require('rewire'),
  download = rewire('../lib/download'),
  stream = require('stream'),
  util = require('util');

describe('download', function () {
  it('should download a url', function (done) {
    var fakeRequest, fakeFs, FakeStream;

    FakeStream = function () {
      stream.Writable.call(this);
    };
    util.inherits(FakeStream, stream.Writable);

    FakeStream.prototype._write = function (data, encoding, cb) {
      expect(data.toString()).toEqual("hello world");
      cb();
    };

    fakeRequest = function (url) {
      var output = new stream.Readable();
      output.push("hello world");
      output.push(null);
      expect(url).toEqual('http://hello');
      return output;
    };

    fakeFs = {
      createWriteStream: function (path) {
        expect(path).toEqual('hello.txt');
        return new FakeStream();
      }
    };

    download.__set__('fs', fakeFs);
    download.__set__('request', fakeRequest);

    download('http://hello', 'hello.txt', function () {
      done();
    });
  });
});
Has anyone come up with more elegant ways of testing streams?
I made streamtest for that purpose. It not only makes stream tests cleaner but also allows testing V1 and V2 streams: https://www.npmjs.com/package/streamtest
I've also been using memorystream, but then putting my assertions into the finish event. That way it looks more like a real use of the stream being tested:
require('chai').should();
var fs = require('fs');
var path = require('path');
var MemoryStream = require('memorystream');
var memStream = MemoryStream.createWriteStream();

/**
 * This is the Transform that we want to test:
 */
var Parser = require('../lib/parser');
var parser = new Parser();

describe('Parser', function() {
  it('something', function(done) {
    fs.createReadStream(path.join(__dirname, 'something.txt'))
      .pipe(parser)
      .pipe(memStream)
      .on('finish', function() {
        /**
         * Check that our parser has created the right output:
         */
        memStream
          .toString()
          .should.eql('something');
        done();
      });
  });
});
Checking objects can be done like this:
var memStream = MemoryStream.createWriteStream(null, {objectMode: true});
.
.
.
.on('finish', function() {
  memStream
    .queue[0]
    .should.eql({ some: 'thing' });
  done();
});
.
.
.
Read the Stream into memory and compare it with the expected Buffer.
it('should output a valid Stream', (done) => {
  const stream = getStreamToTest();
  const expectedBuffer = Buffer.from(...);
  let bytes = new Buffer('');

  stream.on('data', (chunk) => {
    bytes = Buffer.concat([bytes, chunk]);
  });
  stream.on('end', () => {
    try {
      expect(bytes).to.deep.equal(expectedBuffer);
      done();
    } catch (err) {
      done(err);
    }
  });
});
I feel your pain.
I don't know of any framework to help with testing streams, but if you take a look here, where I'm developing a stream library, you can see how I approach this problem.
Here is an idea of what I'm doing.
var chai = require("chai")
  , sinon = require("sinon")
  , expect = chai.expect
  , through2 = require('through2')
  ;
chai.use(require("sinon-chai"));
chai.config.showDiff = false;

function spy (stream) {
  var agent, fn;

  if (spy.free.length === 0) {
    agent = sinon.spy();
  } else {
    agent = spy.free.pop();
    agent.reset();
  }
  spy.used.push(agent);
  fn = stream._transform;
  stream.spy = agent;

  stream._transform = function(c) {
    agent(c);
    return fn.apply(this, arguments);
  };

  return agent;
};
spy.free = [];
spy.used = [];

describe('basic through2 stream', function() {
  beforeEach(function() {
    this.streamA = through2();
    this.streamB = through2.obj();
    // other kind of streams...
    spy(this.streamA);
    spy(this.streamB);
  });

  afterEach(function() {
    spy.used.map(function(agent) {
      spy.free.push(spy.used.pop());
    });
  });

  it("must call transform with the data", function() {
    var ctx = this
      , dataA = new Buffer('some data')
      , dataB = 'some data'
      ;

    this.streamA.pipe(through2(function(chunk, enc, next) {
      expect(ctx.streamA.spy).to.have.been.calledOnce.and.calledWith(dataA);
    }));
    this.streamB.pipe(through2(function(chunk, enc, next) {
      expect(ctx.streamB.spy).to.have.been.calledOnce.and.calledWith(dataB);
    }));

    this.streamA.write(dataA);
    this.streamB.write(dataB);
  });
});
Note that my spy function wraps the _transform method so that it calls my spy and then the original _transform.
Also, the afterEach function recycles the spies, because you can end up creating hundreds of them.
The problem gets hard when you want to test async code. Then promises are your best friend. The link I gave above has some samples of that.
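As a generic illustration of the promise-based approach, here is a minimal sketch using only built-in streams plus chai (the test body and names are hypothetical):

const { PassThrough } = require('stream');
const { expect } = require('chai');

// Wrap "collect everything the stream emits" in a promise
function collect(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('end', () => resolve(Buffer.concat(chunks).toString()));
    stream.on('error', reject);
  });
}

it('passes data through unchanged', async function () {
  const out = new PassThrough();
  const result = collect(out); // start listening before writing
  out.end('hello world');
  expect(await result).to.equal('hello world');
});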
I haven't used this, and it's quite old, but https://github.com/dominictarr/stream-spec might help.
You can test streams using MemoryStream and sinon by using spies. Here is how I tested some of my code.
describe('some spec', function() {
  it('some test', function(done) {
    var outputStream = new MemoryStream();
    var spyCB = sinon.spy();

    outputStream.on('data', spyCB);

    doSomething(param, param2, outputStream, function() {
      sinon.assert.calledWith(spyCB, 'blah');
      done();
    });
  });
});
The best way I have found is to use events:
const byline = require('byline');
const fs = require('fs');

it('should process all lines in file', function(done) {
  // arrange
  let lines = 0;
  let lineStream = byline.createStream();
  // file with 1000 lines
  let reader = fs.createReadStream('./input.txt');
  let writer = fs.createWriteStream('./output.txt');

  // act
  reader.pipe(lineStream).pipe(writer);
  lineStream.on('data', function() {
    lines++;
  });

  // assert
  writer.on('close', function() {
    expect(lines).to.equal(1000);
    done();
  });
});
By passing done as a callback, Mocha waits until it is called before moving on.
