Node Async ReadStream from SFTP connection - node.js

So I'm creating a class and ultimately want a method that takes a file on an SFTP server and produces a read stream that can be piped into other streams/functions. I'm most of the way there, except my read stream is acting strangely. Here's the relevant code:
const Client = require('ssh2').Client,
      Readable = require('stream').Readable,
      async = require('async');

/**
 * Class Definition stuff
 * ......
 */

getStream(get) {
  const self = this;
  const rs = new Readable;
  rs._read = function() {
    const read = this;
    self.conn.on('ready', function() {
      self.conn.sftp(function(err, sftp) {
        if (err) return err;
        sftp.open(get, 'r', function(err, fd) {
          sftp.fstat(fd, function(err, stats) {
            let bufferSize = stats.size,
                chunkSize = 512, // bytes
                buffer = new Buffer(bufferSize),
                bytesRead = 0;
            async.whilst(
              function () {
                return bytesRead < bufferSize;
              },
              function (done) {
                sftp.read(fd, buffer, bytesRead, chunkSize, bytesRead,
                  function (err, bytes, buff) {
                    if (err) return done(err);
                    // console.log(buff.toString('utf8'));
                    read.push(buff);
                    bytesRead += bytes;
                    done();
                  });
              },
              function (err) {
                if (err) console.log(err);
                read.push(null);
                sftp.close(fd);
              }
            );
          });
        });
      });
    }).connect(self.connectionObj);
  }
  return rs;
}
Elsewhere, I would call this method like so:
let sftp = new SFTP(credentials);
sftp.getStream('/path/file.csv')
  .pipe(toStuff)
  .pipe(toOutput);
So, long story short: during the sftp.read operation, read.push(buff) keeps pushing the same first part of the file over and over. However, when I console.log(buff) it correctly logs the full file in order?
So I'm scratching my head wondering what I'm doing wrong with the read stream, such that it only pushes the beginning of the file and never continues on to the next part of the buffer.
Here's the docs on SSH2 SFTP client: https://github.com/mscdex/ssh2-streams/blob/master/SFTPStream.md
I used this SO question as inspiration for what I wrote above: node.js fs.read() example
This is similar/related: Reading file from SFTP server using Node.js and SSH2

Ok, after a lot of trouble, I realized I was making a couple of mistakes. First, the _read function is called every time the stream is ready to read more data, which means the SFTP connection was being started every time _read was called. This also meant the sftp.read() call was starting over each time, resetting the starting point back to the beginning of the file.
I needed a way to first set up the connection, then read and stream the file data, so I chose the library noms. Here's the final code if anyone is interested:
// nom below comes from the noms library mentioned above, e.g. const nom = require('noms');
getStream(get) {
  const self = this;
  let connection,
      fileData,
      buffer,
      totalBytes = 0,
      bytesRead = 0;
  return nom(
    // _read function
    function(size, next) {
      const read = this;
      // Check if we're done reading
      if (bytesRead === totalBytes) {
        connection.close(fileData);
        connection.end();
        self.conn.end();
        console.log('done');
        return read.push(null);
      }
      // Make sure we read the last bit of the file
      if ((bytesRead + size) > totalBytes) {
        size = (totalBytes - bytesRead);
      }
      // Read each chunk of the file
      connection.read(fileData, buffer, bytesRead, size, bytesRead,
        function (err, byteCount, buff, pos) {
          // console.log(buff.toString('utf8'));
          // console.log('reading');
          bytesRead += byteCount;
          read.push(buff);
          next();
        }
      );
    },
    // Before function
    function(start) {
      // set up the connection BEFORE we start _read
      self.conn.on('ready', function() {
        self.conn.sftp(function(err, sftp) {
          if (err) return err;
          sftp.open(get, 'r', function(err, fd) {
            sftp.fstat(fd, function(err, stats) {
              connection = sftp;
              fileData = fd;
              totalBytes = stats.size;
              buffer = new Buffer(totalBytes);
              console.log('made connection');
              start();
            });
          });
        });
      }).connect(self.connectionObj);
    }
  );
}
Always looking for feedback. This doesn't run quite as fast as I'd hope, so let me know if you have ideas on speeding up the stream.
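Side note (not from the original answer): the ssh2 SFTP docs linked in the question also describe a createReadStream() method on the sftp object, which avoids driving sftp.read() by hand and may well be faster. A minimal sketch, assuming the same self.conn / self.connectionObj setup as above and a reasonably recent Node where stream.destroy() exists:

// assumes: const { PassThrough } = require('stream');
getStream(get) {
  const self = this;
  // placeholder stream we can return right away and fill once SFTP is ready
  const out = new PassThrough();
  self.conn.on('ready', function() {
    self.conn.sftp(function(err, sftp) {
      if (err) return out.destroy(err);
      // ssh2's SFTP wrapper can hand back a readable stream directly
      sftp.createReadStream(get)
        .on('error', function(e) { out.destroy(e); })
        .pipe(out);
    });
  }).connect(self.connectionObj);
  return out;
}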

Related

NodeJS - Read Buffer line by line synchronously => toString() failed

I have been struggling, and searching, for a long time. I know there are answers about this, but none of them works.
I used fs.createReadStream and readline for this, but it relies on fs.close() to stop FILE READING, so it doesn't work at all when used on a buffer; the reading just runs through everything with no possibility to interrupt it...
Then I used this:
const stream = require('stream');
const es = require('event-stream'); // provides es.mapSync below

let bufferStream = new stream.PassThrough();
bufferStream.end(hexaviaFile.buffer);

bufferStream
  .pipe(require('split')())
  .pipe(es.mapSync(function(line) {
    // pause the readstream
    bufferStream.pause();
    // DO WHATEVER WITH YOUR LINE
    console.log('line content = ' + line);
    // resume the readstream, possibly from a callback
    bufferStream.resume();
  }).on('error', function(err) {
    console.log('Error while reading file.' + err);
  }).on('end', function() {
    console.log('end event !');
  }).on('close', function() {
    console.log('close event !');
  })
);
// toString() Failed
I get the [toString() Failed] error and searched about it; apparently it appears when the buffer is larger than Node's maximum buffer size.
So I checked:
var buffer = require('buffer');
console.log('buffer.kMaxLength = ', buffer.kMaxLength); // 2147483647
console.log('hexaviaFile.buffer.byteLength = ', hexaviaFile.buffer.byteLength); // => 413567671
As you can see from those numbers, that is not the case:
* max buffer size = 2 GB
* my buffer = 0.4 GB
I also tried some different libraries to do this, but:
1. I want to keep memory usage as low as possible
2. I need the reading to be perfectly SYNC. In other words, I have some processing to do after the file is read, and I need to complete all of the reading before going on to the next steps.
I don't know what to do :) Any kind of help appreciated.
Regards.
I forgot about this post. I found a way to achieve this without errors.
It's given here: https://github.com/request/request/issues/2826
First, create a splitter to read string chunks:
const { Transform } = require('stream');

class Splitter extends Transform {
  constructor(options) {
    super(options);
    this.splitSize = options.splitSize;
    this.buffer = Buffer.alloc(0);
    this.continueThis = true;
  }

  stopIt() {
    this.continueThis = false;
  }

  _transform(chunk, encoding, cb) {
    this.buffer = Buffer.concat([this.buffer, chunk]);
    while ((this.buffer.length > this.splitSize || this.buffer.length === 1) && this.continueThis) {
      try {
        let chunk = this.buffer.slice(0, this.splitSize);
        this.push(chunk);
        this.buffer = this.buffer.slice(this.splitSize);
        if (this.buffer[0] === 26) {
          console.log('EOF : ' + this.buffer[0]);
        }
      } catch (err) {
        console.log('ERR OCCURED => ', err);
        break;
      }
    }
    console.log('WHILE FINISHED');
    cb();
  }
}
Then pipe your stream through it:
let bufferStream = new stream.PassThrough();
bufferStream.end(hugeBuffer);

let splitter = new Splitter({ splitSize: 170 }); // in my case I have 170-character lines, so I want to process them line by line
let lineNr = 0;

bufferStream
  .pipe(splitter)
  .on('data', async function(line) {
    line = line.toString().trim();
    splitter.pause(); // pause the stream so you can perform long-running processing with await
    lineNr++;
    if (lineNr === 1) {
      // DO stuff with 1st line
    } else {
      splitter.stopIt(); // break the stream and stop reading so we just read the 1st line
    }
    splitter.resume(); // resume the stream so you can process the next chunk
  }).on('error', function(err) {
    console.log('Error while reading file.' + err);
    // whatever
  }).on('end', async function() {
    console.log('end event');
    // Stream has ended, do whatever...
  });
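Alternative sketch (not from the original answer): Node's built-in readline module can also walk a buffer line by line if you wrap the buffer in a stream, and the interface can be stopped early with rl.close(). This assumes the same hugeBuffer as above and newline-delimited input rather than fixed-length records:

const { PassThrough } = require('stream');
const readline = require('readline');

const input = new PassThrough();
input.end(hugeBuffer); // the in-memory buffer, as in the snippet above

const rl = readline.createInterface({ input, crlfDelay: Infinity });
let lineNr = 0;

rl.on('line', (line) => {
  lineNr++;
  // DO WHATEVER WITH YOUR LINE
  if (lineNr === 1) {
    rl.close();      // stop emitting further lines
    input.destroy(); // and drop the rest of the buffered data
  }
});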

NodeJS - How can I stream a response using an in memory DB?

I'm using LokiJS as an in-memory DB. There is a particular resource where I must return the entire contents of a table (it cannot be paginated), and that table can grow to around 500,000 items, which is about 300 MB.
In other cases, I have used fs.createReadStream to get a file and stream it back to the user:
fs.createReadStream('zips.json')
  .on('data', function() {
    res.write(...)
  })
  .on('end', function() {
    res.end();
  })
This has worked great for large files, but how can I do something equivalent using an in memory DB?
const items = lokiDb.addCollection('items');
items.insert('a bunch of items ...');
// I would now like to stream items via res.write
res.write(items)
Currently, res.write(items) will cause memory problems as Node is trying to return the entire response at once.
As far as I can tell, there is no native stream provider in Loki, though I may have missed it. What you may want to do instead is listen to the 'insert' event on the collection and write that, like so:
const items = lokiDb.addCollection('items');
items.on('insert', (results) => {
  res.write(results);
});
items.insert('a bunch of items ...');
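One caveat, not from the original answer: res.write expects a string or Buffer, so each inserted document would normally be serialized before writing. A rough sketch:

const items = lokiDb.addCollection('items');
items.on('insert', (doc) => {
  // serialize each inserted document before writing it to the response
  res.write(JSON.stringify(doc) + '\n');
});
items.insert('a bunch of items ...');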
If I'm correct, basically your problem is that readStreams only read from files, and that you want to read from an in-memory data structure. A solution might be to define your own readStream class, slightly modifying the prototype stream.Readable._read method:
var util = require('util');
var stream = require('stream');
"use strict";

var begin = 0, end = 0;
var options = {
  highWaterMark: 16384,
  encoding: null,
  objectMode: false
};

util.inherits(InMemoryStream, stream.Readable);

function InMemoryStream(userDefinedOptions, resource) {
  if (userDefinedOptions) {
    for (var key in userDefinedOptions) {
      options[key] = userDefinedOptions[key];
    }
  }
  this.resource = resource;
  stream.Readable.call(this, options);
}

InMemoryStream.prototype._read = function(size) {
  end += size;
  var chunk = this.resource.slice(begin, end);
  // signal end-of-stream once the resource is exhausted
  this.push(chunk.length ? chunk : null);
  begin += size;
};

exports.InMemoryStream = InMemoryStream;
exports.readStream = function(UserDefinedOptions, resource) {
  return new InMemoryStream(UserDefinedOptions, resource);
};
You convert your in-memory data structure (in the following example an array) to a readStream, and pipe this through to a writeStream, as follows:
"use strict";
var fs = require('fs');
var InMemoryStream = require('/home/regular/javascript/poc/inmemorystream.js');
var stored=[], writestream, config={};
config = {
encoding: null,
fileToRead: 'raphael.js',
fileToWrite: 'secondraphael.js'
}
fs.readFile(config.fileToRead, function(err, data){
if (err) return console.log('Error when opening file', err);
stored = data;
var inMemoryStream = InMemoryStream.readStream({encoding: config.encoding}, stored);
writestream = fs.createWriteStream(config.fileToWrite);
inMemoryStream.pipe(writestream);
inMemoryStream.on('error', function(err){
console.log('in memory stream error', err);
});
});
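As a side note (not from the original answers): newer Node versions (roughly 10.17+/12.3+) ship stream.Readable.from(), which turns any iterable into a readable stream, so a hand-rolled Readable subclass is often unnecessary. A minimal sketch for the LokiJS case, assuming collection.find() returns the full array of documents and that newline-delimited JSON is an acceptable wire format:

const { Readable } = require('stream');

function streamCollection(collection, res) {
  // collection.find() returns the documents as an in-memory array (Loki keeps
  // everything in memory anyway); the generator serializes them lazily so the
  // response is never built as one giant string.
  function* toNdjson(docs) {
    for (const doc of docs) yield JSON.stringify(doc) + '\n';
  }
  Readable.from(toNdjson(collection.find())).pipe(res);
}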

fs.read has a different behaviour

Any idea why fs.read cannot behave like fs.readSync?
My code is very simple: it just reads a song file out chunk by chunk. With the fs.readSync function the song file is read out 512 bytes at a time, while with the fs.read function no log info is printed at all, and if I delete the while(readPostion < fileSize) loop it executes only once.
var chunkSize = 512;  // the chunk size that will be read every time
var readPostion = 0;  // the first byte which will be read from the file
var fileSize = 0;
var fs = require('fs');
//var Buffer = require("buffer");
//var songsBuf = Buffer.alloc(512);
var songsBuf = new Buffer(chunkSize);

fs.open('/media/sdcard/song.mp3', 'r', function(err, fd) {
  if (err)
    throw err;
  console.log("The file had been opened");
  var fileSize = fs.fstatSync(fd).size;
  console.log("The total size of this file is:%d Bytes", fileSize);
  console.log("Start to read the file chunk by chunk");

  // read the file in sync mode
  while (readPostion < fileSize) {
    fs.readSync(fd, songsBuf, 0, chunkSize, readPostion);
    if (readPostion + chunkSize > fileSize)
      chunkSize = fileSize - readPostion;
    readPostion += chunkSize;
    console.log("the read position is %d", readPostion);
    console.log("The chunk size is %d", chunkSize);
    console.log(songsBuf);
  }

  // the code above can read out the file chunk by chunk, but the code below cannot
  // read the file in async mode
  while (readPostion < fileSize) {
    // console.log("ff");
    fs.read(fd, songsBuf, 0, chunkSize, 1, function(err, byteNum, buffer) {
      if (err)
        throw err;
      console.log("Start to read from %d byte", readPostion);
      console.log("Total bytes are %d", byteNum);
      console.log(buffer);
      if (readPostion + chunkSize > fileSize)
        chunkSize = fileSize - readPostion; // if the part of the file left to read is smaller than one chunk
      readPostion += chunkSize;
    });
  }
  fs.close(fd);
});
You can do this with the async library.
async.whilst(
  function () { return readPostion < fileSize },
  function (callback) {
    // read from the current position rather than a fixed offset
    fs.read(fd, songsBuf, 0, chunkSize, readPostion, function (err, byteNum, buffer) {
      if (err) return callback(err)
      console.log("Start to read from %d byte", readPostion);
      console.log("Total bytes are %d", byteNum);
      console.log(buffer);
      if (readPostion + chunkSize > fileSize)
        chunkSize = fileSize - readPostion; // if the part of the file left to read is smaller than one chunk
      readPostion += chunkSize
      callback(null, songsBuf)
    })
  },
  function (err, n) {
    if (err) console.error(err)
    fs.close(fd)
    // Do something with songsBuf here
  }
)
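As a side note (not part of the original answer): on newer Node versions the promise-based fs API gives the same chunk-by-chunk loop without a callback helper, since each read can be awaited before the position is advanced. A rough sketch, with the file path and chunk size as placeholders:

const fsp = require('fs').promises;

async function readChunks(path, chunkSize = 512) {
  const handle = await fsp.open(path, 'r');
  try {
    const { size } = await handle.stat();
    const buf = Buffer.alloc(chunkSize);
    let position = 0;
    while (position < size) {
      // read the next chunk starting at the current position
      const { bytesRead } = await handle.read(buf, 0, Math.min(chunkSize, size - position), position);
      if (bytesRead === 0) break;           // nothing more to read
      console.log(buf.slice(0, bytesRead)); // the chunk that was just read
      position += bytesRead;
    }
  } finally {
    await handle.close();
  }
}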

Read a file one character at a time in node.js?

Is there a way to read one symbol at a time in Node.js from a file, without storing the whole file in memory?
I found an answer for lines.
I tried something like this, but it doesn't help:
const stream = fs.createReadStream("walmart.dump", {
encoding: 'utf8',
fd: null,
bufferSize: 1,
});
stream.on('data', function(sym){
console.log(sym);
});
Readable stream has a read() method, where you can pass the length, in bytes, of every chunk to be read. For example:
var readable = fs.createReadStream("walmart.dump", {
  encoding: 'utf8',
  fd: null,
});

readable.on('readable', function() {
  var chunk;
  while (null !== (chunk = readable.read(1) /* here */)) {
    console.log(chunk); // chunk is a one-character string (the stream is in utf8 mode)
  }
});
Here's a lower-level way to do it, using fs.read(fd, buffer, offset, length, position, callback):
const fs = require('fs');

// open file for reading, returns a file descriptor
const fd = fs.openSync('your-file.txt', 'r');

function readOneCharFromFile(position, cb) {
  // only need to store one byte (one character)
  const b = new Buffer(1);
  fs.read(fd, b, 0, 1, position, function(err, bytesRead, buffer) {
    console.log('data => ', String(buffer));
    cb(err, buffer);
  });
}
You will have to increment the position as you read the file, but it will work.
Here's a quick example of how to read a whole file, character by character. Just for fun I wrote this complete script to do it; just pass in a different file path and it should work:
const async = require('async');
const fs = require('fs');
const path = require('path');

function read(fd, position, cb) {
  let isByteRead = null;
  let ret = new Buffer(0);

  async.whilst(
    function () {
      return isByteRead !== false;
    },
    function (cb) {
      readOneCharFromFile(fd, position++, function (err, bytesRead, buffer) {
        if (err) {
          return cb(err);
        }
        isByteRead = !!bytesRead;
        if (isByteRead) {
          ret = Buffer.concat([ret, buffer]);
        }
        cb(null);
      });
    },
    function (err) {
      cb(err, ret);
    }
  );
}

function readOneCharFromFile(fd, position, cb) {
  // only need to store one byte (one character)
  const b = new Buffer(1);
  fs.read(fd, b, 0, 1, position, cb);
}

/// use your own file here
const file = path.resolve(__dirname + '/fixtures/abc.txt');
const fd = fs.openSync(file, 'r');

// start reading at position 0, position will be incremented
read(fd, 0, function (err, data) {
  if (err) {
    console.error(err.stack || err);
  }
  else {
    console.log('data => ', String(data));
  }
  fs.closeSync(fd);
});
As you can see, we increment the position integer every time we read the file. Hopefully the OS keeps the file in memory as we go. Using async.whilst() is OK, but I think for a more functional style it's better not to keep the state at the top of the function (ret and isByteRead). I will leave it as an exercise to the reader to implement this without using those stateful variables.
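Not part of the original answer, but for what it's worth, that exercise could look something like this: thread the accumulated buffer through a recursive callback instead of closing over mutable variables (this reuses the readOneCharFromFile helper from the script above):

// Recursively read one character at a time, passing the accumulated
// buffer along instead of mutating shared state.
function readAll(fd, position, acc, cb) {
  readOneCharFromFile(fd, position, function (err, bytesRead, buffer) {
    if (err) return cb(err);
    if (!bytesRead) return cb(null, acc); // end of file reached
    readAll(fd, position + 1, Buffer.concat([acc, buffer]), cb);
  });
}

// usage: readAll(fd, 0, Buffer.alloc(0), function (err, data) { ... });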

Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resizes it, and uploads it to S3 (using 'gm' and 'knox'). I have no idea if I'm reading the stream into a buffer correctly (everything is working, but is it the correct way?).
Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything or change the 'buf' variable of another already running invocation (or is this scenario impossible because the callbacks are anonymous functions)?
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');

module.exports.processImageUrl = function(imageUrl, filename, callback) {
  var client = http;
  if (imageUrl.substr(0, 5) == 'https') { client = https; }

  client.get(imageUrl, function(res) {
    if (res.statusCode != 200) {
      return callback(new Error('HTTP Response code ' + res.statusCode));
    }

    gm(res)
      .geometry(1024, 768, '>')
      .stream('jpg', function(err, stdout, stderr) {
        if (!err) {
          var buf = new Buffer(0);
          stdout.on('data', function(d) {
            buf = Buffer.concat([buf, d]);
          });
          stdout.on('end', function() {
            var headers = {
              'Content-Length': buf.length
              , 'Content-Type': 'Image/jpeg'
              , 'x-amz-acl': 'public-read'
            };
            s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
              if (err) {
                return callback(err);
              } else {
                return callback(null, res.client._httpMessage.url);
              }
            });
          });
        } else {
          callback(err);
        }
      });
  }).on('error', function(err) {
    callback(err);
  });
};
Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is suboptimal, because it has to copy all of the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d) { bufs.push(d); });
stdout.on('end', function() {
  var buf = Buffer.concat(bufs);
});
For performance, I would look into if the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all, and instead just pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');

// Your other code.

var bufs = [];
stdout.on('data', function(d) { bufs.push(d); });
stdout.on('end', function() {
  var contentLength = bufs.reduce(function(sum, buf) {
    return sum + buf.length;
  }, 0);

  // Create a stream that will emit your chunks when resumed.
  var stream = pause_stream();
  stream.pause();
  while (bufs.length) stream.write(bufs.shift());
  stream.end();

  var headers = {
    'Content-Length': contentLength,
    // ...
  };

  s3.putStream(stream, ....);
});
Javascript snippet
function stream2buffer(stream) {
  return new Promise((resolve, reject) => {
    const _buf = [];
    stream.on("data", (chunk) => _buf.push(chunk));
    stream.on("end", () => resolve(Buffer.concat(_buf)));
    stream.on("error", (err) => reject(err));
  });
}
Typescript snippet
async function stream2buffer(stream: Stream): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const _buf = Array<any>();
    stream.on("data", chunk => _buf.push(chunk));
    stream.on("end", () => resolve(Buffer.concat(_buf)));
    stream.on("error", err => reject(`error converting stream - ${err}`));
  });
}
You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
  .then(res => res.buffer())
  .then(buffer => console.log(buffer))
Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Since Node 11.14.0, readable streams support async iterators.
const buffers = [];

// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
  buffers.push(data);
}

const finalBuffer = Buffer.concat(buffers);
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));
You can convert your readable stream to a buffer and integrate it in your code in an asynchronous way like this.
async streamToBuffer (stream) {
  return new Promise((resolve, reject) => {
    const data = [];

    stream.on('data', (chunk) => {
      data.push(chunk);
    });

    stream.on('end', () => {
      resolve(Buffer.concat(data));
    });

    stream.on('error', (err) => {
      reject(err);
    });
  });
}
the usage would be as simple as:
// usage
const myStream // your stream
const buffer = await streamToBuffer(myStream) // this is a buffer
I suggest loganfsmyth's method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d) { bufs.push(d); });
stdout.on('end', function() {
  var buf = Buffer.concat(bufs);
});
In my current working example, I am working with GridFS and npm's Jimp.
var data = []; // chunks collected from the download stream
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' });
var dwnldStream = bucket.openDownloadStream(info[0]._id); // original size

dwnldStream.on('data', function(chunk) {
  data.push(chunk);
});
dwnldStream.on('end', function() {
  var buff = Buffer.concat(data);
  console.log("buffer: ", buff);
  jimp.read(buff)
    .then(image => {
      console.log("read the image!");
      IMAGE_SIZES.forEach((size) => {
        resize(image, size);
      });
    });
});
I also did some other research with a string method, but that did not work, perhaps because I was reading from an image file; the array method, however, did work.
const DISCLAIMER = "DONT DO THIS";
var data = "";
stdout.on('data', function(d){
bufs+=d;
});
stdout.on('end', function(){
var buf = Buffer.from(bufs);
//// do work with the buffer here
});
When I did the string method I got this error from npm jimp:
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
Basically I think the type coercion from binary to string didn't work so well.
I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or you could use node-buffers.
I just want to post my solution. The previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that its callback is fired near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content length. In case of an error, it is passed to the callback:
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');

var _streamFile = function(res, stream, cb) {
  var cache = new StreamCache();

  var lstream = lengthStream(function(length) {
    res.header("Content-Length", length);
    cache.pipe(res);
  });

  stream.on('error', function(err) {
    return cb(err);
  });

  stream.on('end', function() {
    return cb(null, true);
  });

  return stream.pipe(lstream).pipe(cache);
}
In TS, [].push(bufferPart) is not compatible, so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
  if (!stream) {
    throw 'FILE_STREAM_EMPTY';
  }
  return new Promise(
    (r, j) => {
      let buffer = Buffer.from([]);
      stream.on('data', buf => {
        buffer = Buffer.concat([buffer, buf]);
      });
      stream.on('end', () => r(buffer));
      stream.on('error', j);
    }
  );
}
You can do this by:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
  const list = [];
  const reader = stream.getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (value)
      list.push(value);
    if (done)
      break;
  }
  return Buffer.concat(list);
}
or using the buffer consumer from node:stream/consumers (available since Node 16.7), which returns a promise:
const { buffer } = require('node:stream/consumers');
const buf = await buffer(stream);
You can check the "content-length" header at res.headers. It will give you the length of the content you will receive (how many bytes of data it will send)
