NodeJS - Read Buffer line by line synchronously => toString() failed - node.js

I have been struggling and searching for a long time. I know there are answers about this, but none of them work.
I used fs.createReadStream and readline for this, but it uses fs.close() to stop FILE READING, so it doesn't work at all when used on a buffer. The reading goes on through the whole input with no way to interrupt it...
Then I used this:
const stream = require('stream');
const es = require('event-stream'); // provides es.mapSync()
let bufferStream = new stream.PassThrough();
bufferStream.end(hexaviaFile.buffer);
bufferStream
.pipe(require('split')())
.pipe(es.mapSync(function(line){
// pause the readstream
bufferStream.pause();
// DO WHATEVER WITH YOUR LINE
console.log('line content = ' + line);
// resume the readstream, possibly from a callback
bufferStream.resume();
}).on('error', function(err){
console.log('Error while reading file.' + err);
}).on('end', function(){
console.log('end event !');
}).on('close', function(){
console.log('close event !');
})
);
// toString() Failed
I get the [toString() Failed] error. I searched for it, and apparently it appears when the buffer is larger than Node's maximum buffer size.
So I checked:
var buffer = require('buffer');
console.log('buffer.kMaxLength = ', buffer.kMaxLength); // 2147483647
console.log('hexaviaFile.buffer.byteLength = ', hexaviaFile.buffer.byteLength); // => 413567671
That's not the case, as the numbers show:
* max buffer size (kMaxLength) = 2 GB
* my buffer = 0.4 GB
I also tried some different libraries to do this, but:
1. I want to keep memory usage as low as possible
2. I need this reading to be perfectly SYNC. In other words, I have some processing after the file reading, and I need to complete all the reading before going on to the next steps.
I don't know what to do :) Any kind of help is appreciated.
Regards.

I forgot about this post. I found a way to achieve this without errors.
It's given here: https://github.com/request/request/issues/2826
First, create a splitter that cuts the incoming data into fixed-size chunks:
const { Transform } = require('stream');
class Splitter extends Transform {
constructor(options){
super(options);
this.splitSize = options.splitSize;
this.buffer = Buffer.alloc(0);
this.continueThis = true;
}
stopIt() {
this.continueThis = false;
}
_transform(chunk, encoding, cb){
this.buffer = Buffer.concat([this.buffer, chunk]);
// Emit only complete splitSize-byte pieces; keep the remainder for the next chunk
while (this.buffer.length >= this.splitSize && this.continueThis){
try {
let piece = this.buffer.slice(0, this.splitSize);
this.push(piece);
this.buffer = this.buffer.slice(this.splitSize);
if (this.buffer[0] === 26){
console.log('EOF : ' + this.buffer[0]);
}
} catch (err) {
console.log('ERR OCCURED => ', err);
break;
}
}
console.log('WHILE FINISHED');
cb();
}
_flush(cb){
// Emit whatever is left at the end (a short last piece or a trailing EOF byte)
if (this.buffer.length > 0 && this.continueThis){
this.push(this.buffer);
}
cb();
}
}
Then pipe your buffer stream into it:
const stream = require('stream');
let bufferStream = new stream.PassThrough();
bufferStream.end(hugeBuffer);
let splitter = new Splitter({splitSize : 170}); // In my case every line is 170 bytes long, so each chunk is one line
let lineNr = 0;
bufferStream
.pipe(splitter)
.on('data', async function(line){
line = line.toString().trim();
splitter.pause(); // pause the stream so you can perform long-running processing with await
lineNr++;
if (lineNr === 1){
// DO stuff with 1st line
} else {
splitter.stopIt(); // Stop emitting chunks, so we only read the 1st line
}
splitter.resume(); // resume the stream so you can process the next chunk
}).on('error', function(err){
console.log('Error while reading file.' + err);
// whatever
}).on('end', async function(){
console.log('end event');
// Stream has ended, do whatever...
});
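A hedged guess at the original error: bufferStream.end(hexaviaFile.buffer) writes the whole 413 MB buffer as a single chunk, so split() likely decodes all of it into one string at once, which can exceed V8's maximum string length (buffer.constants.MAX_STRING_LENGTH) even though the buffer is well under kMaxLength. If the goal is a fully synchronous pass (requirement 2 in the question), a plain loop over the Buffer avoids streams and the huge string entirely. The sketch below assumes hexaviaFile.buffer is a Node Buffer with '\n' line endings; bufferLines is a hypothetical helper name, not part of the answer above.

```javascript
// Hypothetical helper: yields each line of `buf` synchronously, decoding
// only one line at a time, so no single huge string is ever created.
// Assumes '\n' line endings; a trailing '\r' (CRLF files) is stripped.
function* bufferLines(buf, encoding = 'utf8') {
  let start = 0;
  while (start < buf.length) {
    let end = buf.indexOf(0x0a, start); // 0x0a is '\n'
    if (end === -1) end = buf.length; // last line has no newline
    let lineEnd = end;
    if (lineEnd > start && buf[lineEnd - 1] === 0x0d) lineEnd--; // 0x0d is '\r'
    yield buf.toString(encoding, start, lineEnd);
    start = end + 1;
  }
}
```

Usage would be `for (const line of bufferLines(hexaviaFile.buffer)) { ... }`. Because it is an ordinary for...of loop, `break` stops reading immediately, and any code after the loop runs only once every line has been processed.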

Related

Execute Loop after ReadStream pipe has finished

Not sure if the title is quite right, as I'm pretty much stumped (in over my head)...
I'm trying to pull the headers from a CSV, as part of an automation test, and then validate those headers. I'm using csv-parser to read the CSV file.
Once I've gathered the headers, I do a simple assertion against each one, using the string values I've entered into my test script.
However, the for loop currently executes before the CSV has been read and the headers gathered. I'm not sure how to wait for this to finish before executing the loop.
const fs = require('fs');
const csv = require('csv-parser');
const assert = require('assert');
let headerArray = null;
const headerValues = values.split(',');
browser.pause(10000);
fs.createReadStream(""+global.downloadDir + "\\" + fs.readdirSync(global.downloadDir)[0])
.pipe(csv())
.on('headers', function (headers) {
return headerArray = headers
})
for(var i =0; i<headerValues.length; i++){
assert.equal(headerValues[i], headerArray[i]);
}
The solution is to run the for loop with your assertions inside the 'headers' event handler, e.g.:
var results = [] // we'll collect the rows from the csv into this array
var rs = fs.createReadStream(""+global.downloadDir + "\\" + fs.readdirSync(global.downloadDir)[0])
.pipe(csv())
.on('headers', function (headers) {
try {
for(var i =0; i<headerValues.length; i++){
assert.equal(headerValues[i], headers[i]);
}
} catch (err) {
// an assertion failed so let's end
// the stream (triggering the 'error' event)
rs.destroy(err)
}
}).on('data', function(data) {
results.push(data)
}).on('end', function() {
//
// the 'end' event will fire when the csv has finished parsing.
// so you can do something useful with the `results` array here...
//
}).on('error', function(err) {
// if an assertion failed (rs.destroy(err) above) or parsing failed, the 'error' event fires instead of 'end'
console.log('something went wrong:', err)
})

Can't get NodeJS to write anything to a writable stream

Running NodeJS (v8.16.2) locally on the command line. I scrape an e-commerce website, gather the relevant information into a data-structure, then try to write it into a plain-text CSV file (my records don't have a fixed set of fields) manually by creating a write stream. This last step isn't working.
// Other stuff
const exitHandler = function(options, exitCode) {
if (exitCode || exitCode !== 0) console.log(exitCode);
// Other stuff
writeToCsv();
if (options.exit) process.exit();
}
const writeToCsv = function() {
let ws = fs.createWriteStream('./final-data.csv');
const crlf = '\r\n'; // CSV line ending: carriage return, then line feed
// Please ignore the weird layout
for (let seller in finalData.sellers) {
ws.write('Seller:,' + seller + crlf + ',Brands:');
for (let brand of finalData.sellers[seller].brands) {
ws.write(',' + brand);
}
ws.write(crlf + ',Addresses:');
for (let addr of finalData.sellers[seller].addrs) {
ws.write(',"' + addr + '"');
}
ws.write(crlf);
}
ws.on('finish', () => {
console.log('Wrote all data'); // never prints this
});
ws.end();
}
process.on('exit', exitHandler.bind(null,{cleanup:true}));
I suspect this is because NodeJS exits before the data has been flushed to disk, but can't figure out a way to make NodeJS flush the data synchronously.
PS: I'm new to NodeJS.
Please check out the example below and integrate it.
As per your comment, I updated the code:
const createCsvWriter = require('csv-writer').createObjectCsvWriter;
async function writeDataInCSV(filePath, dynamicHeader, data) {
const csvWriter = createCsvWriter({
path: filePath,
header: dynamicHeader
});
await csvWriter.writeRecords(data);
console.log('The CSV file was written successfully');
}
writeDataInCSV('out.csv', dynamicHeader, Data)
.catch(err => console.error(err));
Here, build the header array (for createObjectCsvWriter, entries like { id: 'name', title: 'NAME' }) and the data records, then pass them to writeDataInCSV.
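The csv-writer answer works when you can await the write before the process exits, but it does not help inside the question's exit handler. Node's documentation for the process 'exit' event says listeners must only perform synchronous operations; queued asynchronous work is abandoned, so the ws.write() calls never get flushed and 'finish' never fires. A minimal sketch that keeps the question's layout but writes synchronously with fs.writeFileSync; buildCsv and writeToCsvSync are hypothetical names, and finalData is assumed to have the shape implied by the question's loops.

```javascript
const fs = require('node:fs');

// Builds the same layout as the question's writeToCsv(), as one string.
// Assumes finalData.sellers is { [seller]: { brands: string[], addrs: string[] } }.
function buildCsv(finalData) {
  const crlf = '\r\n';
  let out = '';
  for (const seller in finalData.sellers) {
    const { brands, addrs } = finalData.sellers[seller];
    out += 'Seller:,' + seller + crlf + ',Brands:';
    for (const brand of brands) out += ',' + brand;
    out += crlf + ',Addresses:';
    for (const addr of addrs) out += ',"' + addr + '"';
    out += crlf;
  }
  return out;
}

// 'exit' listeners may only do synchronous work, so write synchronously.
function writeToCsvSync(finalData, filePath) {
  fs.writeFileSync(filePath, buildCsv(finalData));
}
```

In the exit handler, `writeToCsv()` would become `writeToCsvSync(finalData, './final-data.csv')`.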

Node Async ReadStream from SFTP connection

So I'm creating a class and ultimately want to create a method that takes a file on an SFTP server and produces a readstream that can be piped into other streams / functions. I'm most of the way there, except my readStream is acting strangely. Here's the relevant code:
const Client = require('ssh2').Client,
Readable = require('stream').Readable,
async = require('async');
/**
* Class Definition stuff
* ......
*/
getStream(get) {
const self = this;
const rs = new Readable;
rs._read = function() {
const read = this;
self.conn.on('ready', function(){
self.conn.sftp(function(err,sftp) {
if(err) return err;
sftp.open(get, 'r', function(err, fd){
sftp.fstat(fd, function(err, stats) {
let bufferSize = stats.size,
chunkSize = 512,//bytes
buffer = Buffer.alloc(bufferSize), // new Buffer() is deprecated
bytesRead = 0;
async.whilst(
function () {
return bytesRead < bufferSize;
},
function (done) {
sftp.read(fd, buffer, bytesRead, chunkSize, bytesRead,
function (err, bytes, buff) {
if (err) return done(err);
// console.log(buff.toString('utf8'));
read.push(buff);
bytesRead += bytes;
done();
});
},
function (err) {
if (err) console.log(err);
read.push(null);
sftp.close(fd);
}
);
});
});
});
}).connect(self.connectionObj);
}
return rs;
}
Elsewhere, I would call this method like so:
let sftp = new SFTP(credentials);
sftp.getStream('/path/file.csv')
.pipe(toStuff)
.pipe(toOutput);
So, long story short: during the SFTP read operation, read.push(buff) keeps pushing the same first part of the file over and over. However, when I console.log(buff), it correctly streams the full file.
So I'm scratching my head wondering what I'm doing wrong with the read stream that makes it push only the beginning of the file instead of continuing on to the next part of the buffer.
Here's the docs on SSH2 SFTP client: https://github.com/mscdex/ssh2-streams/blob/master/SFTPStream.md
I used this SO question as inspiration for what I wrote above: node.js fs.read() example
This is similar/related: Reading file from SFTP server using Node.js and SSH2
OK, after a lot of trouble, I realized I was making a couple of mistakes. First, the _read function is called every time the stream is ready to read more data, which means the SFTP connection was being started every time _read was called. This also meant the sftp.read() call started over each time, resetting the starting point back to the beginning.
I needed a way to first setup the connection, then read and stream the file data, so I chose the library noms. Here's the final code if anyone is interested:
// at the top of the module: const nom = require('noms');
getStream (get) {
const self = this;
let connection,
fileData,
buffer,
totalBytes = 0,
bytesRead = 0;
return nom(
// _read function
function(size, next) {
const read = this;
// Check if we're done reading
if(bytesRead === totalBytes) {
connection.close(fileData);
connection.end();
self.conn.end();
console.log('done');
return read.push(null);
}
// Make sure we read the last bit of the file
if ((bytesRead + size) > totalBytes) {
size = (totalBytes - bytesRead);
}
// Read each chunk of the file
connection.read(fileData, buffer, bytesRead, size, bytesRead,
function (err, byteCount, buff, pos) {
// console.log(buff.toString('utf8'));
// console.log('reading');
bytesRead += byteCount;
read.push(buff);
next();
}
);
},
// Before Function
function(start) {
// setup the connection BEFORE we start _read
self.conn.on('ready', function(){
self.conn.sftp(function(err,sftp) {
if(err) return err;
sftp.open(get, 'r', function(err, fd){
sftp.fstat(fd, function(err, stats) {
connection = sftp;
fileData = fd;
totalBytes = stats.size;
buffer = Buffer.alloc(totalBytes); // new Buffer() is deprecated
console.log('made connection');
start();
});
});
});
}).connect(self.connectionObj);
})
}
Always looking for feedback. This doesn't run quite as fast as I'd hope, so let me know if you have ideas on speeding up the stream.
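The same "set up once, then read the next chunk on each _read()" pattern can also be expressed without an extra library, using Readable's _construct() hook (Node 15+), which runs once before the first _read(). This is a sketch under assumptions: openSource is a hypothetical async function standing in for the SFTP connect/open/fstat sequence, resolving to an object with a size and a read(position, length) method that returns a Promise of a Buffer; it is not part of the ssh2 API. Unlike the noms version, it never allocates a buffer the size of the whole file.

```javascript
const { Readable } = require('node:stream');

// Sketch: set up the source once in _construct(), then read the next chunk
// on each _read() call. `openSource` is hypothetical and must resolve to
// { size, read(position, length) => Promise<Buffer> }.
class ChunkedReader extends Readable {
  constructor(openSource, options) {
    super(options);
    this.openSource = openSource;
    this.position = 0; // next byte offset; kept across _read() calls
  }

  _construct(callback) {
    // Called once, before the first _read(); an error destroys the stream.
    this.openSource().then(
      (source) => {
        this.source = source;
        callback();
      },
      callback
    );
  }

  _read(size) {
    if (this.position >= this.source.size) {
      this.push(null); // whole file read
      return;
    }
    const length = Math.min(size, this.source.size - this.position);
    this.source.read(this.position, length).then(
      (chunk) => {
        if (chunk.length === 0) {
          this.push(null); // source returned no data: treat as end
          return;
        }
        this.position += chunk.length; // continue where the last read ended
        this.push(chunk);
      },
      (err) => this.destroy(err)
    );
  }
}
```

Node does not call _read() again until push() has been called, so reads never overlap, and the offset advances instead of restarting at 0 on every call.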

createWriteStream 'close' event not being triggered

I am trying to extract images from a csv file by doing the following:
1. Parsing/streaming in a large CSV file using csv-parse and the fs createReadStream method
2. Grabbing each line for processing using stream-transform
3. Extracting the image and other row data for processing using the async waterfall method
4. Downloading and writing the image to the server using request and the fs createWriteStream method
For some reason, after the data gets piped into createWriteStream, there is some situation in which an async callback never gets called. I have run this same code using only request, without piping to createWriteStream, and it works. I've also run createWriteStream with a drain event, and then somehow it works. Can anyone explain this to me?
In the code below, request is trying to pipe 14,970 images, but the createWriteStream close or finish events fire only 14,895 times, with error firing 0 times. Could this be a draining issue? Could the highWaterMark be exceeded, with a failed write going undetected?
Here is my csv line getting code:
var first = true;
var parser = parse();
var transformer = transform( (line, complete) => {
if(!first)
this.extractData(line, complete);
else {
first = false;
complete(null);
}
},
() => {
console.log('Done: parseFile');
});
fs.createReadStream(this.upload.location).pipe(parser).pipe(transformer);
The extractData method, which doesn't always call the required async callback:
extractData(line,complete){
var now = new Date();
var image = {
createdAt: now,
updatedAt: now
};
async.waterfall([
next => { // Data Extraction
async.forEachOf(line, (data, i, complete) => {
if(i === 2) image.src = data;
if(i === 3) image.importSrc = data;
complete(null);
}, err => {
if(err) throw err;
next(null);
});
},
next => { // Download Image
var file = fs.createWriteStream('public/'+image.src);
var sendReq = request.get(image.importSrc);
sendReq.on('response', response => {
if (response.statusCode !== 200) {
this.upload.report.image.errors++;
return next(null);
}
});
sendReq.on('error', err => {
this.upload.report.image.errors++;
next(null);
});
sendReq.pipe(file);
file.on('finish', () => {
this.upload.report.image.inserts++;
file.close(next); // Close file and callback
});
file.on('error', err => {
this.upload.report.image.errors++;
next(null);
});
}
], err => {
if(err) throw err;
complete(null);
});
}
As suggested by @mscdex, I've also tried switching out finish for his replacement close approach.
file.close(next); is unnecessary as the file stream is closed automatically by default. What you can do instead is to listen for the close event to know when the file descriptor for the stream has been closed. So replace the entire finish event handler with:
file.on('close', () => {
this.upload.report.image.inserts++;
next(null);
});
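Part of what makes the question's code fragile is that several events ('response' with a non-200 status, 'error', 'finish') can each call next for the same image, so that step's callback can end up being called more than once. As an alternative not taken from the answer above, Node's stream.pipeline() calls its callback exactly once: after the destination finishes, or with the first error from any stream, destroying the streams on failure. A minimal sketch; saveStreamToFile is a hypothetical helper, and whether it combines cleanly with request's stream objects and the status-code check is an assumption worth testing.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream');

// Hypothetical helper: pipes `source` into a new file at `filePath` and calls
// `done` exactly once: with null after the file stream has finished, or with
// the first error from either stream (pipeline() then destroys both streams).
function saveStreamToFile(source, filePath, done) {
  pipeline(source, fs.createWriteStream(filePath), (err) => {
    done(err || null);
  });
}
```

In the Download Image step, this would replace the hand-wired 'finish', 'close' and 'error' handlers with a single call such as `saveStreamToFile(sendReq, 'public/' + image.src, next)`.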

Node.js: How to read a stream into a buffer?

I wrote a pretty simple function that downloads an image from a given URL, resizes it, and uploads it to S3 (using 'gm' and 'knox'). I have no idea if I'm reading the stream into a buffer correctly (everything is working, but is it the correct way?).
Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything or change the 'buf' variable of another already-running invocation (or is this scenario impossible because the callbacks are anonymous functions)?
var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');
module.exports.processImageUrl = function(imageUrl, filename, callback) {
var client = http;
if (imageUrl.substr(0, 5) == 'https') { client = https; }
client.get(imageUrl, function(res) {
if (res.statusCode != 200) {
return callback(new Error('HTTP Response code ' + res.statusCode));
}
gm(res)
.geometry(1024, 768, '>')
.stream('jpg', function(err, stdout, stderr) {
if (!err) {
var buf = Buffer.alloc(0); // new Buffer() is deprecated
stdout.on('data', function(d) {
buf = Buffer.concat([buf, d]);
});
stdout.on('end', function() {
var headers = {
'Content-Length': buf.length
, 'Content-Type': 'Image/jpeg'
, 'x-amz-acl': 'public-read'
};
s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
if(err) {
return callback(err);
} else {
return callback(null, res.client._httpMessage.url);
}
});
});
} else {
callback(err);
}
});
}).on('error', function(err) {
callback(err);
});
};
Overall I don't see anything that would break in your code.
Two suggestions:
The way you are combining Buffer objects is suboptimal, because it has to copy all the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
})
For performance, I would look into whether the S3 library you are using supports streams. Ideally, you wouldn't need to create one large buffer at all, and could instead pass the stdout stream directly to the S3 library.
As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside of that will only be accessible from other items defined inside that function.
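To make that concrete: each call runs with its own scope, so a variable such as buf (or chunks below) declared inside the function is a separate variable per call, even when two calls' 'data' events interleave on the event loop. A small demonstration with two in-memory streams; collect is a hypothetical helper, not code from the question.

```javascript
const { PassThrough } = require('node:stream');

// Hypothetical helper: each call gets its own `chunks` array (a new scope per
// call), so concurrent calls never share or overwrite each other's data.
function collect(stream, callback) {
  const chunks = [];
  stream.on('data', (d) => chunks.push(d));
  stream.on('end', () => callback(null, Buffer.concat(chunks)));
  stream.on('error', callback);
}

// Two streams whose 'data' events interleave.
const first = new PassThrough();
const second = new PassThrough();
const results = {};
collect(first, (err, buf) => { results.first = buf.toString(); });
collect(second, (err, buf) => { results.second = buf.toString(); });
first.write('a1');
second.write('b1');
first.write('a2');
second.write('b2');
first.end();
second.end();
```

Once both streams have ended, results.first is 'a1a2' and results.second is 'b1b2': the interleaved writes never leak into the other call's array.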
Update
Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.
With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.
var pause_stream = require('pause-stream');
// Your other code.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var contentLength = bufs.reduce(function(sum, buf){
return sum + buf.length;
}, 0);
// Create a stream that will emit your chunks when resumed.
var stream = pause_stream();
stream.pause();
while (bufs.length) stream.write(bufs.shift());
stream.end();
var headers = {
'Content-Length': contentLength,
// ...
};
s3.putStream(stream, ....);
});
Javascript snippet
function stream2buffer(stream) {
return new Promise((resolve, reject) => {
const _buf = [];
stream.on("data", (chunk) => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", (err) => reject(err));
});
}
Typescript snippet
import { Stream } from 'stream';
async function stream2buffer(stream: Stream): Promise<Buffer> {
return new Promise<Buffer>((resolve, reject) => {
const _buf = Array<any>();
stream.on("data", chunk => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", err => reject(`error converting stream - ${err}`));
});
}
You can easily do this using node-fetch if you are pulling from http(s) URIs.
From the readme:
const fetch = require('node-fetch'); // node-fetch v2 (CommonJS)
fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
.then(res => res.buffer())
.then(buffer => console.log(buffer));
Note: this solely answers "How to read a stream into a buffer?" and ignores the context of the original question.
ES2018 Answer
Since Node 11.14.0, readable streams support async iterators.
const buffers = [];
// node.js readable streams implement the async iterator protocol
for await (const data of readableStream) {
buffers.push(data);
}
const finalBuffer = Buffer.concat(buffers);
Bonus: In the future, this could get better with the stage 2 Array.fromAsync proposal.
// 🛑 DOES NOT WORK (yet!)
const finalBuffer = Buffer.concat(await Array.fromAsync(readableStream));
You can convert your readable stream to a buffer and integrate it in your code in an asynchronous way like this.
async function streamToBuffer(stream) {
return new Promise((resolve, reject) => {
const data = [];
stream.on('data', (chunk) => {
data.push(chunk);
});
stream.on('end', () => {
resolve(Buffer.concat(data))
})
stream.on('error', (err) => {
reject(err)
})
})
}
The usage would be as simple as:
// usage: myStream is your readable stream (inside an async function)
const buffer = await streamToBuffer(myStream) // this is a buffer
I suggest loganfsmyth's method, using an array to hold the data.
var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
var buf = Buffer.concat(bufs);
});
In my current working example, I am working with GridFS and npm's Jimp.
var bucket = new GridFSBucket(getDBReference(), { bucketName: 'images' } );
var dwnldStream = bucket.openDownloadStream(info[0]._id);// original size
var data = []; // collects the Buffer chunks
dwnldStream.on('data', function(chunk) {
data.push(chunk);
});
dwnldStream.on('end', function() {
var buff = Buffer.concat(data);
console.log("buffer: ", buff);
jimp.read(buff)
.then(image => {
console.log("read the image!");
IMAGE_SIZES.forEach( (size)=>{
resize(image,size);
});
});
});
I did some other research with a string method, but that did not work, perhaps because I was reading from an image file; the array method did work.
const DISCLAIMER = "DONT DO THIS";
var data = "";
stdout.on('data', function(d){
data += d; // each Buffer chunk is implicitly converted to a UTF-8 string
});
stdout.on('end', function(){
var buf = Buffer.from(data);
//// do work with the buffer here
});
When I used the string method, I got this error from Jimp:
buffer: <Buffer 00 00 00 00 00>
{ Error: Could not find MIME for Buffer <null>
Basically, I think the type coercion from binary to string didn't work well.
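That guess is right in substance: adding a Buffer to a string calls buf.toString(), which decodes the bytes as UTF-8, and byte sequences that are not valid UTF-8 (which is typical of binary data such as images) are replaced with U+FFFD, so converting back with Buffer.from() does not restore the original bytes. Even for text, decoding each chunk separately can corrupt a multi-byte character that straddles a chunk boundary. A small check, using made-up sample bytes:

```javascript
// Sample bytes: 0xFF 0xD8 0xFF 0xE0 is how many JPEG files begin; on its own
// this sequence is not valid UTF-8.
const original = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);

// String method: the Buffer is implicitly decoded as UTF-8, then re-encoded.
let asString = '';
asString += original;
const viaString = Buffer.from(asString);

// Array method: the raw bytes are kept as-is.
const viaArray = Buffer.concat([original]);

console.log(viaString.equals(original)); // false: invalid bytes became U+FFFD
console.log(viaArray.equals(original)); // true
```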
I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or one could use node-buffers.
I just want to post my solution. Previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that its callback fires near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content length. In case of an error, cb is called with the error:
var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');
var _streamFile = function(res , stream , cb){
var cache = new StreamCache();
var lstream = lengthStream(function(length) {
res.header("Content-Length", length);
cache.pipe(res);
});
stream.on('error', function(err){
return cb(err);
});
stream.on('end', function(){
return cb(null , true);
});
return stream.pipe(lstream).pipe(cache);
}
In TypeScript, [].push(bufferPart) is not type-compatible, so:
getBufferFromStream(stream: Part | null): Promise<Buffer> {
if (!stream) {
throw 'FILE_STREAM_EMPTY';
}
return new Promise(
(r, j) => {
let buffer = Buffer.from([]);
stream.on('data', buf => {
buffer = Buffer.concat([buffer, buf]);
});
stream.on('end', () => r(buffer));
stream.on('error', j);
}
);
}
With a web ReadableStream (the WHATWG streams API), you can do this:
async function toBuffer(stream: ReadableStream<Uint8Array>) {
const list = []
const reader = stream.getReader()
while (true) {
const { value, done } = await reader.read()
if (value)
list.push(value)
if (done)
break
}
return Buffer.concat(list)
}
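or using the buffer consumer from node:stream/consumers (Node 16.7+), which returns a Promise:
import { buffer } from 'node:stream/consumers';
const buf = await buffer(stream);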
You can check the "content-length" header in res.headers. If the server sends it, it gives you the length of the content you will receive (how many bytes of data it will send); it can be missing, for example on chunked responses.
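One way to use that header, as a sketch: when the length is known, allocate the Buffer once and copy each chunk into place instead of concatenating. This assumes the stream yields Buffer chunks (no setEncoding()) and that the header is accurate; since it may be absent, a concat-based fallback is still needed. readKnownLength is a hypothetical name.

```javascript
// Hypothetical helper: reads exactly `expectedLength` bytes from `stream`
// into a single preallocated Buffer. Rejects if the stream delivers more or
// fewer bytes than expected. Assumes the stream yields Buffer chunks.
async function readKnownLength(stream, expectedLength) {
  const out = Buffer.alloc(expectedLength);
  let offset = 0;
  for await (const chunk of stream) {
    if (offset + chunk.length > expectedLength) {
      throw new Error('stream is longer than the expected length');
    }
    chunk.copy(out, offset); // copy into place; no intermediate concat
    offset += chunk.length;
  }
  if (offset !== expectedLength) {
    throw new Error(`expected ${expectedLength} bytes, got ${offset}`);
  }
  return out;
}
```

With an http response this would be called as `readKnownLength(res, Number(res.headers['content-length']))` when the header is present.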

Resources