Reading larger SFTP files with ssh2-sftp in Node

I'm trying to use the ssh2-sftp library to read/write a file in Node. When I do an sftp.get on a larger CSV file (not that large -- only about 2 MB) on an SFTP site and then read data from the returned stream, the call hangs after the 14th stream.on("data") callback. I've tested this with a few different sample files, and the code works fine on smaller files. But if a CSV file is big enough to get past that 14th call, it just hangs, as if it can't read any more even though there is more to read. The stream.on("close") handler never gets called either.
Obviously this is pretty weird behavior. I was hoping somebody has run into something similar with this library and has some guidance.
If it helps at all, here is some code:
sftp.get(currentFileName).then((readStream) => {
    var counter = 0;
    readStream.on("data", function(d) {
        counter++;
        console.log("counter = " + counter);
    });
    readStream.on("close", function(err) {
        if (err) {
            console.error("Problem with read stream for file " + currentFileName + ", error = ", err);
        } else {
            // done reading the individual file, all went well
            console.log("done reading the stream");
        }
    });
    readStream.on('error', function(e) {
        console.error("Error retrieving file: " + e);
    });
    readStream.resume();
});
After the 14th call into readStream.on("data"), it just freezes up, with maybe half the file read.

The issue turned out to be that ssh2-sftp seems to depend on an outdated version of the underlying ssh2 library. Switching from ssh2-sftp to the most recent version of ssh2 (0.5.2) and using that library directly fixed the issue (which might have been this one: https://github.com/mscdex/ssh2/issues/450).
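In case it helps anyone, here is a rough sketch of doing the same read with ssh2 directly (this is not the original poster's code; the host and credentials are placeholders):
var Client = require('ssh2').Client;

var conn = new Client();
conn.on('ready', function() {
    conn.sftp(function(err, sftp) {
        if (err) throw err;
        // ssh2's SFTP wrapper exposes createReadStream for remote files
        var readStream = sftp.createReadStream(currentFileName);
        readStream.on('data', function(d) {
            console.log('got a chunk of ' + d.length + ' bytes');
        });
        readStream.on('close', function() {
            console.log('done reading the stream');
            conn.end();
        });
        readStream.on('error', function(e) {
            console.error('Error retrieving file: ' + e);
        });
    });
}).connect({
    host: 'sftp.example.com',  // placeholder host and credentials
    port: 22,
    username: 'user',
    password: 'password'
});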

Related

Incorrect header check error from zlib.gunzip() in Node.js app using the HTTPS module

I have a Node.JS app and I am using the https module to make GET requests to a web server. The headers in the response coming back have the content-type set to gzip. I have directly inspected the data and it does appear to be compressed data and definitely not plain text.
I accumulate the chunks as they come in. I then try decompressing the accumulated data using zlib. So far, everything I've tried results in an "Incorrect header check" error when I execute the decompression call. The code below shows the use of a Buffer object with the encoding set to binary. I previously tried passing the accumulated data directly to the decompression call, but that failed too.
Why doesn't this work?
// Make the request to the designated external server.
const httpsRequest = https.request(postOptions, function(extRequest) {
    console.log('(httpsRequest) In request handler.');
    // Process the response from the external server.
    let dataBody = "";
    // The data may come to us in pieces. The 'on' event handler will accumulate them for us.
    let iNumSlices = 0;
    extRequest.on('data', function(dataSlice) {
        iNumSlices++;
        console.log('(httpsRequest:on) Received slice # ' + iNumSlices + '.');
        dataBody += dataSlice;
    });
    // When we have received all the data from the external server, finish the request.
    extRequest.on('end', function() {
        // SUCCESS: Return the result to AWS.
        console.log('(httpsRequest:end) Success. Data body length: ' + dataBody.length + '.');
        console.log('(httpsRequest:end) Content: ');
        let buffer = Buffer.from(dataBody, "binary");
        // Check for GZip compressed data.
        if (extRequest.headers['content-encoding'] == 'gzip') {
            // Decompress the data.
            zlib.gunzip(buffer, (err, buffer) => {
                if (err) {
                    // Reject the promise with the error.
                    reject(err);
                    return;
                } else {
                    console.log(errPrefix + buffer.toString('utf8'));
                }
            });
        } else {
            console.log(errPrefix + dataBody);
            let parsedDataBodyObj = JSON.parse(dataBody);
            resolve(parsedDataBodyObj);
        }
    });
});
You may have it in your actual code, but the code snippet doesn't include a call to end(), which is mandatory.
It may be related to the way you accumulate the chunks with dataBody += dataSlice.
Since the data is compressed, this (probably) means that the type of a chunk is already a Buffer, and using += to concatenate it into a string seems to mess it up, even though you later call Buffer.from.
Try making dataBody an empty array instead, pushing the chunks into it, and finally calling Buffer.concat(dataBody).
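A minimal sketch of that approach might look like this (postOptions, resolve and reject are assumed to come from the surrounding promise code in the question):
const https = require('https');
const zlib = require('zlib');

const httpsRequest = https.request(postOptions, function(extRequest) {
    const chunks = [];
    extRequest.on('data', function(dataSlice) {
        chunks.push(dataSlice); // each slice is a Buffer; don't turn it into a string
    });
    extRequest.on('end', function() {
        const buffer = Buffer.concat(chunks);
        if (extRequest.headers['content-encoding'] == 'gzip') {
            zlib.gunzip(buffer, (err, decoded) => {
                if (err) {
                    reject(err);
                    return;
                }
                resolve(JSON.parse(decoded.toString('utf8')));
            });
        } else {
            resolve(JSON.parse(buffer.toString('utf8')));
        }
    });
});
httpsRequest.end(); // don't forget end(), as noted above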
Another option is that https.request already decompresses the data under the hood, so that once you accumulate the chunks into a buffer (as detailed in the previous section), all you're left with is calling buffer.toString(). I experienced this myself in another answer, and it seems to be related to the Node.js version.
I'll end this answer with a live demo of similar working code which may come in handy for you (it queries the StackExchange API, gets gzip-compressed chunks, and then decompresses them):
It includes code that works on Node.js 14.16.0 (the current StackBlitz version) - which, as I described, already decompresses the data under the hood - but not on Node.js 15.13.0.
It also includes commented-out code that works for Node.js 15.13.0 but not for 14.16.0.

Node.js read/write stream skipping the first line on write

I wrote a simple utility to convert a somewhat weird JSON file (multiple objects not in an array) to CSV for some system testing purposes. The read and transformation themselves are fine, and the resulting string is logged to the console correctly, but sometimes the resulting CSV file is missing the first data line (it shows the header, one blank line, then the rest of the data). I'm using read and write streams, without any provisions for backpressure. I don't think the problem is backpressure, since only the first line gets skipped, but I could be wrong. Any ideas?
const fs = require('fs');
const readline = require('readline');
const JSONbig = require('json-bigint');

// Create read interface to stream each line
const readInterface = readline.createInterface({
    input: fs.createReadStream('./confirm.json'),
    // output: process.stdout,
    console: false
});
const writeHeader = fs.createWriteStream('./confirm.csv');
const header = "ACTION_TYPE,PROCESS_PICK,TYPE_FLAG,APP_ID,FACILITY_ID,CONTAINER_ID,USER_ID,CONFIRM_DATE_TS,PICK_QTY,REMAINING_QTY,PICK_STATUS,ASSIGNMENT_ID,LOCATION_ID,ITEM_ID,CLUSTER_ID,TOTAL_QTY,TOTAL_ITEMS,WAVE_NBR,QA_FLAG,WORK_DIRECTIVE_ID\n";
writeHeader.write(header);
// Create write interface to save each csv line
const writeDetail = fs.createWriteStream('./confirm.csv', {
    flags: 'a'
});
readInterface.on('line', function(line) {
    let task = JSONbig.parse(line);
    task.businessData.MESSAGE.RECORD[0].DETAIL.REG_DETAIL.forEach(element => {
        let csv = "I,PTB,0,VCO,PR9999999011,,cpicker1,2020121000000," + element.QUANTITYTOPICK.toString() + ",0,COMPLETED," +
            task.businessData.MESSAGE.RECORD[0].ASSIGNMENTNUMBER.toString() + "," + element.LOCATIONNUMBER.toString() + "," +
            element.ITEMNUMBER.toString() + ",,,," +
            task.businessData.MESSAGE.RECORD[0].WAVE.toString() + ",N," + element.CARTONNUMBER.toString() + "\n";
        console.log(csv);
        try {
            writeDetail.write(csv);
        } catch (err) {
            console.error(err);
        }
    });
});
Edit: Based on the feedback below, I consolidated the write streams into one (the missing line was still happening, but it's better coding anyway). I also added a try block around the JSON parse. I ran the code several times over different files, and there was no missing line. Maybe the write was happening before the parse was done? In any case, it seems my problem is resolved for the moment. I'll have to research how to properly handle backpressure later. Thanks for the help.
The code you show here is opening two separate writestreams on the same file and then writing to both of them without any timing coordination between them. That will clearly conflict.
You open one here:
const writeHeader = fs.createWriteStream('./confirm.csv');
const header = "ACTION_TYPE,PROCESS_PICK,TYPE_FLAG,APP_ID,FACILITY_ID,CONTAINER_ID,USER_ID,CONFIRM_DATE_TS,PICK_QTY,REMAINING_QTY,PICK_STATUS,ASSIGNMENT_ID,LOCATION_ID,ITEM_ID,CLUSTER_ID,TOTAL_QTY,TOTAL_ITEMS,WAVE_NBR,QA_FLAG,WORK_DIRECTIVE_ID\n";
writeHeader.write(header);
And, you open one here:
// Create write interface to save each csv line
const writeDetail = fs.createWriteStream('./confirm.csv', {
flags: 'a'
});
And, then you write to the second one in your loop. Those clearly conflict. The write from the first is probably not complete when you open the second and it also may not be flushed to disk yet either. The second one opens for append, but doesn't accurately read the file position for appending because the first one hasn't yet succeeded.
This code doesn't show any reason for using separate write streams at all so the cleanest way to address this would be to just use one writestream that will accurately serialize the writes. Otherwise, you have to wait for the first writestream to finish and close before opening the second one.
And, your .forEach() loop needs backpressure support since you're repeatedly calling .write() and, at some data size, you can get backpressure. I agree that backpressure is not likely the cause of the issue you are asking about, but it is something else you need to fix when rapidly writing in a loop.
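As a rough sketch of both points (this is not the poster's actual code; buildCsvLine is a hypothetical helper standing in for the string concatenation above):
// Single write stream for the whole file: header first, then the detail lines.
const out = fs.createWriteStream('./confirm.csv');
out.write(header);

readInterface.on('line', function(line) {
    let task = JSONbig.parse(line);
    task.businessData.MESSAGE.RECORD[0].DETAIL.REG_DETAIL.forEach(element => {
        const csv = buildCsvLine(task, element); // hypothetical helper
        if (!out.write(csv)) {
            // write() returned false: the internal buffer is full. Pause the
            // reader (this only stops further 'line' events) and resume once
            // the write stream has drained.
            readInterface.pause();
            out.once('drain', () => readInterface.resume());
        }
    });
});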

NodeJS + Electron - Optimizing Displaying Large Files

I'm trying to read large files. Currently, I'm following the NodeJS documentation on how to read large files, but when I read a somewhat large file (~1.1 MB, ~20k lines), my Electron app freezes up for about 6 minutes and then finishes loading all the lines.
Here's my current code:
var fileContents = document.getElementById("fileContents")
// first clear out the existing text
fileContents.innerHTML = ""
if (fs.existsSync(pathToFile)) {
    const fileLine = readline.createInterface({
        input: fs.createReadStream(pathToFile)
    })
    fileLine.on('line', (line) => {
        fileContents.innerHTML += line + "\n"
    })
} else {
    fileContents.innerHTML += fileNotFound + "\n"
    console.log('Could not find file!!')
}
And the tag I'm targeting is a <xmp> tag.
What are some ways that people have displayed large files?
Streams can often be useful for high performance as they allow you to process one line at a time without loading the whole file into memory.
In this case however, you are loading each line and then concatenating onto your existing string (fileContents.innerHTML) with +=. All that concatenating is likely to be slower than just loading the whole contents of the file as one string. Worse still, you are outputting HTML every time you read in a line. So with 20k lines you are asking the rendering engine to render HTML 20,000 times!
Instead, try reading in the file as one string, and outputting the HTML just once.
fs.readFile(pathToFile, (err, data) => {
    if (err) throw err;
    fileContents.innerHTML = data;
});
The problem with fs.readFile() is that you just won't be able to open really large files, for instance 600 MB; you need to use a stream anyway for very big files.
I'm writing a genomics app called AminoSee using Node and Electron. When I started trying to ingest files bigger than 2 GB I had to switch to a streaming architecture, as my program was trying to load the entire file into memory. Since I scan the file, this is clearly ludicrous. Here is the core of my processor, from the CLI app at:
sourced: https://github.com/tomachinz/AminoSee/blob/master/aminosee-cli.js
try {
    var readStream = fs.createReadStream(filename).pipe(es.split()).pipe(es.mapSync(function(line) {
        readStream.pause(); // curious to test performance of removing
        streamLineNr++;
        processLine(line); // process line here and call readStream.resume() when ready
        readStream.resume();
    })
    .on('error', function(err) {
        error('While reading file: ' + filename, err.reason);
        error(err)
    })
    .on('end', function() {
        log("Stream ending");
    })
    .on('close', function() {
        log("Stream closed");
        setImmediate(() => { // after a 2 GB file give the CPU 1 cycle breather!
            calcUpdate();
            saveDocuments();
        });
    }));
} catch (e) {
    error("ERROR:" + e)
}
I used setImmediate a lot, as my program would get quite far ahead of itself before I learnt about callbacks and promises! It was a great time to learn about race conditions, that's for sure. It still has a million bugs and would make a good learning project.

fs dump equivalent in NodeJs?

Objective
Forcing fs (and the libraries using it) to write everything to files before terminating application.
Background
I am writing an object to a CSV file using the npm package csv-write-stream.
Once the library is done writing the CSV file, I want to terminate my application using process.exit().
Code
To achieve the aforementioned objective, I have written the following:
let writer = csvWriter({
    headers: ['country', 'postalCode']
});
writer.pipe(fs.createWriteStream('myOutputFile.csv'));

// Very big array with a lot of postal code info
let currCountryCodes = [{country: 'Portugal', postalCode: '2950-286'}, {country: 'Barcelona', postalCode: '08013'}];

for (let j = 0; j < currCountryCodes.length; j++) {
    writer.write(currCountryCodes[j]);
}

writer.end(function() {
    console.log('=== CSV written successfully, stopping application ===');
    process.exit();
});
Problem
The problem here is that if I execute process.exit(), the library won't have time to write to the file, and the file will be empty.
Since the library uses fs, my solution to this problem is to force an fs.dump() or something similar in Node.js, but after searching, I found nothing of the sort.
Questions
How can I force fs to dump (push) all the content to the file before exiting the application?
If the first option is not possible, is there a way to wait for the application to write and then close it?
I think your guess is right.
When you call process.exit(), the piped write stream hasn't finished writing yet.
If you really want to terminate your server explicitly, this will do:
let r = fs.createWriteStream('myOutputFile.csv');
writer.pipe(r);
...
writer.end(function() {
    r.end(function() {
        console.log('=== CSV written successfully, stopping application ===');
        process.exit();
    });
});
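An alternative sketch (not from the original answer): listen for the destination stream's 'finish' event, which fires once all of its data has been flushed to the file; pipe() will call r.end() for you when writer ends:
let r = fs.createWriteStream('myOutputFile.csv');
r.on('finish', function() {
    console.log('=== CSV written successfully, stopping application ===');
    process.exit();
});
writer.pipe(r);
// ... write the rows as before ...
writer.end();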

NodeJS sometimes gets killed because out of memory while streaming/piping files

Problem
I have a NodeJS server with the request module.
I use request's pipe() for serving files.
Sometimes the app throws an exception, all downloads are cancelled, and I have to restart the app:
Out of memory: Kill process 9342 (nodejs) score 793 or sacrifice child
Killed process 9342 (nodejs) total-vm:1333552kB, anon-rss:410648kB, file-rss:0kB
I wrote another script which restarts the server automatically (with child_process & fork) when it ends unexpectedly, which sometimes throws this error:
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Server data
RAM: 500MB (I know that this is not much, but it's cheap)
Ubuntu 12.04.5 LTS
NodeJS version: v0.10.36
Assumptions
Too many downloads in parallel
Something wrong with pipe related to the RAM
Regarding 1:
When somebody downloads a big file, a bit of it is loaded into RAM (I don't know much about this - say 20 MB at once; please correct me if I'm wrong). When 400 MB is available and there are 20 concurrent downloads at the same download speed, the server crashes because it can't load more than 400 MB at once into RAM.
Regarding 2:
In addition to pipe() I use the following code to track current & canceled downloads:
req.on("close", function() {
currentDownloads--;
});
The pipe() doesn't close properly and the RAM it used doesn't get cleared.
Questions
If any of my assumptions is right, how could I fix it?
If not, what could it be, or rather, where could the cause be (is it NodeJS, the request module, or my code being wrong or bad; are there better methods)?
Full Code
var currentDownloads = 0;

app.post("/", function (req, res) {
    var open = false;
    req.on("close", function () {
        if (open) {
            currentDownloads--;
            open = false;
        }
    });
    request.get(url)
        .on("error", function (err) {
            log("err " + err);
            if (open) {
                currentDownloads--;
                open = false;
            }
        })
        .on("response", function () {
            open = true;
            currentDownloads++;
        })
        .pipe(res);
});
