How to download a directory's contents via sftp using Node.js?

So, I am trying to download the contents of a directory via SFTP using Node.js, and so far I am getting stuck with an error.
I am using the ssh2-sftp-client npm package, and for the most part it works pretty well: I am able to connect to the server and list the files in a particular remote directory.
Using the fastGet method to download a single file also works without any hassle, and since all the methods are promise based I assumed I could easily download all the files in the directory by doing something like:
let main = async () => {
    await sftp.connect(config.sftp);
    let data = await sftp.list(config.remote_dir);
    if (data.length) data.map(async x => {
        await sftp.fastGet(`${config.remote_dir}/${x.name}`, config.base_path + x.name);
    });
}
It turns out the code above successfully downloads the first file, but then crashes with the following error message:
Error: Failed to get sandbox/demo2.txt: The requested operation cannot be performed because there is a file transfer in progress.
This seems to indicate that the promise from fastGet is resolving too early, since the file transfer should already be over by the time the next element of the file list is processed.
I tried using the more traditional get() instead, but it relies on streams and fails with a different error; after some research it seems there has been a breaking change regarding streams in Node 10.x. In my case, calling get() simply fails without even downloading the first file.
Does anyone know a workaround for this? Or, failing that, another package that can download several files over SFTP?
Thanks!

I figured out that, since the issue was concurrent download attempts on one client connection, I could try to manage it with one client per file download. I ended up with the following recursive function.
let getFromFtp = async (arr) => {
    if (arr.length == 0) return (processFiles());
    let x = arr.shift();
    conns.push(new Client());
    let idx = conns.length - 1;
    await conns[idx].connect(config.sftp.auth);
    await conns[idx]
        .fastGet(`${config.sftp.remote_dir}/${x.name}`, `${config.dl_dir}${x.name}`);
    await conns[idx].end();
    return getFromFtp(arr);
};
Notes about this function:
The array parameter is a list of files to download, presumably fetched with list() beforehand.
conns was declared as an empty array and is used to hold our clients.
Array.prototype.shift() is used to gradually deplete the array as we go through the file list.
The processFiles() method is fired once all the files have been downloaded.
This is just the POC version; of course error handling still needs to be added.
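As a side note, the same insight (only one transfer at a time per connection) can also be satisfied without opening one client per file, by downloading sequentially on a single client. A minimal sketch, assuming the same ssh2-sftp-client setup and config object as in the question:
let main = async () => {
    await sftp.connect(config.sftp);
    let data = await sftp.list(config.remote_dir);
    // A plain for...of loop awaits each transfer before starting the next one,
    // unlike map(async ...), which fires all of them concurrently.
    for (const x of data) {
        await sftp.fastGet(`${config.remote_dir}/${x.name}`, config.base_path + x.name);
    }
    await sftp.end();
};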

Related

Unable to use one readable stream to write to two different targets in Node JS

I have a client side app where users can upload an image. I receive this image in my Node JS app as readable data and then manipulate it before saving like this:
uploadPhoto: async (server, request) => {
    try {
        const randomString = `${uuidv4()}.jpg`;
        const stream = Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`);
        const resizer = Sharp()
            .resize({
                width: 450
            });
        await data.file
            .pipe(resizer)
            .pipe(stream);
This works fine and writes the file to the project's local directory. The problem comes when I try to use the same readable data again in the same async function. Please note, all of this code is in a try block.
        const stream2 = Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`);
        const resizer2 = Sharp()
            .resize({
                width: 45
            });
        await data.file
            .pipe(resizer2)
            .pipe(stream2);
The second file is written, but when I check it, it seems corrupted, as if the data wasn't written successfully. The first image is always fine.
I've tried a few things and found one method that seems to work, but I don't understand why. I add this code just before I create the second write stream:
data.file.on('end', () => {
    console.log('There will be no more data.');
});
Putting the code for the second write stream inside the on-end callback doesn't make a difference; however, if I leave the code outside of the callback, between the first write stream code and the second write stream code, then it works and both files are successfully written.
It doesn't feel right leaving the code the way it is. Is there a better way I can write the second thumbnail image? I've tried using the Sharp module to read the file after the first write stream has written the data, and then create a smaller version of it, but it doesn't work; the file never seems to be ready to use.
You have two alternatives; which one fits depends on how your software is designed.
If possible, I would avoid executing two transform operations on the same stream in the same "context", e.g. an API endpoint. I would rather separate those two different transforms so they do not work on the same input stream.
If that is not possible, or would require too many changes, the solution is to fork the input stream and then pipe it into two different Writables. I normally use Highland.js fork for these tasks.
Please also see my comments on how to properly handle streams with async/await to check when the write operation is finished.
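For illustration, here is a minimal sketch of the fork approach without Highland, assuming Node 15+ (for stream/promises) and the same Sharp/fs-extra setup as in the question; the function name and paths are placeholders:
const { PassThrough } = require('stream');
const { pipeline } = require('stream/promises');
const Fse = require('fs-extra');
const Sharp = require('sharp');

async function writeImageAndThumb(source, basePath, name) {
    // Fork the source: a readable can be piped into several destinations,
    // so each PassThrough receives its own copy of every chunk.
    const fullBranch = new PassThrough();
    const thumbBranch = new PassThrough();
    source.pipe(fullBranch);
    source.pipe(thumbBranch);

    // pipeline() resolves only once the write stream has finished,
    // which is the proper way to "await" a stream with async/await.
    await Promise.all([
        pipeline(fullBranch, Sharp().resize({ width: 450 }), Fse.createWriteStream(`${basePath}/${name}`)),
        pipeline(thumbBranch, Sharp().resize({ width: 45 }), Fse.createWriteStream(`${basePath}/thumb_${name}`)),
    ]);
}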

Running node js export in google cloud function

We need to export a zip file containing a lot of data (a couple of GB). The zip archive needs to contain about 50-100 InDesign files (each about 100 MB) and some other smaller files. We are trying to use Google Cloud Functions to achieve this (lower costs etc.). The function is triggered by a config file which is uploaded into a bucket; the config file contains all the information about which files need to be put into the zip. Unfortunately the memory limit of 2 GB is always reached, so the function never succeeds.
We tried different things:
Our first attempt was to loop over the files, create promises to download them, and after the loop resolve all the promises at once (the files are downloaded via streaming directly into a file).
The second attempt was to await every download inside the for loop, but again the memory limit was reached.
So my question is:
Why does Node.js not release the streams? It seems like Node keeps every streamed file in memory and finally crashes. I already tried to set the readStream and writeStream to null as suggested here:
How to prevent memory leaks in node.js?
But no change.
Note: we never reached the point where all files are downloaded to create the zip file. It always failed after the first few files.
See below the code snippets:
// first try: via Promise.all
const promises = []
for (const file of files) {
    promises.push(downloadIndesignToExternal(file, 'xxx', dir));
}
await Promise.all(promises)

// second try: await every step (not performant in terms of execution time,
// but we wanted to know if the memory limit is also reached)
for (const file of files) {
    await downloadIndesignToExternal(file, 'xxx', dir);
}

// code to download an InDesign file
function downloadIndesignToExternal(activeId, externalId, dir) {
    return new Promise((resolve, reject) => {
        let readStream = storage.bucket(INDESIGN_BUCKET).file(`${activeId}.indd`).createReadStream()
        let writeStream = fs.createWriteStream(`${dir}/${externalId}.indd`);
        readStream.pipe(writeStream);
        writeStream.on('finish', () => {
            resolve();
        });
        writeStream.on('error', (err) => {
            reject('Could not write file');
        })
    })
}
It's important to know that /tmp (os.tmpdir()) is a memory-based filesystem in Cloud Functions. When you download a file to /tmp, it is taking up memory just as if you had saved it to memory in a buffer.
If your function needs more memory than can be configured for a function, then Cloud Functions might not be the best solution to this problem.
If you still want to use Cloud Functions, you will have to find a way to stream the input files directly to the output file, but without saving any intermediate state in the function. I'm sure this is possible, but you will probably need to write a fair amount of extra code for this.
For anyone interested:
We got it working by streaming the files into the zip and streaming that directly into Google Cloud Storage. Memory usage is now around 150-300 MB, so this works perfectly for us.
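For reference, a minimal sketch of that kind of pipeline, assuming the archiver package and @google-cloud/storage; the bucket names, file list, and function name are placeholders:
const archiver = require('archiver');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

async function exportZip(fileIds, srcBucket, destBucket, destName) {
    const archive = archiver('zip');
    const output = storage.bucket(destBucket).file(destName).createWriteStream();

    // Reject on stream errors, resolve once the upload stream has finished.
    const uploaded = new Promise((resolve, reject) => {
        output.on('finish', resolve);
        output.on('error', reject);
        archive.on('error', reject);
    });

    archive.pipe(output);

    // Each source file is appended as a read stream, so nothing is buffered
    // in /tmp or in memory beyond the internal stream buffers.
    for (const id of fileIds) {
        archive.append(
            storage.bucket(srcBucket).file(`${id}.indd`).createReadStream(),
            { name: `${id}.indd` }
        );
    }

    await archive.finalize();
    return uploaded;
}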

How to avoid performing a firebase function on folders on cloud storage events

I'm trying to organize assets (images) into folders with a unique id for each asset, the reason being that each asset will have multiple formats (thumbnails, and formats optimized for the web and for different viewports).
So every asset that I upload to the folder assets-temp/ is then moved and renamed by the function into assets/{unique-id}/original{extension}.
Example: assets-temp/my-awesome-image.jpg should become assets/489023840984/original.jpg.
Note: I also keep track of the files with their original names in the DB and in the original file's metadata.
The issue: the function runs and performs what I want, but it also adds a folder named assets/{uuid}/original/ with nothing in it...
The function:
exports.process_new_assets = functions.storage.object().onFinalize(async (object) => {
    // Run this function only for files uploaded to the "assets-temp/" folder.
    if (!object.name.startsWith('assets-temp/')) return null;

    const file = bucket.file(object.name);
    const fileExt = path.extname(object.name);
    const destination = bucket.file(`assets/${id}/original${fileExt}`);
    const metadata = {
        id,
        name: object.name.split('/').pop()
    };

    // Move the file to the new location.
    return file.move(destination, {metadata});
});
I am guessing that this might happen if the operation of uploading the original image triggers two separate events: one that creates the directory placeholder assets-temp/ and one that creates the file assets-temp/my-awesome-image.jpg.
If I guessed right, the first operation will trigger your function with a directory object (named "assets-temp/"). This passes your first if check, so the code will proceed and do
destination = bucket.file(`assets/${id}/original`) // fileExt being empty
and then call file.move - this will create the assets/{id}/original/ directory.
Simply improve your if to also exclude an object named "assets-temp/".
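A minimal sketch of that extra guard (the exact condition is an assumption; checking for a trailing slash also covers any other folder placeholder objects):
exports.process_new_assets = functions.storage.object().onFinalize(async (object) => {
    // Skip anything outside assets-temp/ as well as the zero-byte "folder"
    // placeholder itself, whose name ends with a forward slash.
    if (!object.name.startsWith('assets-temp/') || object.name.endsWith('/')) return null;

    // ...rest of the handler as above
});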
According to the documentation there is no such thing as folders in Cloud Storage; however, it is possible to emulate them, as the console GUI does. When you create a folder, what really happens is that an empty object (zero bytes of space) is created whose name ends with a forward slash. Folder names can also end with _$folder$, but my understanding is that this is how things worked in older versions, so for newer buckets the forward slash is enough.

Where should I put custom errors in sails.js?

I was wondering what's the best practice and if I should create:
a directory in which I statically declare all the errors my application uses, like api/errors/custom1Error
declare them directly inside the files
or put the files directly inside the dir that needs that error, like api/controller/error/formInvalidError
other options!?
A neat way of going about this would be to simply add the errors as custom responses under api/responses. This way even the invocation becomes pretty neat. Although the docs say you should add them directly in the responses directory, I'm sure there must be a way to nest them under, say, responses/errors. I'll try that out and post an update in a bit.
Alright, after a quick search, I couldn't find any way to nest the responses, but you can use a small workaround that's not quite as neat:
Create the responses/errors directory with all the custom error response handlers. Then create a custom response and name it something like custom.js, and specify the error name when calling res.custom().
I'm adding a short snippet just for illustration:
api/responses/custom.js:
var customErrors = {
    customError1: require('./errors/customError1'),
    customError2: require('./errors/customError2')
};

module.exports = function custom (errorName, data) {
    var req = this.req;
    var res = this.res;

    if (customErrors[errorName]) return customErrors[errorName](req, res, data);
    else return res.negotiate();
}
From the controller:
res.custom('authError', data);
If you don't need logical processing for different errors, you can do away with the whole errors/ directory and directly invoke the respective views from custom.js:
module.exports = function custom (viewName, data) {
    var req = this.req;
    var res = this.res;

    return res.view('errors/' + viewName, data); // assuming you have error views in views/errors
}
(You should first check if the view exists. Find out how on the linked page.)
Although I'm using something like this for certain purposes (dividing routes and so on), there definitely should be a way to include response handlers defined in different directories. (Perhaps by reconfiguring some grunt task?) I'll try to find that out and update if I find any success.
Good luck!
Update
Okay, so I found that the responses hook adds all files to res without checking whether they are directories. So adding a directory under responses results in a TypeError from lodash. I may be reading this wrong, but I guess it's reasonable to conclude that it's currently not possible to add a directory there, so you'll have to stick to one of the above solutions.

Express res.download() not actually downloading file

I'm attempting to return generated files to the front end through Express' res.download function. I'm using Chrome, but whenever I call the API that executes the following code, all that is returned is the same values returned from the Express res.sendFile() function.
I know that res.download uses res.sendFile, but I would like the download function to actually save the file to the file system instead of just returning it in the body of the response.
This is my code:
exports.download = function(req, res) {
    var filePath = // somefile that I want to download
    res.download(filePath, 'response.txt', function(err) {
        throw err;
    });
}
I know that the above code at least partly works because I'm getting back, in the response, the contents of the file. However, I want it to be saved onto the file system.
Am I misunderstanding what the download function is supposed to do? Do I just need to take the response data and write it to the file system manually?
res.download adds headers that suggest to the browser that the file should be downloaded rather than opened. However, there's no way to force the browser to do this; it's ultimately the user's choice whether to download a particular file.
If you're triggering this request with AJAX, that's not going to cause a download, because it is your JavaScript that receives the response data, not the browser's download mechanism.
Do I just need to take the response data and write it to the file system manually?
You don't have file system access in browser-side JavaScript. I'm not sure how you intend to do this.
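For completeness, a hedged client-side sketch of two common ways to get the browser to actually save the file; the /api/download route and filename are placeholders:
// Option 1: navigate to the endpoint so the Content-Disposition header set by
// res.download() is honoured by the browser itself.
window.location.href = '/api/download';

// Option 2: if the data has to be fetched with fetch()/XHR anyway, save the
// response manually through a temporary object URL.
async function saveDownload() {
    const response = await fetch('/api/download');
    const blob = await response.blob();
    const url = URL.createObjectURL(blob);

    const a = document.createElement('a');
    a.href = url;
    a.download = 'response.txt';
    document.body.appendChild(a);
    a.click();

    URL.revokeObjectURL(url);
    a.remove();
}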