I'm writing a Node Express server which connects via sftp to a file store. I'm using the ssh2-sftp-client package.
To retrieve files it has a get function with the following signature:
get(srcPath, dst, options)
The dst argument should either be a string or a writable stream, which will be used as the destination for a stream pipe.
I would like to avoid creating a temporary file on my server and instead stream the file straight to my client to reduce memory consumption, as described in this article. I tried to accomplish this with the following code:
const get = (writeStream) => {
  sftp.connect(config).then(() => {
    return sftp.get('path/to/file.zip', writeStream);
  });
};

app.get('/thefile', (req, res) => {
  get(res); // pass the res writable stream to sftp.get
});
However, this causes my node server to crash due to an unhandled promise rejection. Is what I am attempting possible? Should I store the file on my server machine first before sending it to the client? I've checked the documentation/examples for the sftp package in question, but cannot find an example of what I am looking for.
I found the error, and it's a dumb one on my part: I was forgetting to end the sftp connection. When this method was called a second time, it threw the exception when it tried to connect again. If anyone finds themselves in the same situation, remember to end the connection once you're finished with it, like this:
const get = (writeStream) => {
  return sftp.connect(config).then(() => {
    return sftp.get('path/to/file.zip', writeStream);
  }).then(response => {
    sftp.end();
    return response;
  });
};
Is something like this even possible, or are there better ways to do this? Is what I'm doing even a good idea, or is this a bad approach?
What I want to do is upload a file to my nodejs server. Along with the file I want to send some meta data. The meta data will determine if the file can be saved and the upload accepted, or if it should be rejected with a 403 response.
I am using busboy and I am sending FormData from my client side.
The example below is very much simplified:
Here is a snippet of the client side code.
I am appending the file as well as the meta data to the form
const formData = new FormData();
formData.append('name', JSON.stringify({name: "John Doe"}));
formData.append('file', this.selectedFile, this.selectedFile.name);
Here is the nodejs side:
const busboy = require('busboy');
const fs = require('fs');

exports.Upload = async (req, res) => {
  try {
    var acceptUpload = false;
    const bb = busboy({ headers: req.headers });
    bb.on('field', (fieldname, val) => {
      // Verify data here before accepting the file upload
      var data = JSON.parse(val);
      if (data.name === 'John Doe') {
        acceptUpload = true;
      } else {
        acceptUpload = false;
      }
    });
    bb.on('file', (fieldname, file, filename, encoding, mimetype) => {
      if (acceptUpload) {
        const saveTo = '/upload/file.txt';
        file.pipe(fs.createWriteStream(saveTo));
      } else {
        const response = {
          message: 'Not Authorized'
        };
        res.status(403).json(response);
      }
    });
    bb.on('finish', () => {
      const response = {
        message: 'Upload Successful'
      };
      res.status(200).json(response);
    });
    req.pipe(bb);
  } catch (error) {
    console.log(error);
    const response = {
      message: error.message
    };
    res.status(500).json(response);
  }
};
So basically, is it even possible for the 'file' event handler to wait for the 'field' event handler? How could one verify some metadata before accepting a file upload?
How can I validate all the data in the FormData object before accepting the file upload? Is this even possible, or are there other ways of uploading files with this kind of behaviour? I am even considering adding the data to the request headers, but this does not seem like an ideal solution.
Update
As I suspected, nothing is waiting. Whichever way I try, the upload first has to complete; only then is it rejected with a 403.
Another Update
I've tried the same thing with multer and got similar results. Even when I can do the validation, the file is completely uploaded from the client side. Only once the upload is complete is the request rejected. The file, however, never gets stored, even though it is uploaded in its entirety.
With busboy, nothing is written to the server if you do not execute the statement file.pipe(fs.createWriteStream(saveTo));
You can prevent more data from even being uploaded to the server by executing the statement req.destroy() in the .on("field", ...) or the .on("file", ...) event handler, even after you have already evaluated some of the fields. Note however, that req.destroy() destroys not only the current HTTP request but the entire TCP connection, which might otherwise have been reused for subsequent HTTP requests. (This applies to HTTP/1.1, in HTTP/2 the relationship between connections and requests is different.)
At any rate, it has no effect on the current HTTP request if everything has already been uploaded. Therefore, whether this saves any network traffic depends on the size of the file. And if the decision whether to req.destroy() involves an asynchronous operation, such as a database lookup, then it may also come too late.
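As a rough illustration of that approach, assuming the same name field as in the question and that the metadata field arrives before the file part (which it does if the client appends it first, as in the FormData snippet above), aborting the request could look something like this:

const bb = busboy({ headers: req.headers });

bb.on('field', (fieldname, val) => {
  if (fieldname === 'name') {
    const data = JSON.parse(val);
    if (data.name !== 'John Doe') {
      // Tears down the TCP connection; the client sees a reset or an empty
      // reply rather than a proper 403, as in the curl output below.
      req.destroy();
    }
  }
});

req.pipe(bb);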
Compare
> curl -v -F name=XXX -F file=@<small file> http://your.server
* We are completely uploaded and fine
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
with
> curl -v -F name=XXX -F file=@<large file> http://your.server
> Expect: 100-continue
< HTTP/1.1 100 Continue
* Send failure: Connection was reset
* Closing connection 0
curl: (55) Send failure: Connection was reset
Note that the client sets the Expect header before uploading a large file. You can use that fact, in combination with a special request header (name in this example), to block the upload completely:
http.createServer(app)
  .on("checkContinue", function(req, res) {
    if (req.headers["name"] === "John Doe") {
      res.writeContinue(); // sends HTTP/1.1 100 Continue
      app(req, res);
    } else {
      res.statusCode = 403;
      res.end("Not authorized");
    }
  })
  .listen(...);
But for small files, which are uploaded without the Expect request header, you still need to check the name header in the app itself.
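For those small uploads, a minimal sketch of such a check inside the Express app might look like the middleware below; the /upload path and the expected value are assumptions carried over from the example:

// Runs before busboy ever sees the body; small uploads that skip
// Expect: 100-continue are rejected here instead.
app.use('/upload', (req, res, next) => {
  if (req.headers['name'] !== 'John Doe') {
    return res.status(403).json({ message: 'Not Authorized' });
  }
  next();
});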
I am working on getting files from an SFTP server and piping the data to Box.com using their sdk. The Box sdk takes a readable stream as a parameter for uploading a file. The code that I have written to fetch the files from the sftp server uses the npm module ssh2-sftp-client.
The issue I am having is that a writable stream is "the end of the line" with streams unless you are using something like a Transform which is a Duplex and implements both read and write. Below is the code that I am using. Because I am working on this for a client I am intentionally leaving out some stuff that is not necessary.
Below is the method on the sftp class:
async getFile(filepath: string): Promise<Readable> {
  logger.info(`Fetching file: ${filepath}`);
  const writable = new Writable();
  const stream = new PassThrough();
  await this.client.get(filepath, writable);
  return writable.pipe(stream);
}
Here is the implementation of getting a file and attempting to pipe it to box, which is an instance of an authorized BoxSDK client.
try {
  for (const filename of filenames) {
    const stream: Readable = await tmsClient.getFile(
      'redacted' + filename,
    );
    logger.info(`Piping ${filename} to Box...`);
    await box.createFile(filename, 'redacted', stream);
    logger.info(`${filename} successfully downloaded`);
  }
} catch (error) {
  logger.error(`Failed to move files: ${error}`);
}
I am not super well versed in streams but based on my research I think this should work in theory.
I have also tried this implementation, where the ssh client returns a buffer and then I try to pipe that buffer as a readable stream. With this implementation, though, I keep getting errors from the Box sdk that the stream ended unexpectedly.
async getFile(filepath: string): Promise<Readable> {
  logger.info(`Fetching file: ${filepath}`);
  const stream = new Readable();
  const buffer = (await this.client.get(filepath)) as Buffer;
  stream._read = (): void => {
    stream.push(buffer);
    stream.push(null);
  };
  return stream;
}
And the error message: 2020-02-06 15:24:57 error: Failed to move files: Error: Unexpected API Response [400 Bad Request] bad_request - Stream ended unexpectedly.
Any insight is greatly appreciated!
So after doing some more research into this, it turns out that the issue is actually with the Box sdk for Node. The sdk is terminating the body of the stream before it is actually done. This is because, under the hood, they are using the request library, which requires a content-length header to send large payloads. Without that in place, it will keep terminating the stream before the payload is sent.
On the Box community forum they suggest adding properties to the stream prototype to pass stuff to the underlying request library. I STRONGLY disagree with this because it is not the correct way to go about it. The Box sdk needs to provide a way to pass in the length of the content in bytes. As the user of their API I should not have to manipulate their underlying dependencies. I am going to open an issue with their sdk and hopefully get this fixed.
Hope this is useful to someone else!
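As an aside, if you do go the buffer route from the second getFile snippet above, Node's built-in Readable.from() is a simpler way to wrap the buffer than overriding _read by hand. A minimal sketch, assuming client is the same connected ssh2-sftp-client instance:

const { Readable } = require('stream');

// Rough sketch; `client` is assumed to be a connected ssh2-sftp-client.
async function getFileAsStream(client, filepath) {
  // Without a dst argument, get() resolves with a Buffer.
  const buffer = await client.get(filepath);
  // Wrapping the buffer in an array makes Readable.from() emit it as a single
  // chunk and push EOF for us; buffer.length is also at hand if an upload API
  // needs an explicit content length.
  return Readable.from([buffer]);
}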
I have a firebase cloud function that uses express to stream a zip file of images to the client. When I test the cloud function locally it works fine. When I upload to firebase I get this error:
Error: Can't set headers after they are sent.
What could be causing this error? Memory limit?
export const zipFiles = async (name, params, response) => {
  const zip = archiver('zip', { zlib: { level: 9 } });
  const [files] = await storage.bucket(bucketName).getFiles({ prefix: `${params.agent}/${params.id}/deliverables` });
  if (files.length) {
    response.attachment(`${name}.zip`);
    response.setHeader('Content-Type', 'application/zip');
    response.setHeader('Access-Control-Allow-Origin', '*');
    zip.pipe(output);
    response.on('close', function() {
      return output.send('OK').end(); // <--this is the line that fails
    });
    files.forEach((file, i) => {
      const reader = storage.bucket(bucketName).file(file.name).createReadStream();
      zip.append(reader, { name: `${name}-${i + 1}.jpg` });
    });
    zip.finalize();
  } else {
    output.status(404).send('Not Found');
  }
};
What Frank said in comments is true. You need to decide all your headers, including the HTTP status response, before you start sending any of the content body.
If you intend to express that you're sending a successful response, simply say output.status(200) in the same way that you did for your 404 error. Do that up front. When you're piping a response, you don't need to do anything to close the response in the end. When the pipe is done, the response will automatically be flushed and finalized. You're only supposed to call end() when you want to bail out early without sending a response at all.
Bear in mind that Cloud Functions only supports a maximum payload of 10MB (read more about limits), so if you're trying to zip up more than that total, it won't work. In fact, there is no "streaming" or chunked responses at all. The entire payload is being built in memory and transferred out as a unit.
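Putting those points together, a rough sketch of the handler with the question's variable names (using response throughout, on the assumption that output was meant to be the same Express response object) might look like this:

export const zipFiles = async (name, params, response) => {
  const zip = archiver('zip', { zlib: { level: 9 } });
  const [files] = await storage.bucket(bucketName)
    .getFiles({ prefix: `${params.agent}/${params.id}/deliverables` });

  if (files.length) {
    // Decide the status and all headers before any body bytes are written.
    response.status(200);
    response.attachment(`${name}.zip`);
    response.setHeader('Content-Type', 'application/zip');
    response.setHeader('Access-Control-Allow-Origin', '*');

    // Pipe the archive into the response; no explicit send()/end() is needed,
    // the response finishes when the zip stream ends.
    zip.pipe(response);

    files.forEach((file, i) => {
      const reader = storage.bucket(bucketName).file(file.name).createReadStream();
      zip.append(reader, { name: `${name}-${i + 1}.jpg` });
    });
    zip.finalize();
  } else {
    response.status(404).send('Not Found');
  }
};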
I'm new to GCP, Cloud Functions and NodeJS ecosystem. Any pointers would be very helpful.
I want to write a GCP Cloud Function that does following:
Read contents of file (sample.txt) saved in Google Cloud Storage.
Copy it to local file system (or just console.log() it)
Run this code using functions-emulator locally for testing
Result: 500 INTERNAL error with message 'function crashed'. The function logs give the following message:
2019-01-21T20:24:45.647Z - info: User function triggered, starting execution
2019-01-21T20:24:46.066Z - info: Execution took 861 ms, finished with status: 'crash'
Below is my code, picked mostly from GCP NodeJS sample code and documentation.
exports.list_files = (req, res) => {
  const fs = require('fs');
  const {Storage} = require('@google-cloud/storage');
  const storage = new Storage();
  const bucket = storage.bucket('curl-tests');
  bucket.setUserProject("cf-nodejs");
  const file = bucket.file('sample.txt'); // file has couple of lines of text
  const localFilename = '/Users/<username>/sample_copy.txt';
  file.createReadStream()
    .on('error', function (err) { })
    .on('response', function (response) {
      // Server connected and responded with the specified status and headers.
    })
    .on('end', function () {
      // The file is fully downloaded.
    })
    .pipe(fs.createWriteStream(localFilename));
};
I run like this:
functions call list_files --trigger-http
ExecutionId: 4a722196-d94d-43c8-9151-498a9bb26997
Error: { error:
{ code: 500,
status: 'INTERNAL',
message: 'function crashed',
errors: [ 'socket hang up' ] } }
Eventually, I want to have certificates and keys saved in Storage buckets and use them to authenticate with a service outside of GCP. This is the bigger problem I'm trying to solve. But for now, I'm focusing on resolving the crash.
Start your development and debugging on your desktop using node and not an emulator. Once you have your code working without warnings and errors, then start working with the emulator and then finally with Cloud Functions.
Let's take your code and fix parts of it.
bucket.setUserProject("cf-nodejs");
I doubt that your project is cf-nodejs. Enter the correct project ID.
const localFilename = '/Users/<username>/sample_copy.txt';
This won't work. You do not have the directory /Users/<username> in cloud functions. The only directory that you can write to is /tmp. For testing purposes change this line to:
const localFilename = '/tmp/sample_copy.txt';
You are not doing anything for errors:
.on('error', function (err) { })
Change this line to at least print something:
.on('error', function (err) { console.log(err); })
You will then be able to view the output in Google Cloud Console -> Stackdriver -> Logs. Stackdriver lets you select "Cloud Functions" -> "Your function name" so that you can see your debug output.
Last tip: wrap your code in a try/catch block and console.log the error message in the catch block. This way you will at least have a log entry when your program crashes in the cloud.
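Pulling those fixes together, a rough sketch of the corrected function might look like this; the bucket name comes from the question, the project ID is a placeholder, and the response handling at the end is an extra addition so the HTTP function terminates instead of hanging until the timeout:

exports.list_files = (req, res) => {
  const fs = require('fs');
  const { Storage } = require('@google-cloud/storage');
  const storage = new Storage();

  try {
    const bucket = storage.bucket('curl-tests');
    bucket.setUserProject('your-project-id'); // replace with your real project ID
    const file = bucket.file('sample.txt');
    const localFilename = '/tmp/sample_copy.txt'; // /tmp is the only writable directory

    file.createReadStream()
      .on('error', (err) => {
        console.log(err); // shows up in Stackdriver logs
        res.status(500).send(err.message);
      })
      .on('end', () => {
        // The file is fully downloaded; end the request explicitly.
        res.status(200).send(`Copied sample.txt to ${localFilename}`);
      })
      .pipe(fs.createWriteStream(localFilename));
  } catch (error) {
    console.log(error);
    res.status(500).send(error.message);
  }
};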
I'm unable to find the issue in the following script. What I want to achieve is a Node log server that listens for POST requests with a log title and log detail as query parameters, writes them to a file, and then returns the log as JSON on a GET request.
Problem:
It sometimes shows the loader indefinitely and sometimes returns the required log.
Note:
The process spawning is done to update the browser during logging; if someone has a better solution, please suggest it.
Post Call:
http://127.0.0.1:8081/log?title="test"&detail="test detail"
Code:
var express = require("express");
var spawn = require('child_process').spawn;
var fs = require("fs");

var srv = express();

var outputFilename = '/tmp/my.json';

function getParamsObject(context) {
  var params = {};
  for (var propt_params in context.params) {
    params[propt_params] = context.params[propt_params];
    //define(params, propt_params, context.params[propt_params]);
  }
  for (var propt_body in context.body) {
    params[propt_body] = context.body[propt_body];
    //define(params, propt_body, context.body[propt_body]);
  }
  for (var propt_query in context.query) {
    params[propt_query] = context.query[propt_query];
    //define(params, propt_query, context.query[propt_query]);
  }
  return params;
}

srv.get("/", function(req, res) {
  res.send("Hello World From Index\n");
});

srv.get("/Main", function(req, res) {
  res.send("Hello World From Main\n");
});

srv.get("/ReadFile", function(req, res) {
  fs.readFile("example_one.txt", function(err, data) {
    if (err) throw err;
    res.send(data.toString());
  });
});

srv.get("/ReadFileJSON", function(req, res) {
  fs.readFile("example_one.txt", function(err, data) {
    if (err) throw err;
    res.setHeader("content-type", "application/json");
    res.send(new Parser().parse(data.toString()));
  });
});

srv.post("/log", function(req, res) {
  var input = getParamsObject(req);
  if (input.detail) {
    var myData = {
      Date: (new Date()).toString(),
      Title: input.title,
      Detail: input.detail
    };
    fs.writeFile(outputFilename, JSON.stringify(myData, null, 4), function(err) {
      if (err) {
        console.log(err);
      }
    });
  }
  res.setHeader("content-type", "application/json");
  res.send({message: "Saved"});
});

srv.get("/log", function(req, res) {
  var child = spawn('tail', ['-f', outputFilename]);
  child.stdout.pipe(res);
  res.on('end', function() {
    child.kill();
  });
});

srv.listen(8081);
console.log('Server running on port 8081.');
console.log('Server running on port 8081.');
To clarify the question...
You want some requests to write to a log file.
You want to effectively do a log tail over HTTP, and are currently doing that by spawning tail in a child process.
This isn't all that effective.
Problem: It constantly shows loader sometime and gives the required log sometime.
Web browsers buffer data. You're sending the data, sure, but the browser isn't going to display it until a minimum buffer size is reached. And then, there are rules for what will display when all the markup (or just text in this case) hasn't loaded yet. Basically, you can't stream a response to the client and reliably expect the client to do anything with it until it is done streaming. Since you're tailing a log, that puts you in a bad predicament.
What you must do is find a different way to send that data to the client. This is a good candidate for web sockets. You can create a persistent connection between the client and the server and then handle the data immediately rather than worrying about a client buffer. Since you are using Node.js already, I suggest looking into Socket.IO as it provides a quick way to get up and running with web sockets, and long-polling JSON (among others) as a fallback in case web sockets aren't available on the current browser.
Next, there is no need to spawn another process to read a file in the same way tail does. As Trott has pointed out, there is an NPM package for doing exactly what you need: https://github.com/lucagrulla/node-tail. Just set up an event handler for the line event, and then fire a line event on the web socket so that your JavaScript client receives it and displays it to the user immediately.
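A minimal sketch of that combination, assuming the same /tmp/my.json log file from the question and the tail and socket.io packages:

var http = require('http');
var express = require('express');
var Tail = require('tail').Tail;

var app = express();
var server = http.createServer(app);
var io = require('socket.io')(server);

var outputFilename = '/tmp/my.json';
var tail = new Tail(outputFilename);

// Forward every new line in the log file to all connected clients.
tail.on('line', function(line) {
  io.emit('line', line);
});

tail.on('error', function(err) {
  console.log(err);
});

io.on('connection', function(socket) {
  console.log('client connected');
});

server.listen(8081);

On the client side, a socket.io-client connection listening for the same line event can append each line to the page as it arrives, with no buffering problems.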
There are a couple of things that seem to stand out as unnecessary complications that may be the source of your problem.
First, the spawn seems unnecessary. It appears you want to open a file for reading and get updated any time something gets added to the file. You can do this in Node with fs.watch(), fs.watchFile(), or the node-tail module. This may be more robust than using spawn() to create a child process.
Second (and less likely to be the source of the problem, I think), you seem to be using query string parameters on a POST request. While not invalid, this is unusual. Usually, if you are using the POST method, you send the data as part of the body of the request; if you are using the GET method, data is sent as a query string. If you are not using the body to send data, switch to GET.
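If you would rather stick with built-in APIs than add node-tail, a rough sketch using fs.watchFile on the same log file could look like this, reading only the newly appended bytes each time the file grows:

var fs = require('fs');

var outputFilename = '/tmp/my.json';

// Poll the file for changes and read only the bytes appended since last time.
fs.watchFile(outputFilename, { interval: 500 }, function(curr, prev) {
  if (curr.size <= prev.size) {
    return; // nothing new (or the file was truncated/rewritten)
  }
  var readStream = fs.createReadStream(outputFilename, {
    start: prev.size,
    end: curr.size - 1
  });
  readStream.on('data', function(chunk) {
    console.log('new log data:', chunk.toString());
    // ...or emit it over a web socket, as in the sketch above
  });
});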