Download a file from S3 without writing it to the file system in Node.js - node.js

I have a Node.js server running with Hapi.
One of the server's jobs is to send files to a servicer API when the user asks for them (the API only accepts streams; when I send a buffer it returns an error).
All the files are stored in S3.
When I download them using promise(),
I get a buffer in the Body.
And I get a PassThrough stream if I use createReadStream().
My problem is that when I convert the buffer to a stream and send it, the API rejects it, and the same happens when I use the createReadStream() result,
but when I use fs to save the file and then fs to read it, the API accepts the stream and it works.
So I need help: how can I get the same result without saving and reading the file?
Edit:
Here is my code. I know it's the wrong way, but it works; I need a better way that also works.
static async downloadFile(Bucket, Key) {
const result = await s3Client
.getObject({
Bucket,
Key
})
.promise();
fs.writeFileSync(`${Path.basename(Key)}`,result.Body);
const file = await fs.createReadStream(`${Path.basename(Key)}`);
return file;
}

If I understand correctly, you want to get the object from the S3 bucket and stream it to your HTTP response.
Fetching the data into a buffer and then figuring out how to convert it to a stream can be complicated and has its limitations. If you really want to leverage the power of streams, don't load the entire object into memory as a buffer; instead, create a request that streams the returned data directly to a Node.js Stream object by calling the createReadStream method on the request.
Calling createReadStream returns the raw HTTP stream managed by the request. The raw data stream can then be piped into any Node.js Stream object.
This technique is useful for service calls that return raw data in their payload, such as calling getObject on an Amazon S3 service object to stream data directly into a file or, as in this example, into an HTTP response.
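// I imagine you have something similar.
const AWS = require('aws-sdk');
server.get('/image', (req, res) => {
  const s3 = new AWS.S3({apiVersion: '2006-03-01'});
  const params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
  const readStream = s3.getObject(params).createReadStream();
  // The SDK does not retry once streaming has started, so handle failures here
  readStream.on('error', (err) => {
    console.error(err);
    res.destroy(err);
  });
  // When the stream is done being read, end the response
  readStream.on('close', () => {
    res.end();
  });
  readStream.pipe(res);
});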
When you stream data from a request using createReadStream, only the raw HTTP data is returned. The SDK does not post-process the data, so this raw HTTP data can be returned directly.
Note:
Because Node.js is unable to rewind most streams, retry logic is disabled for the rest of the response once the request has initially succeeded. In the event of a socket failure while streaming, the SDK won't attempt to retry or send more data to the stream. Your application logic needs to identify such streaming failures and handle them.
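As a rough illustration of handling such a streaming failure in application code, here is a minimal sketch that uses only Node's built-in stream module. The failingSource function is a hypothetical stand-in for s3.getObject(params).createReadStream() and the in-memory sink stands in for the real destination; both are assumptions made so the example runs without AWS credentials.

```javascript
const { Readable, Writable, pipeline } = require('stream');

// Hypothetical stand-in for s3.getObject(params).createReadStream():
// it emits one chunk, then fails the way a dropped socket would.
function failingSource() {
  let sent = false;
  return new Readable({
    read() {
      if (!sent) {
        sent = true;
        this.push('partial data');
      } else {
        this.destroy(new Error('socket hang up'));
      }
    }
  });
}

// Pipes source into destination and resolves with the outcome instead of
// throwing, so the caller can decide whether to restart the whole download.
function streamWithErrorHandling(source, destination) {
  return new Promise((resolve) => {
    pipeline(source, destination, (err) => {
      resolve(err ? { ok: false, error: err.message } : { ok: true });
    });
  });
}

// Usage: collect chunks in memory so the example needs no network or disk.
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  }
});

const outcome = streamWithErrorHandling(failingSource(), sink);
outcome.then((result) => console.log(result));
```

pipeline is used here instead of a bare pipe() because it forwards an error from any stream in the chain to a single callback and destroys the other streams, which is one reasonable way to notice a mid-stream failure.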
Edits:
After the edits to the original question, I can see that S3 returns a PassThrough stream object, which is different from a file stream (fs.ReadStream) in Node.js. To get around the problem, use memory (if your files are not very big and/or you have enough memory).
Use the memfs package; it provides an in-memory replacement for the native fs module in your app:
https://www.npmjs.com/package/memfs
Install the package with npm install memfs and require it as follows:
const {fs} = require('memfs');
and your code will look like this:
static async downloadFile(Bucket, Key) {
const result = await s3
.getObject({
Bucket,
Key
})
.promise();
fs.writeFileSync(`/${Key}`,result.Body);
const file = await fs.createReadStream(`/${Key}`);
return file;
}
Note that the only change I have made to your function is the path: ${Path.basename(Key)} became /${Key}, because you no longer need a path on your real file system; the files are stored in memory. I have tested this solution and it works.

Related

How to deal with ECONNRESET error in Supabase

I have a remix action function that accepts a file as a formData request object and then uploads it to supabase. After that, I get the URL of the uploaded image and return it.
My function:
const fileExt = filename.split(".").pop();
const fileName = `${Math.random().toFixed(10)}.${fileExt}`;
const filePath = `${fileName}`;
const { error: uploadError } = await supabaseClient.storage
.from("public")
.upload(`misc/${filePath}`, stream);
if (uploadError) {
console.error(uploadError);
throw new Error(uploadError.message);
}
const { publicURL, error } = await supabaseClient.storage
.from("public")
.getPublicUrl(`misc/${filePath}`);
if (error) {
console.error(error);
throw new Error(error.message);
}
!publicURL && console.error(`No public URL for ${filePath}`);
return publicURL;
Because the formData is multipart/form-data, I need to parse it, which I handled by putting the code above in an uploadHandler function and then:
const formData = await parseMultipartFormData(
request,
uploadHandler
);
The code works sometimes, and at other times it fails with an ECONNRESET error. From what I understand, that may have to do with Node's asynchronous code, but I have not been able to solve it. How can I avoid those random ECONNRESET errors that Supabase keeps giving?
I've been through this with file uploads and Remix. While everyone's use case is different, in many cases where an authenticated user is uploading to a service like Supabase Storage or Cloudflare, it's better to upload from the client. This is especially true in a serverless environment. With Cloudflare I grab a unique signed upload URL with useFetcher(). With Supabase, the team has set up their JS library to authenticate the user, and we write policies on the database to protect the upload. So it's just a lot easier to use the client. This becomes even more relevant when we're uploading large video files, for example, and want to be able to pause and resume uploading. That is much easier from the client than via isolated serverless functions. If we're worried about data from the client, we can put sensitive data, encrypted, in a cookie; when the client completes the upload it sends (e.g.) {completed: true} to an action, which grabs the data from the cookie and persists it to the database.
I'm sure this doesn't solve your problem if you really need to do it via the backend, but I just wanted to share that in my experience it's not always the best idea to do everything The Remix Way. Sometimes the client is better.

What is the best way to upload larger files to S3 with the Node.js aws-sdk? MultipartUpload vs ManagedUpload vs getSignedURL, etc.

I'm trying to look over the ways AWS offers to upload files to S3. When I looked into their docs, it confused the hell out of me. Looking through various resources, I learned a bit more about things like s3.upload vs s3.putObject, and realised there are hard limits in API Gateway and in using a Lambda function to upload a file.
Particularly for uploading large files (1-100 GB), AWS suggests multiple methods to upload a file to S3. Among them are createMultipartUpload, ManagedUpload, getSignedURL and tons of others.
So my question is:
What is the best and easiest way to upload large files to S3 where I can also cancel the upload process? The multipart upload seems too tedious.
There's no single best way to upload a file to S3.
It depends on what you want, especially the size of the objects you want to upload.
putObject - Ideal for objects under 20MB
Presigned URL - Allows you to bypass API Gateway and put objects under 5GB into the S3 bucket
Multipart Upload - Allows you to upload files in chunks, which means you can continue your upload even if the connection drops temporarily. The maximum file size you can upload via this method is 5TB.
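The size thresholds in the list above can be sketched as a small decision helper. chooseUploadMethod is a hypothetical name, not an aws-sdk function, and treating MB/GB/TB as binary units is an assumption; the cut-offs themselves are the ones stated above.

```javascript
// Size thresholds taken from the list above; binary units are an assumption.
const MB = 1024 * 1024;
const GB = 1024 * MB;
const TB = 1024 * GB;

// Hypothetical helper: maps an object size in bytes to the approach
// suggested above. It does not call S3 itself.
function chooseUploadMethod(sizeBytes) {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError('sizeBytes must be a non-negative number');
  }
  if (sizeBytes < 20 * MB) return 'putObject';
  if (sizeBytes < 5 * GB) return 'presigned URL';
  if (sizeBytes <= 5 * TB) return 'multipart upload';
  throw new RangeError('object exceeds the 5TB limit');
}

console.log(chooseUploadMethod(3 * MB));
console.log(chooseUploadMethod(2 * GB));
console.log(chooseUploadMethod(50 * GB));
```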
Use streams to upload to S3; this way the Node.js server doesn't consume too many resources.
const AWS = require('aws-sdk');
const fs = require('fs');
const stream = require('stream');
const S3 = new AWS.S3();
// Returns a writable stream; whatever is piped into it is uploaded to S3
function upload(S3) {
  const pass = new stream.PassThrough();
  const params = {
    Bucket: BUCKET, // your bucket name
    Key: KEY, // your object key
    Body: pass
  };
  S3.upload(params, function (error, data) {
    if (error) {
      console.error(error);
      return;
    }
    console.info(data);
  });
  return pass;
}
const readStream = fs.createReadStream('/path/to/your/file');
readStream.pipe(upload(S3));
This example streams a local file; the stream can come from a request as well.
If you want to listen to the upload progress, you can use the ManagedUpload object returned by S3.upload:
const manager = S3.upload(params);
manager.on('httpUploadProgress', (progress) => {
console.log('progress', progress)
// { loaded: 6472, total: 345486, part: 3, key: 'large-file.dat' }
});

Streaming upload from NodeJS to Dropbox

Our system needs to use our internal security checks when interacting with Dropbox, so we cannot use the client-side SDK for Dropbox.
We would rather upload to our own endpoint, apply security checks, and then stream the incoming request to Dropbox.
I am coming up short here, as there was an older Node.js Dropbox SDK that supported pipes, but the new SDK does not.
Old SDK:
https://www.npmjs.com/package/dropbox-node
We want to take the incoming upload request and forward it to dropbox as it comes in. (and thus prevent the upload from taking twice as long if we first upload the entire thing to our server and then upload to dropbox)
Is there any way to solve this?
My Dropbox NPM module (dropbox-v2-api) supports streaming. It's based on the HTTP API, so you can take advantage of streams. Example? I see it this way:
const contentStream = fs.createReadStream('file.txt');
const securityChecks = ... //your security checks
const uploadStream = dropbox({
resource: 'files/upload',
parameters: { path: '/target/file/path' }
}, (err, result, response) => {
//upload finished
});
contentStream
.pipe(securityChecks)
.pipe(uploadStream);
Full stream support example here.
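For a check to sit in the middle of such a pipe chain, it has to be a duplex stream, typically a Transform. The sketch below is purely illustrative: createSizeLimit is a hypothetical check (a byte limit), not the asker's actual security logic and not part of dropbox-v2-api, and an in-memory sink stands in for the upload stream so the example runs with Node's stream module alone.

```javascript
const { Transform, Readable, Writable, pipeline } = require('stream');

// Hypothetical check: pass data through unchanged, but fail the pipeline
// once more than maxBytes have been seen.
function createSizeLimit(maxBytes) {
  let seen = 0;
  return new Transform({
    transform(chunk, encoding, callback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error(`upload exceeds ${maxBytes} bytes`));
      } else {
        callback(null, chunk);
      }
    }
  });
}

// Runs source -> check -> in-memory sink and resolves with what happened.
function runThroughCheck(chunks, maxBytes) {
  return new Promise((resolve) => {
    const forwarded = [];
    const sink = new Writable({
      write(chunk, encoding, callback) {
        forwarded.push(chunk);
        callback();
      }
    });
    pipeline(Readable.from(chunks), createSizeLimit(maxBytes), sink, (err) => {
      resolve({
        error: err ? err.message : null,
        bytesForwarded: Buffer.concat(forwarded).length
      });
    });
  });
}

const accepted = runThroughCheck([Buffer.from('hello'), Buffer.from('world')], 16);
const rejected = runThroughCheck([Buffer.from('hello'), Buffer.from('world')], 8);
accepted.then((r) => console.log('accepted:', r));
rejected.then((r) => console.log('rejected:', r));
```

Because the check is a stream, data is forwarded chunk by chunk as it arrives, so the upload does not have to wait for the whole file to reach the server first.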

Download Video Streams from Remote url using NightmareJs

I am trying to build a scraper to download video streams and save them in a private cloud instance using NightmareJs (http://www.nightmarejs.org/).
I have seen the documentation and it shows how to download simple files like this -
.evaluate(function ev(){
var el = document.querySelector("[href*='nrc_20141124.epub']");
var xhr = new XMLHttpRequest();
xhr.open("GET", el.href, false);
xhr.overrideMimeType("text/plain; charset=x-user-defined");
xhr.send();
return xhr.responseText;
}, function cb(data){
var fs = require("fs");
fs.writeFileSync("book.epub", data, "binary");
})
-- based on the SO post here -> Download a file using Nightmare
But I want to download video streams using the Node.js async streams API. Is there a way to open a stream from a remote URL and pipe it to a local or remote writable stream using Node's built-in stream APIs?
You can check whether the server sends the "Accept-Ranges" (14.5) and "Content-Length" (14.13) headers in response to a HEAD request for that file, then request smaller chunks of the file you're trying to download using the "Range" (14.35) request header (the server answers each one with a "Content-Range" (14.16) header) and write each chunk to the target file (you can use append mode to reduce management of the file stream).
Of course, this will be quite slow if you're requesting very small chunks sequentially. You could build a pool of requestors (e.g. 4) and only write the next correct chunk to the file (so the other requestors would not take on future chunks if they are already done downloading).
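The chunking arithmetic can be sketched as follows. rangeHeaders is a hypothetical helper: given the total size reported by Content-Length and a chunk size of your choosing, it produces the value to send in a Range request header for each chunk. It does not perform any HTTP requests itself.

```javascript
// Splits a file of totalBytes into inclusive byte ranges of at most
// chunkBytes each, formatted as values for the Range request header.
function rangeHeaders(totalBytes, chunkBytes) {
  if (!Number.isInteger(totalBytes) || totalBytes < 0) {
    throw new RangeError('totalBytes must be a non-negative integer');
  }
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError('chunkBytes must be a positive integer');
  }
  const headers = [];
  for (let start = 0; start < totalBytes; start += chunkBytes) {
    // Byte ranges are inclusive on both ends, hence the -1.
    const end = Math.min(start + chunkBytes, totalBytes) - 1;
    headers.push(`bytes=${start}-${end}`);
  }
  return headers;
}

console.log(rangeHeaders(10, 4));
```

Each value would be sent as the Range header of one GET request; a pool of requestors can take entries from this list in order and append the responses to the file in the same order.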

What "streams and pipe-capable" means in pkgcloud in NodeJS

My issue is to get image uploading to amazon working.
I was looking for a solution that doesn't save the file on the server before uploading it to Amazon.
Googling I found pkgcloud and on the README.md it says:
Special attention has been paid so that methods are streams and
pipe-capable.
Can someone explain what that means and if it is what I am looking for?
Yupp, that means you've found the right kind of s3 library.
What it means is that this library exposes "streams". Here is the API that defines a stream: http://nodejs.org/api/stream.html
Using node's stream interface, you can pipe any readable stream (in this case the POST's body) to any writable stream (in this case the S3 upload).
Here is an example of how to pipe a file upload directly to another kind of library that supports streams: How to handle POSTed files in Express.js without doing a disk write
EDIT: Here is an example
var pkgcloud = require('pkgcloud'),
fs = require('fs');
var s3client = pkgcloud.storage.createClient({ /* ... */ });
app.post('/upload', function(req, res) {
var s3upload = s3client.upload({
container: 'a-container',
remote: 'remote-file-name.txt'
})
// pipe the image data directly to S3
req.pipe(s3upload);
});
EDIT: To finish answering the questions that came up in the chat:
When req emits 'end', pipe() will automatically call s3upload.end() thanks to stream magic. If the OP wants to do anything else on req's end, he can do so easily: req.on('end', () => res.send("done!"))
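This end-of-stream behaviour can be observed with Node's built-in streams alone. In the sketch below, source and destination are stand-ins (assumptions) for req and s3upload; no pkgcloud or Express code is involved.

```javascript
const { Readable, Writable } = require('stream');

// Stand-ins (assumptions): `source` plays the role of req,
// `destination` plays the role of s3upload.
const source = Readable.from(['chunk-1', 'chunk-2']);
const written = [];
const destination = new Writable({
  write(chunk, encoding, callback) {
    written.push(chunk.toString());
    callback();
  }
});

// 'finish' fires only after end() has been called on the destination,
// which pipe() does for us once the source has no more data.
const finished = new Promise((resolve, reject) => {
  destination.on('finish', () => resolve(written));
  destination.on('error', reject);
});

source.pipe(destination);
finished.then((chunks) => console.log('destination finished with', chunks));
```

Note that destination.end() is never called explicitly; the 'finish' event still fires because pipe() ends the destination when the source ends.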
