upload to S3 bucket - node.js

I am new to Elastic Beanstalk and just uploaded a Node.js app.
I'm wondering if it's possible to "link" (like a Unix symlink) a folder to an S3 bucket.
Make "/recordings" point to S3:
var fs = require('fs');
var https = require('https');

var filename = 'recordings/' + match[1] + '.wav';
var file = fs.createWriteStream(filename);
var request = https.get(url, function(response) {
  response.pipe(file);
  file.on('finish', function() {
    file.close();
  }).on('error', function(err) {
    // remove the partial file on failure
    fs.unlink(filename, function() {});
    console.log('error downloading recording');
  });
});

You can use tools like s3fs-fuse to mount an S3 bucket into your filesystem. However, this is generally not recommended, since S3 is object storage and is not designed to behave like a local file system.
As the s3fs readme documents:
Generally S3 cannot offer the same performance or semantics as a local file system. More specifically:
random writes or appends to files require rewriting the entire file
metadata operations such as listing directories have poor performance due to network latency
eventual consistency can temporarily yield stale data
no atomic renames of files or directories
no coordination between multiple clients mounting the same bucket
no hard links
The best way to use S3 from your Node application is the AWS SDK for JavaScript in Node.js.
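For the recordings example from the question, a minimal sketch of that approach could look like this (assuming the aws-sdk v2 package; the bucket name and the saveRecordingToS3 helper are placeholders, and the HTTPS response is streamed straight into s3.upload so nothing has to be written to the instance's disk):
const https = require('https');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

// Streams the remote recording straight into S3 instead of a local
// "recordings/" folder; bucket name and helper name are placeholders.
function saveRecordingToS3(url, recordingId, callback) {
  https.get(url, (response) => {
    s3.upload({
      Bucket: 'my-recordings-bucket',
      Key: 'recordings/' + recordingId + '.wav',
      Body: response // the HTTPS response is a readable stream
    }, callback);
  }).on('error', callback);
}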

Related

Should I use GridFS or some other method to create a file sharing app?

I am currently beginning work on a file sharing app for my company: a simple form to upload a file; the user is then given a download URL and can pass that on to anyone so they can download the file (similar to products such as WeTransfer).
However, I am struggling to decide how to do it. I have been playing with MongoDB and GridFS. I have successfully used multer and multer-gridfs-storage to upload files directly into my database, but I am struggling to get them to download as I don't know that much about GridFS.
const crypto = require('crypto');
const path = require('path');
const multer = require('multer');
// depending on the multer-gridfs-storage version, the class is either the
// default export or a named export
const { GridFsStorage } = require('multer-gridfs-storage');

const storage = new GridFsStorage({
  url: 'mongodb://localhost:27017/fileUpload',
  file: (req, file) => {
    return new Promise((resolve, reject) => {
      // random filename, keeping the original extension
      crypto.randomBytes(16, (err, buf) => {
        if (err) {
          return reject(err);
        }
        const filename = buf.toString('hex') + path.extname(file.originalname);
        const fileInfo = {
          filename: filename,
          bucketName: 'uploads'
        };
        resolve(fileInfo);
      });
    });
  }
});
const upload = multer({ storage });
But it got me thinking: is this the best way of doing this, or would there be a better way of serving those files for download (to download to a user's computer)?
Any advice is greatly appreciated!
GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB. It is a convention implemented by all MongoDB drivers that stores binary data across many smaller documents: the binaries are split into chunks, and the chunks are stored in collections created by GridFS.
Having said that, given the presented use case I would highly recommend using a media server for storage; given the application landscape, that makes for a more economical, viable and scalable solution.
In general, avoid putting BLOBs in the database if there are other storage options that cost less; using a database as a BLOB store is rarely cost-optimised.
Sure, there are valid reasons for storing blobs in the database, but given the application's use case (it being media intensive), use media storage for the files and the database for data structures.
In such setups it is easy to become "cost-unoptimised" over time. The database also grows quickly with every upload, bringing its own challenges with RAM (WiredTiger cache) management.
All in all, if it were me, I would use media storage for BLOB-intensive applications rather than relying on the database.
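That said, for the immediate problem of downloading what is already in GridFS, the MongoDB driver's GridFSBucket can stream a stored file back to the client. A minimal sketch, assuming an Express-style app and the 'uploads' bucket name from the question (the route path and connection handling are placeholders):
const { MongoClient, GridFSBucket } = require('mongodb');

// `app` is assumed to be the existing Express app from the upload code.
MongoClient.connect('mongodb://localhost:27017').then((client) => {
  const db = client.db('fileUpload');
  const bucket = new GridFSBucket(db, { bucketName: 'uploads' });

  // Stream a stored file back to the browser by the random filename
  // generated at upload time.
  app.get('/files/:filename', (req, res) => {
    bucket.openDownloadStreamByName(req.params.filename)
      .on('error', () => res.status(404).end())
      .pipe(res);
  });
});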

Read JSON file directly from google storage (using Cloud Functions)

I created a function that extracts a specific attribute from a JSON file, but this file was deployed together with the function in Cloud Functions. In that case I simply bundled the file and was able to refer to a specific attribute:
const jsonData = require('./data.json');
const result = jsonData.responses[0].fullTextAnnotation.text;
return result;
Ultimately, I want to read this file directly from Cloud Storage. I have tried several solutions here, but without success. How can I read a JSON file directly from Google Storage so that, as in the first case, I can read its attributes correctly?
As mentioned in the comment, the Cloud Storage API allows you to do many things programmatically. Here's an example from the documentation on how to download a file from Cloud Storage, for your reference.
/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';
// The ID of your GCS file
// const fileName = 'your-file-name';
// The path to which the file should be downloaded
// const destFileName = '/local/path/to/file.txt';

// Imports the Google Cloud client library
const {Storage} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage();

async function downloadFile() {
  const options = {
    destination: destFileName,
  };

  // Downloads the file
  await storage.bucket(bucketName).file(fileName).download(options);

  console.log(
    `gs://${bucketName}/${fileName} downloaded to ${destFileName}.`
  );
}

downloadFile().catch(console.error);
To clearly answer the question: you can't!
You need to download the file locally first, and then process it. You can't read it directly from GCS.
With Cloud Functions you can only store files in the /tmp directory; it's the only writable one. In addition, it's an in-memory file system, which means several things:
The size is limited by the memory allocated to the Cloud Function. The memory space is shared between your app's memory footprint and your file storage in /tmp (you won't be able to download a 10 GB file, for example).
The memory is lost when the instance goes down.
Each Cloud Functions instance has its own memory space; you can't share files between instances.
The /tmp directory isn't cleaned between two function invocations (on the same instance), so remember to clean up this directory yourself.
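Putting both answers together, a minimal sketch of an HTTP Cloud Function that downloads the JSON to /tmp, parses it, and returns the attribute from the question could look like this (the function name, bucket and file names are placeholders):
const fs = require('fs');
const path = require('path');
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();

// HTTP-triggered function; the names below are placeholders.
exports.readAnnotation = async (req, res) => {
  const bucketName = 'your-unique-bucket-name';
  const fileName = 'data.json';
  const destFileName = path.join('/tmp', fileName);

  // /tmp is the only writable directory in Cloud Functions
  await storage.bucket(bucketName).file(fileName).download({ destination: destFileName });

  const jsonData = JSON.parse(fs.readFileSync(destFileName, 'utf8'));
  const result = jsonData.responses[0].fullTextAnnotation.text;

  // clean up /tmp; it isn't wiped between invocations on the same instance
  fs.unlinkSync(destFileName);

  res.send(result);
};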

What's the best way to upload large files to S3 with the Node.js aws-sdk? MultipartUpload vs ManagedUpload vs getSignedUrl, etc.

I'm trying to look over the options AWS offers for uploading files to S3. When I looked into their docs it confused the hell out of me. Looking through various resources I learned a bit more, e.g. s3.upload vs s3.putObject and others, and realised there are physical limitations in API Gateway when using a Lambda function to upload a file.
Particularly for uploading large files (1-100 GB), AWS suggests multiple methods to upload to S3. Among them are createMultipartUpload, ManagedUpload, getSignedUrl and tons of others.
So my question is:
What is the best and easiest way to upload large files to S3, where I can also cancel the upload process? The multipart upload seems tedious.
There's no single best way to upload files to S3.
It depends on what you want, especially on the size of the objects you want to upload:
putObject - ideal for objects under 20 MB
Presigned URL - allows you to bypass API Gateway and PUT an object of up to 5 GB to the S3 bucket (see the sketch after this list)
Multipart upload - allows you to upload files in chunks, which means you can continue your upload even if the connection drops temporarily. The maximum file size you can upload via this method is 5 TB.
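As a small illustration of the presigned URL option, here is a sketch using the v2 SDK's getSignedUrl (the bucket, key and expiry are placeholders; the client then does a plain HTTP PUT of the file body to the returned URL):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Generate a URL the client can PUT the file to directly,
// so the upload never passes through API Gateway or your server.
const uploadUrl = s3.getSignedUrl('putObject', {
  Bucket: 'my-upload-bucket',    // placeholder
  Key: 'uploads/large-file.dat', // placeholder
  Expires: 60 * 15               // URL valid for 15 minutes
});
console.log(uploadUrl);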
Use streams to upload to S3; this way the Node.js server doesn't consume too many resources.
const AWS = require('aws-sdk');
const fs = require('fs');
const stream = require('stream');

const S3 = new AWS.S3();

function upload(S3) {
  let pass = new stream.PassThrough();
  let params = {
    Bucket: BUCKET, // your bucket name
    Key: KEY,       // the object key to upload to
    Body: pass
  };
  S3.upload(params, function (error, data) {
    console.error(error);
    console.info(data);
  });
  return pass;
}

const readStream = fs.createReadStream('/path/to/your/file');
readStream.pipe(upload(S3));
This example streams a local file, but the stream can come from an incoming request as well.
If you want to listen to the progress, you can use ManagedUpload:
const manager = S3.upload(params, function (error, data) {
  console.error(error);
  console.info(data);
});
manager.on('httpUploadProgress', (progress) => {
  console.log('progress', progress);
  // { loaded: 6472, total: 345486, part: 3, key: 'large-file.dat' }
});
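Since the question also asks about cancelling: the object returned by S3.upload() is a ManagedUpload, so the same handle can be used to abort the transfer:
// Cancel the in-flight upload; the pending callback then receives an abort error.
manager.abort();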

Preferred method of downloading large files from AWS S3 to EC2 server

I'm having some intermittent problems downloading a largish (3.5 GB) file from S3 to an EC2 instance. About 95% of the time it works great, and fast - maybe 30 seconds. However, the other 5% of the time it stalls and can take > 2 hours to download. Restarting the job normally solves the problem, which suggests the issue is transient. This makes me think there is a problem with how I'm downloading files. Below is my implementation - I pipe the read stream into a write stream to disk and return a promise which resolves when it is done (or rejects on error).
Is this the preferred method of downloading large files from S3 with Node.js? Are there any "gotchas" I should know about?
function getDownloadStream(Bucket, Key) {
  return s3
    .getObject({
      Bucket,
      Key
    })
    .on('error', (error) => {
      console.error(error);
      return Promise.reject(`S3 Download Error: ${error}`);
    })
    .createReadStream();
}

function downloadFile(inputBucket, key, destination) {
  return new Promise(function(resolve, reject) {
    getDownloadStream(inputBucket, key)
      .on('end', () => {
        resolve(destination);
      })
      .on('error', reject)
      .pipe(fs.createWriteStream(destination));
  });
}
By default, traffic to S3 goes over the internet, so download speed can be unpredictable. To increase download speed, and for security reasons, you can configure a VPC endpoint for S3. This is a virtual device that routes traffic between your instance and S3 through the AWS internal network (much faster) instead of over the internet.
While creating the endpoint for S3, you need to select the route tables of the instances where the app is hosted. After creating it you will see an entry in those route tables like destination (com.amazonaws.us-east-1.s3) -> target vpce-xxxxxx, so whenever traffic goes to S3 it is routed through the endpoint instead of over the internet.
Alternatively, you can try parallelising the download, e.g. downloading ranges of bytes in parallel and combining them (see the sketch below), but for 3.5 GB the approach above should be fine.
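For reference, a minimal sketch of that ranged-download idea with the v2 SDK (the two-part split is only for illustration; a real implementation would pick a chunk size and stream the parts to disk instead of buffering a 3.5 GB object in memory):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function downloadInRanges(Bucket, Key) {
  // find out how large the object is
  const { ContentLength } = await s3.headObject({ Bucket, Key }).promise();
  const mid = Math.floor(ContentLength / 2);

  // fetch both halves in parallel using HTTP Range requests
  const [first, second] = await Promise.all([
    s3.getObject({ Bucket, Key, Range: `bytes=0-${mid - 1}` }).promise(),
    s3.getObject({ Bucket, Key, Range: `bytes=${mid}-${ContentLength - 1}` }).promise()
  ]);

  return Buffer.concat([first.Body, second.Body]);
}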

Streaming files directly to Client from Amazon S3 (Node.js)

I am using Sails.js and am trying to stream files from the Amazon S3 server directly to the client.
To connect to S3, I use the s3 module: https://www.npmjs.org/package/s3
This module provides capabilities like client.downloadFile(params) and client.downloadBuffer(s3Params).
My current code looks like the following:
var view = client.downloadBuffer(params);
view.on('error', function(err) {
  cb({success: 0, message: 'Could not open file.'}, null);
});
view.on('end', function(buffer) {
  cb(null, buffer);
});
I catch this buffer in a controller using:
User.showImage( params , function (err, buffer){
// this is where I can get the buffer
});
Is it possible to stream this data to the client as an image file? (Using buffer.pipe(res) doesn't work, of course.) Is there something similar that completely avoids saving the file to the server's disk first?
The other option, client.downloadFile(params), requires a local path (i.e. a server path in our case).
This GitHub issue contains the "official" answer to this question: https://github.com/andrewrk/node-s3-client/issues/53
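If you would rather skip the s3 module's buffer API entirely, the plain aws-sdk can pipe the object straight into the response, so nothing is buffered in memory or written to the server's disk. A minimal sketch, assuming aws-sdk v2 and an Express/Sails-style res (the bucket name and key lookup are placeholders):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function showImage(req, res) {
  const params = {
    Bucket: 'my-image-bucket', // placeholder
    Key: req.param('key')      // placeholder key lookup
  };

  // Pipe the S3 object directly into the HTTP response.
  s3.getObject(params)
    .createReadStream()
    .on('error', () => res.status(404).send({ success: 0, message: 'Could not open file.' }))
    .pipe(res);
}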
