S3 file upload stream using node js - node.js

I am looking for a way to stream a file to Amazon S3 through a Node.js server, with these requirements:
Don't store a temp file on the server or hold the complete file in memory. Buffering up to some limit (not the complete file) is acceptable while uploading.
No restriction on uploaded file size.
Don't block the server until the upload completes, because a heavy upload would unexpectedly increase the waiting time of other requests.
I don't want to upload directly from the browser, because the S3 credentials would need to be shared in that case. Another reason to upload from the Node.js server is that some authentication may need to be applied before the upload.
I tried to achieve this using node-multiparty, but it did not work as expected. You can see my solution and the issue at https://github.com/andrewrk/node-multiparty/issues/49. It works fine for small files but fails for a file of 15MB.
Is there any solution or alternative?

You can now use streaming with the official Amazon SDK for nodejs in the section "Uploading a File to an Amazon S3 Bucket" or see their example on GitHub.
What's even more awesome, you finally can do so without knowing the file size in advance. Simply pass the stream as the Body:
var AWS = require('aws-sdk');
var fs = require('fs');
var zlib = require('zlib');
var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body})
.on('httpUploadProgress', function(evt) { console.log(evt); })
.send(function(err, data) { console.log(err, data) });

For your information, the v3 SDK was published with a dedicated module to handle this use case: https://www.npmjs.com/package/@aws-sdk/lib-storage
Took me a while to find it.
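As a rough sketch of how that module is typically used: this assumes @aws-sdk/client-s3 and @aws-sdk/lib-storage are installed and that credentials and region come from the environment. The helper names buildUploadOptions and uploadStream, and the bucket, key, and file names, are placeholders of mine, not part of the SDK.

```javascript
// Sketch only: assumes `npm install @aws-sdk/client-s3 @aws-sdk/lib-storage`.

// Build the options object passed to lib-storage's Upload constructor.
function buildUploadOptions(client, bucket, key, body) {
  return {
    client: client,
    params: { Bucket: bucket, Key: key, Body: body },
    queueSize: 4,               // number of parts uploaded concurrently
    partSize: 5 * 1024 * 1024,  // 5 MB per part, the minimum S3 allows
    leavePartsOnError: false,   // abort the multipart upload on failure
  };
}

async function uploadStream(bucket, key, body) {
  // Required lazily so the helper above can be used without the SDK installed.
  const { S3Client } = require('@aws-sdk/client-s3');
  const { Upload } = require('@aws-sdk/lib-storage');
  const upload = new Upload(buildUploadOptions(new S3Client({}), bucket, key, body));
  upload.on('httpUploadProgress', function (progress) { console.log(progress); });
  return upload.done(); // resolves once every part has been uploaded
}

// Usage (hypothetical bucket, key and file):
// uploadStream('myBucket', 'myKey', require('fs').createReadStream('bigfile'))
//   .catch(console.error);
```

Because the body is sent in parts, the total size does not need to be known in advance, and memory use should stay roughly bounded by queueSize times partSize.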

Give https://www.npmjs.org/package/streaming-s3 a try.
I used it for uploading several big files in parallel (>500Mb), and it worked very well.
It is very configurable and also allows you to track upload statistics.
You do not need to know the total size of the object, and nothing is written to disk.

If it helps anyone I was able to stream from the client to s3 successfully (without memory or disk storage):
https://gist.github.com/mattlockyer/532291b6194f6d9ca40cb82564db9d2a
The server endpoint assumes req is a stream object. I sent a File object from the client, which modern browsers can send as binary data, and passed the file info in the headers.
const fileUploadStream = (req, res) => {
  //get "body" args from header
  const { id, fn } = JSON.parse(req.get('body'));
  const Key = id + '/' + fn; //upload to s3 folder "id" with filename === fn
  const params = {
    Key,
    Bucket: bucketName, //set somewhere
    Body: req, //req is a stream
  };
  s3.upload(params, (err, data) => {
    if (err) {
      res.send('Error Uploading Data: ' + JSON.stringify(err) + '\n' + JSON.stringify(err.stack));
    } else {
      res.send(Key);
    }
  });
};
Yes, putting the file info in the headers breaks convention, but if you look at the gist it's much cleaner than anything else I found using streaming libraries or multer, busboy etc...
+1 for pragmatism and thanks to @SalehenRahman for his help.

I'm using the s3-upload-stream module in a working project here.
There are also some good examples from @raynos in his http-framework repository.

Alternatively, you can look at https://github.com/minio/minio-js. It has a minimal set of abstracted APIs implementing the most commonly used S3 calls.
Here is an example of streaming upload.
$ npm install minio
$ cat >> put-object.js << EOF
var Minio = require('minio')
var fs = require('fs')
// find out your s3 end point here:
// http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
var s3Client = new Minio({
  url: 'https://<your-s3-endpoint>',
  accessKey: 'YOUR-ACCESSKEYID',
  secretKey: 'YOUR-SECRETACCESSKEY'
})
var file = 'your_localfile.zip'
var fileStream = fs.createReadStream(file)
// stat the file first so its size can be passed to putObject
fs.stat(file, function(e, stat) {
  if (e) {
    return console.log(e)
  }
  s3Client.putObject('mybucket', 'hello/remote_file.zip', 'application/octet-stream', stat.size, fileStream, function(e) {
    return console.log(e) // should be null
  })
})
EOF
putObject() here is a fully managed single function call; for file sizes over 5MB it automatically does multipart internally. You can resume a failed upload as well, and it will start from where it left off by verifying previously uploaded parts.
Additionally, this library is isomorphic and can be used in browsers as well.

Related

How to display images of products stored on aws s3 bucket

I was practicing on this tutorial
https://www.youtube.com/watch?v=NZElg91l_ms&t=1234s
It works like a charm for me. I am storing product images in a bucket, and if I upload, say, 4 images, they are all uploaded.
But when I display them I get an access denied error. Since I am displaying a list, the repeated requests are maybe being detected as spam.
This is how I am trying to fetch them in my React app:
// rest of the data is from a MySQL database (product name, price)
// 100+ products
{products.map((row) => (
  <div key={row.imgurl}>
    <div className="product-hero"><img src={`http://localhost:3909/images/${row.imgurl}`} /></div>
    <div className="text-center">{row.productName}</div>
  </div>
))}
As it fetches 100+ products from the database and 100 images from AWS, it fails.
Sorry for such a detailed question, but in short: how can I fetch all product images from my bucket?
Note: I am aware that I can get only one image per call, so how can I get all images one by one in my scenario?
//download code in my app.js
const { uploadFile, getFileStream } = require('./s3')
const app = express()
app.get('/images/:key', (req, res) => {
console.log(req.params)
const key = req.params.key
const readStream = getFileStream(key)
readStream.pipe(res)
})
//s3 file
// uploads a file to s3
function uploadFile(file) {
const fileStream = fs.createReadStream(file.path)
const uploadParams = {
Bucket: bucketName,
Body: fileStream,
Key: file.filename
}
return s3.upload(uploadParams).promise()
}
exports.uploadFile = uploadFile
// downloads a file from s3
function getFileStream(fileKey) {
const downloadParams = {
Key: fileKey,
Bucket: bucketName
}
return s3.getObject(downloadParams).createReadStream()
}
exports.getFileStream = getFileStream
It appears that your code is sending image requests to your back-end, which retrieves the objects from Amazon S3 and then serves the images in response to the request.
A much better method would be to have the URLs in the HTML page point directly to the images stored in Amazon S3. This would be highly scalable and will reduce the load on your web server.
This would require the images to be public so that the user's web browser can retrieve the images. The easiest way to do this would be to add a Bucket Policy that grants GetObject access to all users.
Alternatively, if you do not wish to make the bucket public, you can instead generate Amazon S3 pre-signed URLs, which are time-limited URLs that provide temporary access to a private object. Your back-end can calculate the pre-signed URL with a couple of lines of code, and the user's web browser will then be able to retrieve private objects from S3 for display on the page.
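A minimal sketch of the pre-signed URL approach, assuming the same v2 aws-sdk client (`s3`) and `bucketName` that the question's s3 file already uses. The helper name presignImageUrl, the /products route, and the 300-second lifetime are illustrative assumptions, not part of the original code.

```javascript
// Sketch only: `s3` is expected to be an AWS.S3 instance from the v2 aws-sdk.
// presignImageUrl is a hypothetical helper name.
function presignImageUrl(s3, bucketName, key, expiresSeconds) {
  // The URL is computed from the client's credentials; the object is not fetched here.
  return s3.getSignedUrl('getObject', {
    Bucket: bucketName,
    Key: key,
    Expires: expiresSeconds, // lifetime of the URL in seconds
  });
}

// Hypothetical route: return the product list with ready-to-use image URLs,
// so the browser loads each image straight from S3 instead of via /images/:key.
// app.get('/products', (req, res) => {
//   res.json(products.map((row) => ({
//     ...row,
//     imgurl: presignImageUrl(s3, bucketName, row.imgurl, 300),
//   })));
// });
```

The React component would then use the returned URL directly as the img src, which takes the image traffic off the Node server.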
I did similar S3 image handling for my blog's image upload functionality, but I did not use getFileStream() to upload my image.
Because nothing should be done until the image file is fully processed, I used fs.readFile(path, callback) instead to read the data.
My approach produces Buffer data, but AWS S3 is smart enough to interpret it as an image. (I only added a suffix to my filename; I don't know how to apply image headers...)
This is the relevant part of my code, for reference:
fs.readFile(imgPath, (err, data) => {
  if (err) { throw err }
  // Once file is read, upload to AWS S3
  const objectParams = {
    Bucket: 'yuyuichiu-personal',
    Key: req.file.filename,
    Body: data
  }
  S3.putObject(objectParams, (err, data) => {
    // store image link and read image with link
  })
})

S3 fails to unzip uploaded file

I'm following this example
// Load the stream
var fs = require('fs'), zlib = require('zlib');
var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());
// Upload the stream
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body}, function(err, data) {
if (err) console.log("An error occurred", err);
console.log("Uploaded the file at", data.Location);
})
And it "works" in that it does everything exactly as expected, EXCEPT that the file arrives on S3 compressed and stays that way.
As far as I can tell there's no automatic facility to unzip it on S3. So if your intention is to upload a publicly available image or video (or anything else that the end user is meant to simply consume), the solution appears to be to leave the uploaded file unzipped, like so...
// Load the stream
var fs = require('fs'), zlib = require('zlib');
var body = fs.createReadStream('bigfile');//.pipe(zlib.createGzip()); <-- removing the zipping part
// Upload the stream
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body}, function(err, data) {
if (err) console.log("An error occurred", err);
console.log("Uploaded the file at", data.Location);
})
I'm curious if I'm doing something wrong and if there IS an automatic way to have S3 recognize that the file is arriving zipped and unzip it?
The way this works is that S3 has no way of knowing that the file is gzipped without a bit of help. You need to set the metadata on the file when uploading, telling S3 that it's gzipped. It will do the right thing if this is set.
You need to set Content-Encoding: gzip and Content-Type: <<your file type>> in the object metadata when uploading.
Later edit:
Found these, which explain how to do it for CloudFront, but it is basically the same: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html#CompressedS3
http://www.cameronstokes.com/2011/07/20/compressed-content-amazon-s3-and-cloudfront/
However note that as per this blogpost S3 will serve the file gzipped and rely on the browser to unzip it. This works fine in many cases but as the blogger notes will fail in curl (since curl will have no idea what to do with the gzipped file). So if your intention is to simply upload a file for raw consumption by the user your best bet is to skip the gzipping and upload the file in its uncompressed state.
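For the case where you do keep the gzip step, the Content-Encoding and Content-Type advice above could look roughly like this with the v2 aws-sdk upload() call from the question. The helper name gzipUploadParams and the 'text/plain' content type are placeholders; treat this as a sketch rather than a definitive recipe.

```javascript
const zlib = require('zlib');

// Hypothetical helper: builds s3.upload() params for a gzip-compressed body.
function gzipUploadParams(bucket, key, source, contentType) {
  return {
    Bucket: bucket,
    Key: key,
    Body: source.pipe(zlib.createGzip()), // compress while streaming
    ContentEncoding: 'gzip',              // stored as the object's Content-Encoding
    ContentType: contentType,             // type of the uncompressed content
  };
}

// Usage with the v2 aws-sdk (file name and content type are placeholders):
// var s3 = new AWS.S3();
// var params = gzipUploadParams('myBucket', 'myKey', fs.createReadStream('bigfile'), 'text/plain');
// s3.upload(params, function (err, data) { console.log(err, data); });
```

S3 still stores and serves the compressed bytes; the headers only tell the client how to decode them.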

How do I read and upload a large file to s3?

I'm using Node.js 0.10.22 and q-fs.
I'm trying to upload objects to S3, which stopped working once the objects were over a certain size in MB.
Besides taking up all the memory on my machine, it gives me this error when I try to use fs.read on the file:
RangeError: length > kMaxLength
at new Buffer (buffer.js:194:21)
Normally, when this works, I do s3.upload, and put the buffer in the Body field.
How do I handle large objects?
You'll want to use a streaming version of the API to pipe your readable filesystem stream directly to the S3 upload http request body stream provided by the s3 module you are using. Here's an example straight from the aws-sdk documentation
var fs = require('fs');
var body = fs.createReadStream('bigfile');
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body}).
on('httpUploadProgress', function(evt) { console.log(evt); }).
send(function(err, data) { console.log(err, data) });

Node.js stream upload directly to Google Cloud Storage

I have a Node.js app running on a Google Compute VM instance that receives file uploads directly from POST requests (not via the browser) and streams the incoming data to Google Cloud Storage (GCS).
I'm using Restify b/c I don't need the extra functionality of Express and because it makes it easy to stream the incoming data.
I create a random filename for the file, take the incoming req and toss it to a neat little Node wrapper for GCS (found here: https://github.com/bsphere/node-gcs) which makes a PUT request to GCS. The documentation for GCS using PUT can be found here: https://developers.google.com/storage/docs/reference-methods#putobject ... it says Content-Length is not necessary if using chunked transfer encoding.
Good news: the file is being created inside the appropriate GCS storage "bucket"!
Bad News:
I haven't figured out how to get the incoming file's extension from Restify (notice I'm manually setting '.jpg' and the content-type manually).
The file is experiencing slight corruption (almost certainly due to something I'm doing wrong with the PUT request). If I download the POSTed file from Google, OS X tells me it's damaged ... BUT, if I use Photoshop, it opens and looks just fine.
Update / Solution
As pointed out by vkurchatkin, I needed to parse the request object instead of just piping the whole thing to GCS. After trying out the lighter busboy module, I decided it was just a lot easier to use multiparty. For dynamically setting the Content-Type, I simply used Mimer (https://github.com/heldr/mimer), referencing the file extension of the incoming file. It's important to note that since we're piping the part object, the part.headers must be cleared out. Otherwise, unintended info, specifically content-type, will be passed along and can/will conflict with the content-type we're trying to set explicitly.
Here's the applicable, modified code:
var restify = require('restify'),
    server = restify.createServer(),
    GAPI = require('node-gcs').gapitoken,
    GCS = require('node-gcs'),
    multiparty = require('multiparty'),
    Mimer = require('mimer');
server.post('/upload', function(req, res) {
  var form = new multiparty.Form();
  form.on('part', function(part){
    var fileType = '.' + part.filename.split('.').pop().toLowerCase();
    var fileName = Math.random().toString(36).slice(2) + fileType;
    // clear out the part's headers to prevent conflicting data being passed to GCS
    part.headers = null;
    var gapi = new GAPI({
      iss: '-- your -- @developer.gserviceaccount.com',
      scope: 'https://www.googleapis.com/auth/devstorage.full_control',
      keyFile: './key.pem'
    },
    function(err) {
      if (err) { console.log('google cloud authorization error: ' + err); }
      var headers = {
        'Content-Type': Mimer(fileType),
        'Transfer-Encoding': 'Chunked',
        'x-goog-acl': 'public-read'
      };
      var gcs = new GCS(gapi);
      gcs.putStream(part, myBucket, '/' + fileName, headers, function(gerr, gres){
        console.log('file should be there!');
      });
    });
  });
  // start parsing the incoming multipart request so 'part' events fire
  form.parse(req);
});
You can't use the raw req stream, since it yields the whole request body, which is multipart. You need to parse the request with something like multiparty, which gives you a readable stream and all the metadata you need.

Advice: flatiron, formidable and aws s3

I'm new with serverside programming with node.js. I'm sticking together a tiny webapp with it right now and having the usual startup learning to do. The following piece of code WORKS. But I would love to know if it's more or less a right way to do a simple file upload from a form and throw it into aws s3:
app.router.post('/form', { stream: true }, function () {
var req = this.req,
res = this.res,
form = new formidable.IncomingForm();
form
.parse(req, function(err, fields, files) {
console.log('Parsed file upload' + err);
if (err) {
res.end('error: Upload failed: ' + err);
} else {
var img = fs.readFileSync(files.image.path);
var data = {
Bucket: 'le-bucket',
Key: files.image.name,
Body: img
};
s3.client.putObject(data, function() {
console.log("Successfully uploaded data to myBucket/myKey");
});
res.end('success: Uploaded file(s)');
}
});
});
Note: I had to turn buffer off in union / flatiron.plugins.http.
What I would like to learn is, when to stream load a file and when to syncload it. It will be a really tiny webapp with little traffic.
If it's more or less good, then please consider this a token of working code, which I would also throw into a gist. It's not that easy to find documentation and working examples of this kind of stuff. I like flatiron a lot, but its small-module approach leads to docs and examples scattered all over the net, let alone tutorials.
You should use a module other than formidable because, as far as I know, formidable does not have an S3 storage option, so you must save the files on your server before uploading them.
I would recommend multiparty.
Use this example to upload directly to S3 without saving the file locally on your server.
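The example this answer linked to did not survive here, so the following is only a sketch of the general idea, not the original example. It assumes `form` is a multiparty.Form and `s3` is a v2 aws-sdk AWS.S3 instance; handleUpload and the 'le-bucket' usage are illustrative names.

```javascript
// Sketch only. `form` is expected to be new (require('multiparty')).Form(),
// `s3` an AWS.S3 instance (v2 aws-sdk); handleUpload is a hypothetical name.
function handleUpload(form, s3, bucket, req, done) {
  form.on('part', function (part) {
    if (!part.filename) {
      part.resume(); // plain form field, not a file: drain and ignore it
      return;
    }
    // Each file part is a readable stream, so it can be used as the upload Body
    // directly, without touching the local disk.
    s3.upload({ Bucket: bucket, Key: part.filename, Body: part }, done);
  });
  form.on('error', done);
  form.parse(req); // start consuming the multipart request
}

// Usage inside a request handler (names are placeholders):
// handleUpload(new multiparty.Form(), new AWS.S3(), 'le-bucket', req, function (err, data) {
//   res.end(err ? 'error: Upload failed: ' + err : 'success: Uploaded file(s)');
// });
```

Unlike the readFileSync version in the question, the response here is only sent after S3 reports success or failure. With several files in one form, done is called once per file, so a real handler would need to collect the results before responding.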

Resources