I'm attempting to write a file upload service using the serverless framework that can accept binary input and store the data in S3.
The problem is that files end up corrupted in the S3 bucket. Text files do come through but my test image does not.
This is my code so far:
const serverless = require('serverless-http');
const express = require('express');
const crypto = require("crypto")
const AWS = require('aws-sdk');
const app = express();
const s3 = new AWS.S3();
app.use(function(req, res, next) {
var chunks = [];
req.on('data', function(chunk) { chunks.push(chunk); });
req.on('end', function() {
req.rawBody = Buffer.concat(chunks);
next();
});
});
app.put('/v1/upload', async (req, res, cb) => {
let hash = crypto.createHash("sha256").update(req.rawBody).digest("hex");
console.log(req.rawBody.length);
... s3 stuff here
I can see in the console that the file size is wrong; 2540872. The real size is 1395559.
I'm using curl to test the upload
curl -v -X PUT -H "Content-Type: application/octet-stream" --data-binary @test/image.png http://localhost:3000/prod/v1/upload
The easiest way to do this is to take the Lambda function out of the upload path entirely. S3 lets you generate a pre-signed URL (or a pre-signed POST policy) that grants temporary permission to upload a file directly to the bucket. This has several benefits: the Lambda function no longer spends billed execution time streaming binary data, and uploads are more reliable because S3 itself handles large files and upload errors. Instead of the user uploading a form through API Gateway to Lambda, the process from a frontend looks more like this:
Make an HTTP request to an API Gateway endpoint, which triggers a Lambda function. The Lambda function then uses the AWS SDK to create a pre-signed URL and policy with valid permissions (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#createPresignedPost-property). These details are then returned to the frontend client.
The frontend constructs a form following the requirements of that pre-signed URL along with a file upload entity. The binary will then be directly uploaded to the S3 bucket.
At this point the frontend can then call another Lambda function (via API Gateway) if need be to kick off processing of the new image and associate it with the user or whatever else needs to happen at that point. You can even asynchronously process the image by attaching an S3 put event to a Lambda function directly (https://www.serverless.com/framework/docs/providers/aws/events/s3/#s3)
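The Lambda side of the steps above could be sketched roughly as follows. This assumes the AWS SDK for JavaScript v2 and its `createPresignedPost` method; the function names, the generated key format, the 5-minute expiry and the 10 MB size limit are all illustrative assumptions, not part of the original answer. The S3 client is passed in as an argument so the sketch does not depend on how you construct it.

```javascript
// Sketch only. In a real Lambda the client would come from:
//   const AWS = require('aws-sdk');
//   const s3 = new AWS.S3();

// Build the parameters for s3.createPresignedPost.
// bucket, key and maxBytes are hypothetical inputs chosen for illustration.
function buildPresignedPostParams(bucket, key, maxBytes) {
  return {
    Bucket: bucket,
    Fields: { key: key },   // form fields the client must send along
    Expires: 300,           // policy lifetime in seconds
    Conditions: [['content-length-range', 0, maxBytes]] // reject oversized files
  };
}

// Wrap the callback API in a promise; resolves to an object with `url` and `fields`.
function createUploadForm(s3, bucket, key, maxBytes) {
  return new Promise(function (resolve, reject) {
    const params = buildPresignedPostParams(bucket, key, maxBytes);
    s3.createPresignedPost(params, function (err, data) {
      if (err) return reject(err);
      resolve(data);
    });
  });
}

// Handler factory returning an API Gateway proxy-style response.
function makePresignHandler(s3, bucket) {
  return async function handler(event) {
    // Hypothetical key scheme: timestamp plus a random suffix
    const key = 'uploads/' + Date.now() + '-' + Math.random().toString(36).slice(2);
    const form = await createUploadForm(s3, bucket, key, 10 * 1024 * 1024);
    return { statusCode: 200, body: JSON.stringify(form) };
  };
}
```

The frontend would then send a multipart/form-data POST to the returned `url`, including every entry of `fields` and the file itself as the last field named `file`.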
Hope that helps.
Related
So I want to pipe a file straight to the client. Currently I write the file to disk and then send that file to the client.
router.get("/download/:name", async (req, res) => {
const s3 = new aws.S3();
const dir = "uploads/" + req.params.name + ".apkg"
let file = fs.createWriteStream(dir);
await s3.getObject({
Bucket: <bucket-name>,
Key: req.params.name + ".apkg"
}).createReadStream().pipe(file);
await res.download(dir);
});
I just looked up that res.download() only serves locally. Is there a way you can do it directly from AWS S3 to Client download? i.e. pipe files straight to user. Thanks in advance
As described in this SO thread:
You can simply pipe the read stream into the response instead of piping it to a file. Just make sure to supply the correct Content-Type and to set it as an attachment, so the browser knows how to handle the response properly.
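// Sets Content-Disposition: attachment with this filename
res.attachment(req.params.name);
// pipe() does not return a promise, so there is nothing to await here
s3.getObject({
Bucket: <bucket-name>,
Key: req.params.name + ".apkg"
}).createReadStream()
.on("error", (err) => res.status(err.statusCode || 500).end()) // e.g. the key does not exist
.pipe(res);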
One more pattern for this is to create a signed URL directly to the S3 object and let the client download straight from S3, instead of streaming it through your Node web server. This reduces the load on your web server.
You will need to use the getSignedUrl method from the AWS S3 SDK for JS.
Then, once you have the URL, just return it to your client so they can download the file themselves.
You should take into account that once you give the client a signed URL that has download permissions for, say, 5 minutes, they will only be able to download that file during those 5 minutes. You should also take into account that they can pass that URL to anyone else for download during those 5 minutes, so whether this is acceptable depends on how secure you need it to be.
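A minimal sketch of the signed-URL approach, assuming the AWS SDK for JavaScript v2. The `.apkg` suffix follows the question; the function names, the bucket name in the commented route and the expiry value are illustrative assumptions. The S3 client is passed in so the sketch stays independent of how it is configured.

```javascript
// Sketch only. The s3 argument would be `new AWS.S3()` in the question's setup.
function buildDownloadParams(bucket, name, expiresSeconds) {
  const key = name + '.apkg';
  return {
    Bucket: bucket,
    Key: key,
    Expires: expiresSeconds, // lifetime of the URL in seconds
    // Ask S3 to serve the object as a download with this filename
    ResponseContentDisposition: 'attachment; filename="' + key + '"'
  };
}

function signedDownloadUrl(s3, bucket, name, expiresSeconds) {
  // Called without a callback, getSignedUrl returns the URL as a string
  return s3.getSignedUrl('getObject', buildDownloadParams(bucket, name, expiresSeconds));
}

// Possible Express route: redirect the browser to the signed URL.
// router.get('/download/:name', (req, res) => {
//   res.redirect(signedDownloadUrl(s3, 'my-bucket', req.params.name, 300));
// });
```

The synchronous form of `getSignedUrl` relies on credentials that are already loaded; if they are resolved asynchronously (for example from an instance role), the callback form is the safer choice.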
S3 can be used to serve content, so I would do the following.
Add CORS headers on your node response. This will enable the browser to download from another origin, i.e. S3.
Enable S3 web server on your bucket.
Redirect the download to S3 from a script; you could achieve this in JS.
Use signed URL as suggested in the other post if you need to protect S3 content.
I am trying to send a multipart/form-data form to an AWS-Lambda method.
I need to be able to send files to S3, and using incoming string parameters, I need to record metadata to RDS.
Now, I can do that using express and multer-s3 as follows;
var express = require('express');
var AWS = require('aws-sdk');
var multer = require('multer')
var multerS3 = require('multer-s3')
var s3 = new AWS.S3();
const app = express();
var upload = multer({
storage: multerS3({
s3: s3,
bucket: 'my-bucket-name',
metadata: function (req, file, cb) {
cb(null, Object.assign({}, req.body));
},
key: function (req, file, cb) {
cb(null, Date.now().toString() + '.fileExtension')
}
})
})
app.post('/data', upload.array('file'), function(req, res, next) {
// here using req.files, i can save metadata to RDS
})
My question is, is it possible to use multer-s3 in an AWS Lambda method? If the answer is no, or it's not recommended, could you please point me in the right direction?
Thanks..
I know it's been a while since the question was posted, but for the sake of people that might end up here in the future:
Short answer: It's not recommended. Why? There's some weird handling of the files sent as part of the Form Data, not sure if it's either by API Gateway or S3. I spent a whole day trying to upload images from a SPA Angular application using a similar approach as the one you mention, but I just couldn't make it work: I was able to access the files in the request, previously parsed by Multer, and effectively put each of those to an S3 Bucket, but images got corrupted. Not sure if it would work for other types of files, but this approach required more work and it felt a bit hacky.
The best and easiest way of uploading files to an S3 bucket from outside of your AWS account (i.e., not using any AWS Service or EC2 instance) is, IMO, using Presigned URLs.
You can check this article that might point you in the right direction.
That said, you can configure API Gateway to allow your Lambda to receive binary files. If you're using Serverless, the following is a plugin that makes things easier for that matter: https://github.com/maciejtreder/serverless-apigw-binary
You need to configure binary support for your API Gateway API (API -> Settings)
In the "Binary Media Types" add allowed mime types
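To illustrate why binary support matters, here is a small sketch that uses only Node's built-in `Buffer`. It assumes the API Gateway proxy integration event shape, where a request matching a configured binary media type arrives with a base64-encoded `body` and `isBase64Encoded` set to `true`. The function name `decodeBody` is hypothetical. The second half shows that bytes which are not valid UTF-8 do not survive a round trip through a string, which is one plausible way images end up corrupted and larger than the original.

```javascript
// Recover the raw bytes of a request body from an API Gateway proxy event.
function decodeBody(event) {
  if (event.body == null) return Buffer.alloc(0);
  return Buffer.from(event.body, event.isBase64Encoded ? 'base64' : 'utf8');
}

// First four bytes of a PNG file; 0x89 is not valid UTF-8 on its own.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

// Treating the bytes as text replaces 0x89 with U+FFFD, which encodes to 3 bytes.
const viaText = Buffer.from(original.toString('utf8'), 'utf8');

// A base64 body decodes back to exactly the original bytes.
const viaBase64 = decodeBody({
  body: original.toString('base64'),
  isBase64Encoded: true
});

console.log(original.length, viaText.length, viaBase64.length); // prints: 4 6 4
```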
In my app, I'm sending photos directly from the client to S3, using something similar to this Heroku recommendation: https://devcenter.heroku.com/articles/s3-upload-node
The main benefit is that it saves server cost (I'm assuming because the file data isn't sent through the server as multipart form data).
However, I wish to be able to share these images to twitter also, which states this requirement:
Ensure the POST is a multipart/form-data request. Either upload the raw binary (media parameter) of the file, or its base64-encoded contents (media_data parameter). Use raw binary when possible, because base64 encoding results in larger file sizes
I've tried sending the base64 needed for the client-side s3 upload back to the server, but depending on the photo size -- I often get an error that it's too big to send back.
TLDR
Do I need to send my photos to my server using multiparty / multipart form data, so I have the base64 / binary needed to share a photo to Twitter, or can I keep sending photos from my client to S3
and then somehow efficiently obtain the needed base64 / binary on the server (possibly using the request module), so I can send the image to Twitter?
One fairly easy way to do this without changing your client code a whole lot would be to use S3 events. S3 events can trigger a lambda function in AWS that can post the image to twitter. You can use any library inside the lambda function to do efficient posting to twitter. Not sure if you want to use Lambda or stick to Heroku.
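As a rough sketch of the S3-event approach: a Lambda function subscribed to object-created events receives the bucket name and object key in `event.Records`. The code below assumes the AWS SDK for JavaScript v2; `postToTwitter` is a hypothetical function you would supply, and the function names are illustrative only.

```javascript
// Pull bucket/key pairs out of an S3 event notification.
function extractObjects(event) {
  return (event.Records || []).map(function (record) {
    return {
      bucket: record.s3.bucket.name,
      // Keys arrive URL-encoded, with spaces sent as "+"
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
    };
  });
}

// Handler factory; s3 would be `new AWS.S3()`, postToTwitter is hypothetical.
function makeS3EventHandler(s3, postToTwitter) {
  return async function handler(event) {
    for (const { bucket, key } of extractObjects(event)) {
      const object = await s3.getObject({ Bucket: bucket, Key: key }).promise();
      // object.Body is a Buffer holding the raw image bytes
      await postToTwitter(object.Body);
    }
  };
}
```

This keeps the client-to-S3 upload unchanged; the server side reads the binary from S3 only when it needs to post it.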
If the client uploads directly to S3 using your AWS credentials, you are exposing your AWS secret key to the client. (Uploads that use a request signed on the server do not have this problem, because only the signature leaves the server.) Another option is to upload the images to Node and have Node in turn upload them to S3. A recommended way to upload images to a Node server is multipart/form-data with the Multer middleware.
Regardless of the upload method, you can use the following code to serve images to twitter. This code uses AWS-SDK module.
var path = require('path');
var s3 = new AWS.S3();
var filename = req.query.filename;
var params = {
Bucket: <bucketname>,
Key: <image path>
};
// Use the text after the last dot, so names like "a.b.jpg" are handled
var extension = path.extname(filename).slice(1).toLowerCase();
if (extension == "jpg" || extension == "jpeg")
{
res.setHeader('Content-Type', 'image/jpeg');
}
else if (extension == "png")
{
res.setHeader('Content-Type', 'image/png');
}
s3.getObject(params).createReadStream().pipe(res);
This method can scale with ease, like any other Express app.
I have my RESTapi server on which I store AWS public/secret keys. I also store client public/secret key (client is a user I created - it has permission to make CORS requests).
I have an external server which will upload files directly to an S3 bucket. But I don't want to store AWS credentials on it. Before uploading, I want it to call the main server to sign the request, and then upload the file directly to S3.
For now I am using aws-sdk on external server like this:
var aws = require('aws-sdk');
aws.config.update({
"accessKeyId": process.env.AMAZONS3_ACCESSKEY_ID,
"secretAccessKey": process.env.AMAZONS3_ACCESSKEY,
"region": process.env.AMAZONS3_REGION_CLOUD,
});
var s3 = new aws.S3({ params: { Bucket: 'myCustomBucket' } });
s3.putObject(...);
Now I need to change this so that the external server calls the main server with some S3 params, gets back a signed key or something like that, and uses it to upload the file...
So what should the endpoint on the main server look like (what params should it consume, and how does it generate the signature)?
And then how can I make the request from the external server using the signature?
Have a look here http://docs.aws.amazon.com/aws-sdk-php/guide/latest/service-s3.html under the section create presigned url
// Get a command object from the client and pass in any options
// available in the GetObject command (e.g. ResponseContentDisposition)
$command = $client->getCommand('GetObject', array(
'Bucket' => $bucket,
'Key' => 'data.txt',
'ResponseContentDisposition' => 'attachment; filename="data.txt"'
));
// Create a signed URL from the command object that will last for
// 10 minutes from the current time
$signedUrl = $command->createPresignedUrl('+10 minutes');
echo file_get_contents($signedUrl);
// > Hello!
Describe the command (in your case a put, not a get) on the external server and pass it to the main server, which creates the presigned URL. The main server passes the URL back to the external server, which executes it.
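Since the question uses Node rather than PHP, the same idea could be sketched in JavaScript as follows. This assumes the AWS SDK for JavaScript v2 on the main server and only Node's built-in `https` module on the external server. The function names and the 10-minute expiry are illustrative assumptions.

```javascript
const https = require('https');

// Main server: sign a PUT for one object. s3 would be `new aws.S3()`,
// configured with the credentials that never leave this server.
function signUpload(s3, bucket, key, contentType) {
  return s3.getSignedUrl('putObject', {
    Bucket: bucket,
    Key: key,
    ContentType: contentType, // the uploader must send the same Content-Type
    Expires: 600              // lifetime of the URL in seconds
  });
}

// External server: upload a Buffer to the signed URL. No AWS credentials needed.
function uploadToSignedUrl(signedUrl, body, contentType) {
  return new Promise(function (resolve, reject) {
    const req = https.request(signedUrl, {
      method: 'PUT',
      headers: { 'Content-Type': contentType, 'Content-Length': body.length }
    }, function (res) {
      res.resume(); // drain the response body
      res.on('end', function () {
        if (res.statusCode === 200) resolve();
        else reject(new Error('Upload failed with status ' + res.statusCode));
      });
    });
    req.on('error', reject);
    req.end(body);
  });
}
```

The main server's endpoint would accept the key and content type, call `signUpload`, and return the URL; the external server then calls `uploadToSignedUrl` with that URL and the file contents.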
I'm quite new to node.js and would like to do the following:
user can upload one file
upload should be saved to amazon s3
file information should be saved to a database
script shouldn't be limited to specific file size
As I've never used S3 or done uploads before I might have some
wrong ideas - please correct me, if I'm wrong.
So in my opinion the original file name should be saved into the db and returned for download but the file on S3 should be renamed to my database entry id to prevent overwriting files. Next, should the files be streamed or something? I've never done this but it just seems not to be smart to cache files on the server to then push them to S3, does it?
Thanks for your help!
First, I recommend looking at the knox module for Node.js. It is from quite a reliable source. https://github.com/LearnBoost/knox
I wrote the code below for Express, but if you do not use it or use another framework, you should still be able to follow the basics. Look at the CAPS_CAPTIONS in the code; you will want to change them according to your needs and configuration. Please also read the comments to understand each piece of code.
app.post('/YOUR_REQUEST_PATH', function(req, res, next){
var fs = require("fs")
var knox = require("knox")
var s3 = knox.createClient({
key: 'YOUR PUBLIC KEY HERE' // take it from AWS S3 configuration
, secret: 'YOUR SECRET KEY HERE' // take it from AWS S3 configuration
, bucket: 'YOUR BUCKET' // create a bucket on AWS S3 and put the name here. Configure it to your needs beforehand. Allow to upload (in AWS management console) and possibly view/download. This can be made via bucket policies.
})
fs.readFile(req.files.NAME_OF_FILE_FIELD.path, function(err, buf){ // read the temp file that the form upload was saved to
if (err) return next(err) // pass read errors on to Express
var s3req = s3.put("/ABSOLUTE/FOLDER/ON/BUCKET/FILE_NAME.EXTENSION", { // configure putting a file. Write an algorithm to name your file
'Content-Length': buf.length
, 'Content-Type': 'FILE_MIME_TYPE'
})
s3req.on('response', function(s3res){ // write code for response
if (200 == s3res.statusCode) {
// play with database here, use s3req and s3res variables here
} else {
// handle errors here
}
})
s3req.end(buf) // execute uploading
})
})
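Note that the example above reads the whole file into memory with `fs.readFile`, which sits uneasily with the requirement that uploads not be limited in size. As an alternative sketch, not the answer's own method, the AWS SDK for JavaScript v2 offers `s3.upload`, which accepts a readable stream as `Body` and uploads large payloads in parts. The function names and the `recordId` parameter (the database entry id used as the object key, as the question proposes) are hypothetical.

```javascript
const fs = require('fs');

// Build the upload parameters. The object key is the database id, so two
// uploads with the same original file name cannot overwrite each other.
function buildUploadParams(bucket, recordId, contentType, stream) {
  return {
    Bucket: bucket,
    Key: String(recordId),
    ContentType: contentType,
    Body: stream // readable stream; not buffered in memory as a whole
  };
}

// Stream a file from disk to S3. s3 would be `new AWS.S3()`.
function uploadFile(s3, bucket, recordId, filePath, contentType) {
  const stream = fs.createReadStream(filePath);
  return s3.upload(buildUploadParams(bucket, recordId, contentType, stream)).promise();
}
```

The original file name would be stored in the database row identified by `recordId` and returned to the user at download time.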