Streaming upload to S3 using a presigned URL - node.js

I'm trying to upload a file into a customer's S3 bucket. I'm given a presigned URL that allows me to do a PUT request. I have no access to their access key or secret key, so using the AWS SDK is out of the question.
The use case is that I am consuming a gRPC server streaming call and transforming it into a CSV with some field changes. As the responses come in, I want to stream the transformed gRPC output into S3. I need to do it via streaming because the response can get rather large, upwards of 100 MB, so loading everything into memory before uploading it to S3 is not ideal. Any ideas?

This is an open issue with pre-signed S3 upload URLs:
https://github.com/aws/aws-sdk-js/issues/1603
Currently, the only working solution for large uploads through S3 pre-signed URLs is multipart upload. The biggest drawback of that approach is that the server that signs the upload needs to know the size of your file, because it has to pre-sign each part individually. e.g. a 100 MB upload split into 20 parts of 5 MB each (the minimum part size for all but the last part) requires 20 individually pre-signed URLs.
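A minimal sketch of what the signing side would need to do, assuming it runs the v2 aws-sdk for Node.js and holds the credentials (the bucket, key and part count are whatever your signing service passes in):

    // Runs on the party that owns the credentials, not on the uploading client.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function presignMultipartUpload(bucket, key, partCount) {
      // 1. Start the multipart upload to obtain an UploadId.
      const { UploadId } = await s3
        .createMultipartUpload({ Bucket: bucket, Key: key })
        .promise();

      // 2. Pre-sign one PUT URL per part (every part except the last must be >= 5 MB).
      const urls = [];
      for (let partNumber = 1; partNumber <= partCount; partNumber++) {
        urls.push(s3.getSignedUrl('uploadPart', {
          Bucket: bucket,
          Key: key,
          UploadId,
          PartNumber: partNumber,
          Expires: 3600,
        }));
      }
      return { UploadId, urls };
    }

    // 3. After the client has PUT every part and collected the ETags,
    //    an authorized endpoint completes the upload.
    async function completeUpload(bucket, key, uploadId, parts /* [{ ETag, PartNumber }] */) {
      return s3.completeMultipartUpload({
        Bucket: bucket,
        Key: key,
        UploadId: uploadId,
        MultipartUpload: { Parts: parts },
      }).promise();
    }

The uploading client only ever sees the pre-signed part URLs, so it never needs the access/secret keys.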

Related

How to upload >5GB file using multipart API of S3 right from the browser?

I have tried the PUT request via XMLHttpRequest. There is a browser-side limitation that doesn't allow me to upload files larger than 2 GB. Then I tried the POST request from an HTML form, which doesn't require JavaScript-side preprocessing, but it has a 5 GB upload size limit in a single operation.
AWS recommends multipart upload for larger upload scenarios. That requires files to be chunked and uploaded in pieces. How do I do it right from the browser when the file size is greater than 10 GB?
You can use chunks in combination with signed URLs. See this link for more details: https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload.
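As a rough browser-side sketch of that approach (the pre-signed part URLs would come from a backend like the one in the repo above, and the bucket's CORS configuration must expose the ETag header for this to work):

    // Slice the File into parts and PUT each one to its pre-signed URL.
    async function uploadInChunks(file, partUrls, partSize = 5 * 1024 * 1024) {
      const parts = [];
      for (let i = 0; i < partUrls.length; i++) {
        const blob = file.slice(i * partSize, (i + 1) * partSize);
        const res = await fetch(partUrls[i], { method: 'PUT', body: blob });
        // S3 returns each part's ETag in a response header; keep it so the
        // backend can complete the multipart upload afterwards.
        parts.push({ PartNumber: i + 1, ETag: res.headers.get('ETag') });
      }
      return parts;
    }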

AWS Lambda Function - Image Upload - Process Review

I'm trying to better understand how the overall flow should work with AWS Lambda and my Web App.
I would like to have the client upload a file to a public bucket (completely bypassing my API resources), with the client UI putting it into a folder for their account based on a GUID. From there, I've got a Lambda function that runs when it detects a change to the public bucket, resizes the file and places it into the processed bucket.
However, I need to update a row in my RDS Database.
Issue
I'm struggling to understand the best practice for identifying the row to update. Should I be uploading another file with the necessary details (where every image upload really consists of two files - an image and a JSON config)? Or should the image be processed, and then the client receives some data and makes an API request to update the row in the database? What is the right flow for this step?
Thanks.
You should use a pre-signed URL for the upload. This allows your application to put restrictions on the upload, such as file type, directory and size. It means that, when the file is uploaded, you already know who did the upload. It also prevents people from uploading randomly to the bucket, since it does not need to be public.
The upload can then use an Amazon S3 Event to trigger the Lambda function. The filename/location can be used to identify the user, so the database can be updated at the time that the file is processed.
See: Uploading Objects Using Presigned URLs - Amazon Simple Storage Service
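A minimal sketch of that event-driven step, assuming the key layout is uploads/<userGuid>/<filename> and updateUserPhoto() stands in for your own RDS update code:

    // S3-triggered Lambda: the object key itself tells you which user uploaded the file.
    exports.handler = async (event) => {
      for (const record of event.Records) {
        const bucket = record.s3.bucket.name;
        // Keys in S3 event records are URL-encoded.
        const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
        const [, userGuid] = key.split('/');

        // ...resize/process the image into the processed bucket here...

        // updateUserPhoto() is a placeholder for your own RDS update.
        await updateUserPhoto(userGuid, `s3://${bucket}/${key}`);
      }
    };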
I'd avoid uploading a file directly to S3, bypassing the API. Uploading the file through your API lets you control the file type, size, etc., and you will know exactly who is uploading the file (API auth ID or user ID in the API body). Opening a bucket to public writes is also a security risk.
Your API clients can then upload the file via the API, which can store the file on S3 (triggering another Lambda for processing) and then update your RDS with the appropriate metadata for that user.

Is it necessary to convert an image from gallery to base64 before uploading to amazon s3 server?

A requirement for my mobile application is that I need to allow a user to select an image from their camera roll and upload it to an S3 server.
In React Native, I send an HTTP request to an endpoint I created, in which the file is uploaded to Amazon S3.
When sending the HTTP request, I need to set a parameter to a file (the one the user selects). Do I need to convert that file into base64 format before sending the HTTP request, or can I just send the file as it is without any modifications?
You can store your file as is in S3 (binary).
Whether you choose to upload it as base64 or binary depends only on your application requirements. S3 will accept both.
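A minimal sketch of sending the picked image as raw binary to a pre-signed PUT URL, with no base64 step (uploadUrl and the shape of the picker result are assumptions here):

    async function uploadImage(uploadUrl, pickedImage) {
      // pickedImage comes from your image picker (uri/type shape assumed here).
      // Read the local file and send its raw bytes.
      const fileResponse = await fetch(pickedImage.uri);
      const blob = await fileResponse.blob();
      await fetch(uploadUrl, {
        method: 'PUT',
        headers: { 'Content-Type': pickedImage.type || 'image/jpeg' },
        body: blob,
      });
    }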

Rest API - Uploading large files to S3 using Pre-signed URLs and updating backend on success

Context
I am building Stateless REST APIs for a browser-based platform that needs to store some user-generated files. These files could potentially be in the GBs.
I am using AWS S3 for storage. I have used the AWS SDK in the past to route file uploads through the Node.js server (basically: upload to the server, then the server uploads to S3).
I am trying to figure out how to improve this using pre-signed URLs. I understand the dynamics and the flow of getting the presigned URLs and uploading the file to S3 directly.
I cannot use SQS or a Lambda triggered by an object-created event.
The architecture needs to be AWS independent.
Question
The simplest of flows I need to achieve is pretty common -
User --> Opens Profile
Clicks Upload Photo
Client Sends Request to /getSignedUrl
Server Returns the signedURL for the file name/type
The client executes the PUT/POST request to upload the file to the signedUrl
Upload Successful
After this - my understanding is -
Client tells the server - File Uploaded Successfully
Server associates the S3 Url for the Photo to the User.
...and that's my problem. How do I associate the successfully uploaded file back to the user on the server in a secure way?
Not sure what I've been missing. It seems like a trivial use case but I haven't been able to find anything regarding it.
1/ For an avatar, I think you should set it as public-read.
When creating the signed upload URL in
GET: /signed-upload-url
you need to set the image as public-read. After that you are free to interact with the image through its direct URL. Since this is an avatar, you can also compress it and reduce the image size with an AWS Lambda function.
2/ If you don't want the object to be public-read, you need to go through the server and get a signed download URL to interact with the image:
GET: /signed-download-url
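A small sketch of both endpoints with the v2 aws-sdk (the bucket name and the avatars/<userId>.jpg key are placeholders; deriving the key from the authenticated user is also what ties the upload back to the right row):

    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    // 1/ GET /signed-upload-url: pre-sign a PUT with a public-read ACL so the
    //    avatar can later be fetched through its plain S3 URL. The client must
    //    send the same x-amz-acl: public-read header when it uploads.
    function signedUploadUrl(userId) {
      return s3.getSignedUrl('putObject', {
        Bucket: 'my-avatar-bucket',   // placeholder bucket name
        Key: `avatars/${userId}.jpg`,
        ACL: 'public-read',
        Expires: 300,
      });
    }

    // 2/ GET /signed-download-url: keep the object private and hand out a
    //    short-lived read URL instead.
    function signedDownloadUrl(userId) {
      return s3.getSignedUrl('getObject', {
        Bucket: 'my-avatar-bucket',
        Key: `avatars/${userId}.jpg`,
        Expires: 300,
      });
    }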

Transferring a very large image from the web to S3 using AWS Lambda

I have access to a 20 GB image file from the web that we'd like to save on S3.
Is it possible to do this with AWS Lambda? From how I understand the situation, the limitations seem to be the following:
The Lambda memory limit (we can't load the whole image into memory).
Now if we decide to stream from the web to S3 (say using requests.get(image_url, stream=True) or smart_open), there's also...
the Lambda timeout limit, along with...
S3 not supporting appends to existing objects. Thus, subsequent Lambda runs that continue "assembling" the image on S3 (where preceding ones left off) would have to load the partial image that's already on S3 into memory before they can append more data and upload the resulting larger partial image back to S3.
I've also heard others suggest using multipart uploads, but I'd be happy to know how that's different from streaming and how it overcomes the limitations listed above.
Thank you!
Things are much simpler with S3.
Create a Lambda to generate pre-signed URLs for a multipart upload.
Create Multipart Upload:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#createMultipartUpload-property
Create Signed URL with the above Multipart Upload Key:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getSignedUrl-property
Use those URLs to upload the parts of your file in parallel.
You can also use S3 Transfer Acceleration for high-speed uploads.
Hope it helps.
EDIT1:
You can split the file into between 1 and 10,000 parts and upload them in parallel.
http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
If you are only doing a one-off file upload, you can generate the signed URLs and the multipart upload from the CLI rather than a Lambda.
If you are doing this regularly, generate them via a Lambda.
When you read the file to upload, if you are reading it via HTTP, read it in chunks and upload each chunk as a part.
If you are reading the file locally, you can track the starting offset of each chunk and upload the chunks as multipart parts.
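The original question is framed in Python, but to match the SDK links above, here is a rough Node.js sketch of reading the source file over HTTP in chunks and uploading each chunk as a multipart part, so nothing close to 20 GB is ever held in memory (assumes Node 18+ for the global fetch; bucket, key and part size are placeholders):

    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function streamUrlToS3(sourceUrl, bucket, key, partSize = 8 * 1024 * 1024) {
      const { UploadId } = await s3
        .createMultipartUpload({ Bucket: bucket, Key: key })
        .promise();

      const res = await fetch(sourceUrl);
      const parts = [];
      let buffer = Buffer.alloc(0);
      let partNumber = 1;

      // Upload one accumulated chunk as a multipart part and remember its ETag.
      const flush = async (chunk) => {
        const { ETag } = await s3.uploadPart({
          Bucket: bucket,
          Key: key,
          UploadId,
          PartNumber: partNumber,
          Body: chunk,
        }).promise();
        parts.push({ ETag, PartNumber: partNumber });
        partNumber++;
      };

      // Accumulate ~partSize bytes at a time from the HTTP response stream.
      for await (const chunk of res.body) {
        buffer = Buffer.concat([buffer, Buffer.from(chunk)]);
        if (buffer.length >= partSize) {
          await flush(buffer);
          buffer = Buffer.alloc(0);
        }
      }
      if (buffer.length > 0) await flush(buffer); // the last part may be < 5 MB

      return s3.completeMultipartUpload({
        Bucket: bucket,
        Key: key,
        UploadId,
        MultipartUpload: { Parts: parts },
      }).promise();
    }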
Hope it helps.
