Trigger S3 Object without upload - node.js

I have a feeling the answer to my question will be the correct Google term that I am missing, but here we go.
I need to trigger all objects in an S3 bucket without uploading. The reason is that I have a Lambda that gets triggered on PutObject, and I want to reprocess all those files again. These are huge images, and re-uploading does not sound like a good idea.
I am trying to do this in Node.js, but any language that anyone is comfortable with will help and I will translate.
Thanks

An Amazon S3 Event can trigger an AWS Lambda function when an object is created/deleted/replicated.
However, it is not possible to "trigger the object" -- the object would need to be created/deleted/replicated to cause the Amazon S3 Event to be generated.
As an alternative, you could create a small program that lists the objects in the bucket, and then directly invokes the AWS Lambda function, passing the object details in the event message to make it look like it came from Amazon S3. There is a sample S3 Event in the Lambda 'test' function -- you could copy this template and have your program insert the appropriate bucket and object key. Your Lambda function would then process it exactly as if an S3 Event had triggered the function.
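For example, here is a minimal sketch of that approach in Python with boto3 (it translates straightforwardly to Node.js); the bucket and function names are placeholders, and the event contains only the fields a typical handler reads, so extend it to match whatever your function expects:

```python
import json
import boto3

BUCKET = "my-bucket"                  # placeholder
FUNCTION_NAME = "my-image-processor"  # placeholder

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

# List every object in the bucket, page by page.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        # Minimal S3 PutObject event shape, modeled on the sample
        # event in the Lambda console's test templates.
        event = {
            "Records": [{
                "eventVersion": "2.1",
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": BUCKET, "arn": f"arn:aws:s3:::{BUCKET}"},
                    "object": {"key": obj["Key"], "size": obj["Size"]},
                },
            }]
        }
        # Asynchronous invocation, the same type S3 itself uses.
        lambda_client.invoke(
            FunctionName=FUNCTION_NAME,
            InvocationType="Event",
            Payload=json.dumps(event),
        )
```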

In addition to what is explained above, you can use AWS S3 Batch Operations.
We used this to encrypt existing objects in the S3 bucket that were not encrypted earlier.
It was the easiest out-of-the-box solution, available in the S3 console itself.

You could also loop through all objects in the bucket and add a tag. Next, adjust your trigger event to also fire on tag changes (s3:ObjectTagging:Put). A rough sketch follows.
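Here's a rough sketch of the idea in Python with boto3 (the bucket name and tag are placeholders); note that put_object_tagging replaces the object's existing tag set, so merge with get_object_tagging first if you need to preserve existing tags:

```python
import boto3

BUCKET = "my-bucket"  # placeholder

s3 = boto3.client("s3")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        # Writing a tag emits an s3:ObjectTagging:Put event, which the
        # bucket's notification configuration can route to the Lambda.
        s3.put_object_tagging(
            Bucket=BUCKET,
            Key=obj["Key"],
            Tagging={"TagSet": [{"Key": "reprocess", "Value": "true"}]},
        )
```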

Issues using Lambda and BOTO3 to copy files between buckets [duplicate]

I am currently exploring storing the attachments of an email separately from the .eml file itself. I have an SES rule set that delivers an inbound email to a bucket. When the bucket receives the email, an S3 Put Lambda function parses the raw email (MIME format), base64-decodes the attachment buffers, and does a putObject for each attachment and the original .eml file to a new bucket.
My problem is that this Lambda function does not trigger for emails with attachments exceeding ~3-4 MB. The email is received and stored in the initial bucket, but the function does not trigger when it arrives, and the event does not appear in CloudWatch. However, the function works perfectly when manually testing it with a hardcoded S3 Put payload, and also when manually uploading an .eml file to the assigned bucket.
Do you have any idea why there is this limitation? Perhaps this is a permission issue with the bucket, or maybe an issue with the assigned Lambda role? From manual testing, I've found this is by no means a timeout or max-memory issue.
The larger files are almost certainly being uploaded via S3 Multipart Upload instead of a regular Put operation. You need to configure your Lambda subscription to also be notified of Multipart uploads. It sounds like the function is only subscribed to s3:ObjectCreated:Put events currently, and you need to add s3:ObjectCreated:CompleteMultipartUpload to the configuration.
I faced the same issue. If the ETag of the file you uploaded to S3 ends with a hyphen followed by a number, it denotes that the file was uploaded using multipart upload. Subscribing to the CompleteMultipartUpload event resolved the issue.
I was getting the same issue. Despite having s3:ObjectCreated:CompleteMultipartUpload as an event notification, the trigger failed.
I later realized that the issue was with the Lambda's timeout period. This could also be a potential cause.
As per the AWS docs, to listen to all object-created events you can subscribe to s3:ObjectCreated:*.
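For reference, a sketch of wiring that up with boto3 (the bucket name and function ARN are placeholders); note that this call replaces the bucket's entire notification configuration, and the function's resource policy must already allow S3 to invoke it:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-email-bucket",  # placeholder
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            # Placeholder ARN -- use your own function's ARN.
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:parse-email",
            # s3:ObjectCreated:* covers Put, Post, Copy and
            # CompleteMultipartUpload in one subscription.
            "Events": ["s3:ObjectCreated:*"],
        }]
    },
)
```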

Need to upload directory content to S3 bucket

My scenario is I am currently using AWS CLI to upload my directory content to S3 bucket using following AWS CLI command:
aws s3 sync results/foo s3://bucket/
Now I need to replace this with Python code. I am exploring the boto3 documentation to find the right way to do it. I see some options, such as:
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/s3.html#S3.Client.upload_file
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/s3.html#S3.ServiceResource.Object
Could someone suggest which is the right approach?
I am aware that I would have to get the credentials by calling boto3.client('sts').assume_role(role, session) and use them subsequently.
The AWS CLI is actually written in Python and uses the same API calls you can use.
The important thing to realize is that Amazon S3 only has an API call to upload/download one object at a time.
Therefore, your Python code would need to:
Obtain a list of files to copy
Loop through each file and upload it to Amazon S3
Of course, if you want sync functionality (which only copies new/modified files), then your program will need more intelligence to figure out which files to copy.
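A minimal sketch of the non-sync version with boto3 (the bucket name and local path are placeholders):

```python
import boto3
from pathlib import Path

BUCKET = "bucket"             # placeholder
SOURCE = Path("results/foo")  # local directory to upload

s3 = boto3.client("s3")

# Walk the directory tree and upload each file individually,
# mirroring the local layout under the bucket root.
for path in SOURCE.rglob("*"):
    if path.is_file():
        key = path.relative_to(SOURCE).as_posix()
        s3.upload_file(str(path), BUCKET, key)
```

To approximate sync behaviour, you could first list the bucket with list_objects_v2 and skip files whose size (or your own modification-time metadata) is unchanged.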
Boto3 has two general types of methods:
client methods that map 1:1 with API calls, and
resource methods that are more Pythonic but might make multiple API calls in the background
Which type you use is your own choice. Personally, I find the client methods easier for uploading/downloading objects, while the resource methods are good when you have to loop through resources (e.g., "for each EC2 instance, for each EBS volume, check each tag").
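A quick illustration of the two styles (the bucket name is a placeholder):

```python
import boto3

# Client style: each method maps closely to one API call.
s3_client = boto3.client("s3")
response = s3_client.list_objects_v2(Bucket="bucket")  # one ListObjectsV2 call
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Resource style: more Pythonic; the iteration below transparently
# pages through as many API calls as it needs.
s3_resource = boto3.resource("s3")
for obj in s3_resource.Bucket("bucket").objects.all():
    print(obj.key, obj.size)
```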

AWS Lambda Function - Image Upload - Process Review

I'm trying to better understand how the overall flow should work with AWS Lambda and my Web App.
I would like to have the client upload a file to a public bucket (completely bypassing my API resources), with the client UI putting it into a folder for their account based on a GUID. From there, I've got a Lambda that runs when it detects a change to the public bucket, resizing the file and placing it into the processed bucket.
However, I need to update a row in my RDS Database.
Issue
I'm struggling to understand the best practice for identifying the row to update. Should I be uploading another file with the necessary details (so that every image upload really consists of two files - an image and a JSON config)? Or should the image be processed, after which the client receives some data and makes an API request to update the row in the database? What is the right flow for this step?
Thanks.
You should use a pre-signed URL for the upload. This allows your application to put restrictions on the upload, such as file type, directory and size. It means that, when the file is uploaded, you already know who did the upload. It also prevents people from uploading randomly to the bucket, since it does not need to be public.
The upload can then use an Amazon S3 Event to trigger the Lambda function. The filename/location can be used to identify the user, so the database can be updated at the time that the file is processed.
See: Uploading Objects Using Presigned URLs - Amazon Simple Storage Service
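As a sketch, generating such a URL server-side with boto3 (the bucket, key, and expiry are placeholders; the AWS SDK for JavaScript has an equivalent getSignedUrl call):

```python
import boto3

s3 = boto3.client("s3")

# The key embeds the account GUID, so the processing Lambda can
# identify the user from the object's location alone.
url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={
        "Bucket": "uploads-bucket",  # placeholder
        "Key": "accounts/123e4567-e89b-12d3-a456-426614174000/photo.jpg",
        "ContentType": "image/jpeg",
    },
    ExpiresIn=300,  # URL valid for 5 minutes
)
# Hand `url` to the browser, which PUTs the file directly to S3.
```

If you also want to enforce a maximum file size, generate_presigned_post supports a content-length-range condition.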
I'd avoid uploading a file directly to S3, bypassing the API. Uploading the file through your API lets you control the file type, size, etc., and you will know exactly who is uploading it (the API auth ID or the user ID in the request body). Opening a bucket to the public for writes is also a security risk.
Your API clients can then upload the file via the API, which can store the file on S3 (triggering another Lambda for processing) and then update your RDS with the appropriate metadata for that user.

aws s3 putObject vs sync

I need to upload a large file to an AWS S3 bucket. Every 10 minutes my code deletes the old file from the source directory and generates a new one. The file size is around 500 MB. Currently I use the s3.putObject() method to upload each file after it is created. I have also heard about aws s3 sync; it comes with the AWS CLI and is used for uploading files to an S3 bucket.
I use the aws-sdk for Node.js for the S3 upload, and it does not contain an s3 sync method. Is s3 sync better than the s3.putObject() method? I need faster uploads.
There's always more than one way to do one thing, so to upload a file into an S3 bucket you can:
use the AWS CLI and run aws s3 cp ...
use the AWS CLI and run aws s3api put-object ...
use an AWS SDK (in your language of choice)
You can also use the sync method, but for a single file there's no need to sync a whole directory; generally, when looking for better performance, it's better to start multiple cp instances to benefit from multiple threads, versus sync's single thread.
Basically, all these methods are wrappers for the AWS S3 API calls. From the Amazon docs:
Making REST API calls directly from your code can be cumbersome. It requires you to write the necessary code to calculate a valid signature to authenticate your requests. We recommend the following alternatives instead:
Use the AWS SDKs to send your requests (see Sample Code and Libraries). With this option, you don't need to write code to calculate a signature for request authentication because the SDK clients authenticate your requests by using access keys that you provide. Unless you have a good reason not to, you should always use the AWS SDKs.
Use the AWS CLI to make Amazon S3 API calls. For information about setting up the AWS CLI and example Amazon S3 commands see the following topics:
Set Up the AWS CLI in the Amazon Simple Storage Service Developer Guide.
Using Amazon S3 with the AWS Command Line Interface in the AWS Command Line Interface User Guide.
So Amazon recommends using the SDK. At the end of the day, I think it's really a matter of what you're most comfortable with and how you will integrate this piece of code into the rest of your program. For a one-time action, I always go with the CLI.
In terms of performance, though, using one or the other will not make a difference, as they're all just wrappers around the AWS API calls. For transfer optimization, you should look at Amazon S3 Transfer Acceleration and see if you can enable it.
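If the upload itself is what's slow, the managed transfer helpers in the SDKs split large files into parts and upload them in parallel. A sketch with boto3 (the thresholds and names are illustrative):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split files over 64 MB into 64 MB parts and upload up to
# 10 parts concurrently; values here are illustrative only.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=10,
)

# upload_file switches to multipart upload above the threshold.
s3.upload_file("output.dat", "my-bucket", "output.dat", Config=config)
```

In the Node.js SDK, the equivalent is the managed uploader (s3.upload() in v2, or the Upload class from @aws-sdk/lib-storage in v3), which likewise performs concurrent multipart uploads.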

Is there a good way to upload a full size image and then display smaller versions on the fly?

I'm trying to find a good way to distribute images on my web app.
Ideally, the user would upload a "big" full-size image to S3, and when it is displayed on the website in different contexts, smaller versions of the image would be shown.
Of course, they need to be cached/stored somewhere, otherwise the server would quickly be exhausted...
Is there a good strategy to implement this in Node/Express?
Thanks for your input!
To automatically resize a picture when it is uploaded to Amazon S3, consider using AWS Lambda.
A Lambda function can be associated with an S3 bucket. When an object is uploaded/updated/deleted, the Lambda function can be triggered. The function can then do pretty-much anything, such as resizing a picture.
In fact, there's an example that does that in the documentation: AWS Lambda Walkthrough 2: Handling Amazon S3 Events
AWS Lambda's older Node.js runtimes shipped with ImageMagick installed (newer runtimes require you to bundle it yourself), which can make the resizing very simple. Lambda functions can be written in Node.js, Java, Python, and several other languages.
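As an illustration only (not the walkthrough's code), here is a minimal resize handler in Python using Pillow instead of ImageMagick; Pillow would have to be packaged with the function, the destination bucket and sizes are placeholders, and writing to a separate bucket avoids re-triggering the function on its own output:

```python
import io

import boto3
from PIL import Image  # Pillow must be bundled with the deployment package

s3 = boto3.client("s3")
DEST_BUCKET = "my-thumbnails"  # placeholder; must differ from the source bucket
MAX_SIZE = (300, 300)          # placeholder bounding box for thumbnails

def handler(event, context):
    # The S3 event carries the bucket and key of the uploaded image.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    obj = s3.get_object(Bucket=bucket, Key=key)
    image = Image.open(io.BytesIO(obj["Body"].read()))
    image.thumbnail(MAX_SIZE)  # shrink in place, preserving aspect ratio

    buffer = io.BytesIO()
    image.save(buffer, format=image.format or "JPEG")
    s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=buffer.getvalue())
```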
