I need to upload a large file to an AWS S3 bucket. Every 10 minutes my code deletes the old file from the source directory and generates a new one. The file size is around 500 MB. Currently I use the s3.putObject() method to upload each file after it is created. I have also heard about aws s3 sync, which comes with the AWS CLI and is used for uploading files to an S3 bucket.
I use the aws-sdk for Node.js for the S3 upload, and it does not contain an s3-sync method. Is s3 sync better than the s3.putObject() method? I need a faster upload.
There's always more than one way to do one thing, so to upload a file into an S3 bucket you can:
use the AWS CLI and run aws s3 cp ...
use the AWS CLI and run aws s3api put-object ...
use an AWS SDK (in your language of choice)
You can also use the sync method, but for a single file there's no need to sync a whole directory, and generally, when looking for better performance, it's better to start multiple cp instances in parallel rather than relying on a single sequential sync.
Basically, all of these methods are wrappers around the Amazon S3 API calls. From the Amazon documentation:
Making REST API calls directly from your code can be cumbersome. It requires you to write the necessary code to calculate a valid signature to authenticate your requests. We recommend the following alternatives instead:
Use the AWS SDKs to send your requests (see Sample Code and Libraries). With this option, you don't need to write code to calculate a signature for request authentication because the SDK clients authenticate your requests by using access keys that you provide. Unless you have a good reason not to, you should always use the AWS SDKs.
Use the AWS CLI to make Amazon S3 API calls. For information about setting up the AWS CLI and example Amazon S3 commands see the following topics:
Set Up the AWS CLI in the Amazon Simple Storage Service Developer Guide.
Using Amazon S3 with the AWS Command Line Interface in the AWS Command Line Interface User Guide.
So Amazon recommends using the SDK. At the end of the day, I think it's really a matter of what you're most comfortable with and how you will integrate this piece of code into the rest of your program. For a one-time action, I always go with the CLI.
In terms of performance, though, using one or the other will not make a difference, as again they're just wrappers around the AWS API calls. For transfer optimization, you should look at Amazon S3 Transfer Acceleration and see if you can enable it.
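If you do enable it, a minimal boto3 sketch might look like the following (the bucket name and file path are placeholders; the Node.js SDK exposes the equivalent putBucketAccelerateConfiguration call):

import boto3
from botocore.config import Config

# One-time setup: turn on Transfer Acceleration for the bucket.
s3 = boto3.client('s3')
s3.put_bucket_accelerate_configuration(
    Bucket='my-bucket',                                   # placeholder bucket name
    AccelerateConfiguration={'Status': 'Enabled'},
)

# Upload through the accelerated endpoint; upload_file also switches to
# multipart transfers automatically for large objects such as a 500 MB file.
accelerated = boto3.client('s3', config=Config(s3={'use_accelerate_endpoint': True}))
accelerated.upload_file('/path/to/file.dat', 'my-bucket', 'file.dat')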
What I need to accomplish
I need to fetch images from a REST API (as base64 strings) and write the individual images into Firebase Cloud Storage. After an image is successfully written, I write a log entry into the Firebase Realtime Database.
What's the problem
Well, I initialized the Firebase app (Cloud Functions) with the Admin SDK, because I need some admin features (for instance, bypassing Realtime Database rules). According to the Firebase documentation, if I use the Admin SDK, then for manipulating Cloud Storage I must use "@google-cloud/storage".
So I looked up the documentation for "@google-cloud/storage" and found out that for uploading files I have to call the ".upload" method and pass a path to the file as an argument.
The problem is that I don't have a path to that file, because I have it as a base64 string. I can't generate a path for it, because Node.js doesn't have the URL.createObjectURL method and polyfilling it is impossible. Also, writing the image to the filesystem is not a solution, because Cloud Functions is a read-only environment.
Things I tried
Polyfilling URL.createObjectURL, but it didn't work.
Using the Node.js fs module to write the images from the REST API to the filesystem, upload them to Cloud Storage, and then remove them.
Is there any solution to this problem, or a recommended way to implement this kind of functionality?
Solution
My second approach (writing a temporary file using fs) was almost right, but I forgot to set the file destination to /tmp. In Firebase Cloud Functions, everything except /tmp is read-only storage.
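For illustration, here is the shape of that fix as a minimal sketch (written in Python with the google-cloud-storage client to stay consistent with the other examples in this thread; the Node.js @google-cloud/storage bucket.upload() call follows the same pattern, and the bucket name is a placeholder):

import base64
import os
from google.cloud import storage

def save_base64_image(image_b64: str, object_name: str, bucket_name: str) -> None:
    # /tmp is the only writable location inside a Cloud Function.
    tmp_path = os.path.join('/tmp', object_name)
    with open(tmp_path, 'wb') as f:
        f.write(base64.b64decode(image_b64))

    # Upload the temporary file to Cloud Storage, then clean it up.
    bucket = storage.Client().bucket(bucket_name)
    bucket.blob(object_name).upload_from_filename(tmp_path)
    os.remove(tmp_path)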
I have a feeling the answer to my question will be the correct Google search term that I am missing, but here we go.
I need to trigger the event for all objects in an S3 bucket without uploading anything. The reason is that I have a Lambda that gets triggered on PutObject, and I want to reprocess all those files again. They are huge images, and re-uploading them does not sound like a good idea.
I am trying to do this in Node.js, but any language that anyone is comfortable with will help, and I will translate.
Thanks
An Amazon S3 event can trigger an AWS Lambda function when an object is created/deleted/replicated.
However, it is not possible to "trigger the object" -- the object would need to be created/deleted/replicated to cause the Amazon S3 event to be generated.
As an alternative, you could create a small program that lists the objects in the bucket, and then directly invokes the AWS Lambda function, passing the object details in the event message to make it look like it came from Amazon S3. There is a sample S3 Event in the Lambda 'test' function -- you could copy this template and have your program insert the appropriate bucket and object key. Your Lambda function would then process it exactly as if an S3 Event had triggered the function.
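A rough boto3 sketch of that approach (the bucket name, function name, and minimal event shape are assumptions modelled on the standard S3 put-event template):

import json
import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

BUCKET = 'my-bucket'                 # placeholder bucket name
FUNCTION = 'my-image-processor'      # placeholder Lambda function name

# List every object in the bucket (paginated) and invoke the Lambda
# with a minimal S3-style event for each one.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get('Contents', []):
        fake_event = {
            'Records': [{
                'eventSource': 'aws:s3',
                'eventName': 'ObjectCreated:Put',
                's3': {
                    'bucket': {'name': BUCKET},
                    'object': {'key': obj['Key'], 'size': obj['Size']},
                },
            }]
        }
        lambda_client.invoke(
            FunctionName=FUNCTION,
            InvocationType='Event',          # asynchronous invoke
            Payload=json.dumps(fake_event),
        )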
In addition to what is explained above, you can use Amazon S3 Batch Operations.
We used this to encrypt existing objects in an S3 bucket that were not encrypted earlier.
It was the easiest out-of-the-box solution, available in the S3 console itself.
You could also loop through all the objects in the bucket and add a tag, then adjust your trigger event to include tag changes. A code sample in bash will follow after I test it.
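In the meantime, a rough boto3 sketch of that tagging loop (the bucket name and tag are placeholders; the bucket's notification configuration would need to include the s3:ObjectTagging:Put event):

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-bucket'    # placeholder bucket name

# Add a tag to every object; a tagging trigger on the bucket
# would then fire the Lambda for each tagged object.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get('Contents', []):
        s3.put_object_tagging(
            Bucket=BUCKET,
            Key=obj['Key'],
            Tagging={'TagSet': [{'Key': 'reprocess', 'Value': 'true'}]},
        )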
I have a Python Flask app deployed on Heroku. I want to record user interactions in a file (a kind of log file). Since Heroku storage is ephemeral, even though I append actions to a log file, the data is lost. I don't want to use a database for this simple task. My idea is to have an API that can modify files in a remote file system, so I am looking for such a remote file system (cloud storage) along with an API to accomplish my task.
For example, let us assume that I have 3 buttons on my app and a tracking.txt file. Then
if button1 is clicked, I want to write (append) 1 to tracking.txt.
Similarly for button2 and button3.
I have searched the internet but didn't find anything that fits my exact need, or I didn't understand the options well.
Any help is appreciated. Thanks in advance.
PS: I am open to changing my approach if there's no way other than using a DB.
One possible solution is to use Amazon S3 together with Boto3, the Amazon Web Services (AWS) SDK for Python.
You can copy (push) your file from Heroku to an S3 bucket (at intervals or after every change, depending on your logic):
import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='MY_AWS_ACCESS_KEY_ID',
    aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
)

# Upload the file from a local path to the S3 bucket
s3.upload_file(Bucket='data', Key='files/file1.log', Filename='/tmp/file1.log')
One advantage of this approach is that you can use LocalStack for local development, so only your (production-like) application on Heroku will send files to S3, while during development you can work offline.
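To connect this to the tracking.txt example from the question, a hedged sketch of the Flask side could look like this (the route, file path, bucket, and key are all made-up names):

import boto3
from flask import Flask

app = Flask(__name__)
s3 = boto3.client('s3')                 # credentials via environment/config
TRACKING_FILE = '/tmp/tracking.txt'     # placeholder local path

@app.route('/click/<button>')
def click(button: str):
    # Append the interaction locally, then push the whole file to S3.
    with open(TRACKING_FILE, 'a') as f:
        f.write(f'{button}\n')
    s3.upload_file(TRACKING_FILE, 'data', 'files/tracking.txt')
    return 'ok'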
My scenario: I am currently using the AWS CLI to upload the contents of a directory to an S3 bucket with the following command:
aws s3 sync results/foo s3://bucket/
Now I need to replace this with Python code. I am exploring the boto3 documentation to find the right way to do it. I see some options such as:
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/s3.html#S3.Client.upload_file
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/s3.html#S3.ServiceResource.Object
Could someone suggest which is the right approach?
I am aware that I would have to get the credentials by calling boto3.client('sts').assume_role(role, session) and use them subsequently.
The AWS CLI is actually written in Python and uses the same API calls you can use.
The important thing to realize is that Amazon S3 only has an API call to upload/download one object at a time.
Therefore, your Python code would need to:
Obtain a list of files to copy
Loop through each file and upload it to Amazon S3
Of course, if you want sync functionality (which only copies new/modified files), then your program will need more intelligence to figure out which files to copy.
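As a minimal sketch of that loop with the client interface (the local directory and bucket mirror the aws s3 sync command above; the new/modified check is left out):

import os
import boto3

s3 = boto3.client('s3')
LOCAL_DIR = 'results/foo'
BUCKET = 'bucket'               # as in the aws s3 sync command above

# Walk the local directory and upload every file, preserving
# the relative path as the object key.
for root, _, files in os.walk(LOCAL_DIR):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.relpath(local_path, LOCAL_DIR).replace(os.sep, '/')
        s3.upload_file(local_path, BUCKET, key)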
Boto3 has two general types of methods:
client methods that map 1:1 with API calls, and
resource methods that are more Pythonic but might make multiple API calls in the background
Which type you use is your own choice. Personally, I find the client methods easier for uploading/downloading objects, while the resource methods are good when having to loop through resources (e.g., "for each EC2 instance, for each EBS volume, check each tag").
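For example, listing the objects in a bucket both ways (a small sketch; the bucket name is a placeholder):

import boto3

# Client method: maps 1:1 to the ListObjectsV2 API call (one page shown here).
client = boto3.client('s3')
response = client.list_objects_v2(Bucket='my-bucket')
for obj in response.get('Contents', []):
    print(obj['Key'])

# Resource method: more Pythonic; pagination is handled behind the scenes.
bucket = boto3.resource('s3').Bucket('my-bucket')
for obj in bucket.objects.all():
    print(obj.key)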
I'm trying to better understand how the overall flow should work with AWS Lambda and my Web App.
I would like to have the client upload a file to a public bucket (completely bypassing my API resources), with the client UI putting it into a folder for their account based on a GUID. From there, I have a Lambda that runs when it detects a change to the public bucket, resizes the file, and places it into the processed bucket.
However, I need to update a row in my RDS Database.
Issue
I'm struggling to understand the best practice for identifying the row to update. Should I be uploading another file with the necessary details (so that every image upload really consists of two files: an image and a JSON config)? Should the image be processed, and then the client receives some data and makes an API request to update the row in the database? What is the right flow for this step?
Thanks.
You should use a pre-signed URL for the upload. This allows your application to put restrictions on the upload, such as file type, directory and size. It means that, when the file is uploaded, you already know who did the upload. It also prevents people from uploading randomly to the bucket, since it does not need to be public.
The upload can then use an Amazon S3 Event to trigger the Lambda function. The filename/location can be used to identify the user, so the database can be updated at the time that the file is processed.
See: Uploading Objects Using Presigned URLs - Amazon Simple Storage Service
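For illustration, a hedged boto3 sketch of generating such a pre-signed POST on the server side (the bucket, key, and limits are placeholders); the client then POSTs the file straight to S3 with the returned fields:

import boto3

s3 = boto3.client('s3')
user_id = 'some-guid'                                 # hypothetical user GUID

# Pre-signed POST locked to the user's "folder", an image content type,
# and a maximum size; the key itself identifies the uploader later.
post = s3.generate_presigned_post(
    Bucket='uploads-bucket',                          # placeholder bucket name
    Key=f'{user_id}/avatar.png',                      # the key encodes who uploaded it
    Conditions=[
        ['starts-with', '$Content-Type', 'image/'],
        ['content-length-range', 1, 10 * 1024 * 1024],  # 1 byte up to 10 MB
    ],
    ExpiresIn=300,                                    # link valid for 5 minutes
)
# Return post['url'] and post['fields'] to the browser; it POSTs the file
# (plus a Content-Type form field) directly to S3.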
I'd avoid uploading a file directly to S3, bypassing the API. Uploading the file through your API lets you control the type of file, its size, etc., and you will know exactly who is uploading the file (the API auth ID or the user ID in the API body). Opening a bucket to the public for writes is also a security risk.
Your API clients can then upload the file via the API, which can store the file on S3 (triggering another Lambda for processing) and then update your RDS database with the appropriate metadata for that user.
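Sketching that flow with a hypothetical Flask endpoint and boto3 (the framework choice, names, and the RDS update are all assumptions):

import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client('s3')

@app.route('/upload', methods=['POST'])
def upload():
    user_id = request.headers['X-User-Id']      # however your API identifies users
    file = request.files['image']
    key = f'{user_id}/{file.filename}'

    # Store the file in S3; an S3 event on this bucket can still trigger
    # the resizing Lambda exactly as before.
    s3.upload_fileobj(file, 'uploads-bucket', key)

    # Record the upload against the user in RDS (placeholder for your ORM/SQL).
    # db.execute("UPDATE images SET s3_key = %s WHERE user_id = %s", (key, user_id))
    return {'key': key}, 201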