Amazon S3 and CloudFront - Publish an uploaded file under a hashed filename - python-3.x

Technologies:
Python3
Boto3
AWS
I have a project built using Python 3 and Boto3 that communicates with a bucket in the Amazon S3 service.
The flow is that a user posts images to the service; these images are uploaded to an S3 bucket and can then be served through Amazon CloudFront using a hashed file name instead of the real file name.
Example:
(S3) Upload key: /category-folder/png/image.png
(CloudFront) Serve: http://d2949o5mkkp72v.cloudfront.net/d824USNsdkmx824
I want a file uploaded to S3 to appear under a hash as its file name on the CloudFront distribution.
Does anyone know of a way to make S3 or CloudFront automatically convert a file name to a hash and publish it under that hash?

To meet my needs I created the fields needed to maintain the keys (to make them unique, both in S3 and in my MongoDB).
Fields:
original_file_name = my_file_name
file_category = my_images, children, fun
file_type = image, video, application
key = uniqueID
With these fields in place, one can check whether a key exists by simply searching for the key, the new file name, the category, and the type; if a match is found in the database, the file already exists.
To generate the unique id:
def get_key(self):
    from uuid import uuid1
    return uuid1().hex[:20]
This limits the ID to the length of 20 characters.
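For completeness, here is a minimal sketch of how these pieces could fit together, assuming the CloudFront domain from the example above and an illustrative bucket name; upload_image and its defaults are hypothetical names, not part of the original project:
import mimetypes
from uuid import uuid1

import boto3

CLOUDFRONT_DOMAIN = "http://d2949o5mkkp72v.cloudfront.net"  # domain from the example above
BUCKET = "my-bucket"  # illustrative bucket name

def generate_key():
    # Same scheme as get_key() above: a 20-character hex ID.
    return uuid1().hex[:20]

def upload_image(local_path, original_file_name, file_category="my_images", file_type="image"):
    key = generate_key()
    content_type, _ = mimetypes.guess_type(original_file_name)
    s3 = boto3.client("s3")
    # Store the object under the hashed key instead of the original file name.
    s3.upload_file(Filename=local_path, Bucket=BUCKET, Key=key,
                   ExtraArgs={"ContentType": content_type or "application/octet-stream"})
    # Persist the mapping so uniqueness can be checked later (e.g., in MongoDB).
    record = {
        "original_file_name": original_file_name,
        "file_category": file_category,
        "file_type": file_type,
        "key": key,
    }
    # The CloudFront URL exposes only the hash, never the original file name.
    return f"{CLOUDFRONT_DOMAIN}/{key}", record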

Related

Retrieve S3 URL of a file uploaded using Paperclip Rails from Node.js

I have an image uploaded to an S3 bucket using the Paperclip gem from a Rails app.
Is there any way to retrieve the URL of that image from Node.js, since both apps share the same DB?
EDIT
To clarify further: the file names of the images uploaded to S3 are stored in the file_name column of the table. In the Rails app, an instance of the table's model can return the exact URL using the S3 config specified in paperclip.rb.
For example: https://s3-region.amazonaws.com/bucket-name/table/column/000/000/345/thumb/file-name.webp?1655104806
where 345 is the PK of the table row.
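The example URL follows Paperclip's default :class/:attachment/:id_partition/:style/:filename path interpolation, so it can be rebuilt outside Rails from the row's id and file_name (the trailing ?1655104806 is just a cache-busting timestamp). A hedged sketch in Python (the language used elsewhere on this page); the bucket, region, table, and column names are placeholders copied from the example URL:
def id_partition(record_id):
    # Paperclip's :id_partition: zero-pad the id to 9 digits and split into groups of 3.
    padded = f"{record_id:09d}"
    return "/".join([padded[0:3], padded[3:6], padded[6:9]])

def paperclip_url(record_id, file_name, style="thumb",
                  region="s3-region", bucket="bucket-name",
                  klass="table", attachment="column"):
    # Mirrors Paperclip's default ":class/:attachment/:id_partition/:style/:filename" path.
    return (f"https://{region}.amazonaws.com/{bucket}/"
            f"{klass}/{attachment}/{id_partition(record_id)}/{style}/{file_name}")

print(paperclip_url(345, "file-name.webp"))
# https://s3-region.amazonaws.com/bucket-name/table/column/000/000/345/thumb/file-name.webp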

Writing a new file to a Google Cloud Storage bucket from a Google Cloud Function (Python)

I am trying to write a new file (not upload an existing file) to a Google Cloud Storage bucket from inside a Python Google Cloud Function.
I tried using google-cloud-storage, but it does not have an "open" attribute on the bucket.
I tried the App Engine library GoogleAppEngineCloudStorageClient, but the function cannot be deployed with that dependency.
I tried gcs-client, but I cannot pass the credentials inside the function because it requires a JSON file.
Any ideas would be much appreciated.
Thanks.
from google.cloud import storage
import io
# bucket name
bucket = "my_bucket_name"
# Get the bucket that the file will be uploaded to.
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket)
# Create a new blob and upload the file's content.
my_file = bucket.blob('media/teste_file01.txt')
# create in memory file
output = io.StringIO("This is a test \n")
# upload from string
my_file.upload_from_string(output.read(), content_type="text/plain")
output.close()
# list created files
blobs = storage_client.list_blobs(bucket)
for blob in blobs:
    print(blob.name)
# Make the blob publicly viewable.
my_file.make_public()
You can now write files directly to Google Cloud Storage. It is no longer necessary to create a file locally and then upload it.
You can use blob.open() as follows:
from google.cloud import storage
def write_file(lines):
    # lines: an iterable of strings to write (the original snippet iterated over an undefined name)
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob.txt')
    # Stream the lines straight into the new object; no local file is needed.
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
You have to create your file locally and then push it to GCS. You can't create a file dynamically in GCS by using open().
For this, you can write to the /tmp directory, which is an in-memory file system. Note that you will never be able to create a file bigger than the memory allocated to your function minus the memory footprint of your code; with a 2 GB function, you can expect a maximum file size of about 1.5 GB.
Note: GCS is not a file system, and you shouldn't use it as one.
EDIT 1
Things have changed since I wrote this answer:
It's now possible to write to any directory in the container (not only /tmp).
You can stream-write a file to GCS, just as you can receive it in streaming mode on Cloud Run. Here is a sample of stream-writing to GCS.
Note: stream writing disables checksum validation, so you won't have an integrity check at the end of the streamed write.
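To illustrate the /tmp approach described above, here is a minimal sketch; the bucket and object names reuse the ones from the question's snippet:
import os
from google.cloud import storage

def write_via_tmp():
    # Build the file in the function's writable /tmp directory first...
    local_path = os.path.join("/tmp", "teste_file01.txt")
    with open(local_path, "w") as f:
        f.write("This is a test \n")

    # ...then push it to the bucket in a single call.
    client = storage.Client()
    bucket = client.get_bucket("my_bucket_name")
    blob = bucket.blob("media/teste_file01.txt")
    blob.upload_from_filename(local_path, content_type="text/plain")

    # /tmp counts against the function's memory, so clean up afterwards.
    os.remove(local_path)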

Ignore Default AWS KMS Encryption for S3 Uploads Using Python Boto3

We recently enabled AWS KMS for all of our Amazon S3 buckets which, by default, applies server-side encryption to all files we upload to our own S3 buckets or to S3 buckets owned by someone else.
Is there a way to intentionally "ignore" the default KMS encryption to upload unencrypted files to an S3 bucket owned by a 3rd party? The 3rd party team cannot open any of the files we are sending them. I understand that one solution would be to share the KMS key with the 3rd party but, due to the nature of the relationship, it's better if we only deliver unencrypted files instead of sharing a key.
Here is the Python code I have been using to deliver the files. How can I modify the ExtraArgs parameter to intentionally ignore the default KMS encryption?
from boto3 import client
from boto3.s3.transfer import TransferConfig

client = client('s3', ...)
config = TransferConfig(multipart_threshold=1024 * 25, multipart_chunksize=1024 * 25,
                        max_concurrency=10, use_threads=True)
client.upload_file(Filename='test.csv', Bucket='my-bucket', Key='test.csv',
                   Config=config, ExtraArgs={'ACL': 'bucket-owner-full-control'})
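One approach worth trying, assuming the destination bucket's policy does not explicitly require aws:kms: request SSE-S3 (AES256) in ExtraArgs so the object is not written with the default KMS key. ServerSideEncryption is one of the arguments upload_file accepts in ExtraArgs; a hedged sketch:
from boto3 import client
from boto3.s3.transfer import TransferConfig

s3 = client('s3')  # credentials/region as in the snippet above
config = TransferConfig(multipart_threshold=1024 * 25, multipart_chunksize=1024 * 25,
                        max_concurrency=10, use_threads=True)

s3.upload_file(
    Filename='test.csv',
    Bucket='my-bucket',
    Key='test.csv',
    Config=config,
    ExtraArgs={
        'ACL': 'bucket-owner-full-control',
        # Explicitly request SSE-S3 so the bucket's default KMS key is not applied;
        # assumption: the 3rd-party bucket policy allows non-KMS uploads.
        'ServerSideEncryption': 'AES256',
    },
)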

Identify external AWS S3 Buckets

Looking for some help writing some code that will pull down all bucket names and identify which ones are externally visible (open to the internet for read or write). I've read over the documentation for boto3 s3 and cannot find any commands that will allow me to make this query... should I be looking under IAM?
So far, I am only able to print the bucket names... I would like to report name + its internet presence. The goal is to identify which s3 buckets are visible from the internet so we can periodically review the data/objects within them.
print("############# S3 Bucket Dump ############")
s3 = boto3.resource('s3')
f = open('s3buckets.txt', 'w')
for bucket in s3.buckets.all():
print(bucket.name, file=f)
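To report internet exposure alongside the names, one possible starting point combines GetBucketPolicyStatus and GetBucketAcl; this is a sketch under stated assumptions (the caller needs the corresponding s3:GetBucket* permissions, and a bucket without a policy raises a ClientError, which is treated here as "not public via policy"):
import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')
ALL_USERS = 'http://acs.amazonaws.com/groups/global/AllUsers'
AUTH_USERS = 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'

def bucket_is_public(name):
    # 1) Does the bucket policy make it public?
    try:
        status = s3_client.get_bucket_policy_status(Bucket=name)
        if status['PolicyStatus']['IsPublic']:
            return True
    except ClientError:
        pass  # no bucket policy at all, or no permission to read it

    # 2) Does the ACL grant access to everyone or to any authenticated AWS user?
    try:
        acl = s3_client.get_bucket_acl(Bucket=name)
        for grant in acl['Grants']:
            if grant['Grantee'].get('URI') in (ALL_USERS, AUTH_USERS):
                return True
    except ClientError:
        pass
    return False

s3 = boto3.resource('s3')
with open('s3buckets.txt', 'w') as f:
    for bucket in s3.buckets.all():
        print(bucket.name, 'PUBLIC' if bucket_is_public(bucket.name) else 'private', file=f)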

PermanentRedirect while generating pre signed url

I am having an issue while creating a pre-signed URL for AWS S3 using aws-sdk in Node.js. It gives me: PermanentRedirect: The bucket you are attempting to access must be addressed using the specified endpoint.
const s3 = new AWS.S3()
AWS.config.update({accessKeyId: 'test123', secretAccessKey: 'test123'})
AWS.config.update({region: 'us-east-1'})
const myBucket = 'test-bucket'
const myKey = 'test.jpg'
const signedUrlExpireSeconds = 60 * 60
const url = s3.getSignedUrl('getObject', {
  Bucket: myBucket,
  Key: myKey,
  Expires: signedUrlExpireSeconds
})
console.log(url)
How can I resolve this error so that the pre-signed URL works? Also, what is the purpose of Key?
1st - What is the region of your bucket? S3 is a global service, yet each bucket has a region that you must select when you create it.
2nd - When working with S3 buckets outside the N. Virginia (us-east-1) region, there are situations where AWS's internal SSL/DNS is not yet in sync. I have hit this multiple times; I can't find exact docs on it, but the symptoms are redirects, not-found errors, or no access, and after 4-12 hours it simply starts working. What I have been able to dig up points to internal AWS SSL/DNS handling for S3 buckets that are not in the N. Virginia region. So that could be it.
3rd - You may have re-created buckets multiple times while reusing the same name. Bucket names are global even though buckets are regional, so this could again be related to the 2nd scenario: within the last 24 hours the bucket name was on a different region, and AWS's internal DNS/SSL hasn't synced yet.
P.S. Key is the object's key; every object inside a bucket has a key. In the AWS console you can navigate a "key", which looks like a path to a file, but it is not a file path. S3 has no concept of directories the way hard drives do; what looks like a path to a file is just the object's key. The AWS console splits keys on / and displays them as directories to give a better UX while navigating the UI.
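For reference, here is the same pre-signed URL generated with boto3 (Python, matching the rest of this page), with the client pinned to the bucket's actual region so the PermanentRedirect does not occur; the bucket, key, region, and expiry are the ones from the question:
import boto3

# Pinning the client to the bucket's region avoids the
# "PermanentRedirect ... must be addressed using the specified endpoint" error.
s3 = boto3.client('s3', region_name='us-east-1')

url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'test-bucket', 'Key': 'test.jpg'},  # Key = the object's key, not a file path
    ExpiresIn=60 * 60,
)
print(url)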
