Identify external AWS S3 Buckets - python-3.x

Looking for some help writing code that will pull down all bucket names and identify which ones are externally visible (open to the internet for read or write). I've read over the boto3 S3 documentation and cannot find any calls that would let me make this query... should I be looking under IAM?
So far, I am only able to print the bucket names... I would like to report each bucket's name along with its internet exposure. The goal is to identify which S3 buckets are visible from the internet so we can periodically review the data/objects within them.
print("############# S3 Bucket Dump ############")
s3 = boto3.resource('s3')
f = open('s3buckets.txt', 'w')
for bucket in s3.buckets.all():
print(bucket.name, file=f)
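One possible approach (a rough sketch, not from the original post): check each bucket's policy status and ACL and flag anything that grants public access. This assumes the caller has the s3:GetBucketPolicyStatus and s3:GetBucketAcl permissions; account- or bucket-level Block Public Access settings can override what these calls report.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def is_bucket_public(bucket_name):
    # Best-effort check: True if the bucket policy or the ACL grants public access.
    try:
        status = s3.get_bucket_policy_status(Bucket=bucket_name)
        if status['PolicyStatus']['IsPublic']:
            return True
    except ClientError as e:
        # Buckets with no policy raise NoSuchBucketPolicy; fall through to the ACL check
        if e.response['Error']['Code'] != 'NoSuchBucketPolicy':
            raise
    acl = s3.get_bucket_acl(Bucket=bucket_name)
    public_uris = {
        'http://acs.amazonaws.com/groups/global/AllUsers',
        'http://acs.amazonaws.com/groups/global/AuthenticatedUsers',
    }
    return any(grant['Grantee'].get('URI') in public_uris for grant in acl['Grants'])

with open('s3buckets.txt', 'w') as f:
    for bucket in boto3.resource('s3').buckets.all():
        print(bucket.name, is_bucket_public(bucket.name), file=f)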

Related

What is the required permission to get s3 bucket creation date using boto3?

I'm trying to check if a bucket exists on s3 and have been following this link: https://stackoverflow.com/a/49817544/19505278
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')
if bucket.creation_date:
    print("The bucket exists")
else:
    print("The bucket does not exist")
However, I'm unable to get this to work, likely because of a missing permission.
I was able to try this on a different S3 bucket and can verify it works there. The S3 bucket I'm actually working with does not, which is likely due to missing permissions. Unfortunately, I do not have access to the working bucket's permission settings.
Is there a permission that I need to enable to retrieve bucket metadata?
Here is how you would typically test for the existence of an S3 bucket:
import boto3
from botocore.exceptions import ClientError

Bucket = "my-bucket"
s3 = boto3.client("s3")
try:
    response = s3.head_bucket(Bucket=Bucket)
    print("The bucket exists")
except ClientError as e:
    if e.response["Error"]["Code"] == "404":
        print("No such bucket")
    elif e.response["Error"]["Code"] == "403":
        print("Access denied")
    else:
        print("Unexpected error:", e)
If you think there is a permission issue, you might want to check the documentation on S3 permissions. If you simply want to be able to check the existence of all buckets, s3:ListAllMyBuckets works nicely.
For the code, you usually want to keep it lightweight by using head_bucket for buckets, head_object for objects, etc. @jarmod above provided sample code.
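For completeness, a head_object check looks very similar (the bucket and key names here are placeholders):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
try:
    s3.head_object(Bucket='my-bucket', Key='path/to/object.txt')  # placeholder names
    print('The object exists')
except ClientError as e:
    if e.response['Error']['Code'] == '404':
        print('No such object')
    else:
        raise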
As for the question on client vs. resource: the client is close to the metal, i.e. it maps directly onto the back-end API that powers the service. The resource is a higher-level abstraction; it builds meaningful objects out of client responses. Both use botocore underneath. There are sometimes slight differences when requesting something, because a resource already carries knowledge of the underlying object.
For example, if you first create a Bucket resource object, you can call methods that are meaningful for that bucket without specifying the bucket name again.
import boto3

resource = boto3.resource('s3')
bucket = resource.Bucket('some_bucket_name')
# you can do stuff with this bucket, e.g. create it without supplying any params
bucket.create()

# if you are using a client, the story is different: there are no objects carrying
# state, so you need to supply the bucket name on every call
client = boto3.client('s3')
client.create_bucket(Bucket='some_bucket_name')

I am not able to read dat file from S3 bucket using lambda function

I have been trying to read a .dat file from one S3 bucket, convert it into CSV, compress it, and put it into another bucket.
For opening and reading I am using the code below, but it throws the error No such file or directory:
with open(f's3://{my_bucket}/{filenames}', 'rb') as dat_file:
    print(dat_file)
The Python language does not natively know how to access Amazon S3.
Instead, you can use the boto3 AWS SDK for Python. See: S3 — Boto 3 documentation
You also have two choices about how to access the content of the file:
Download the file to your local disk using download_file(), then use open() to access the local file, or
Use get_object() to obtain a StreamingBody of the file contents
See also: Amazon S3 Examples — Boto 3 documentation
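For illustration, a minimal sketch of both options (the bucket and key names are placeholders):

import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'        # placeholder
key = 'data/input.dat'      # placeholder

# Option 1: download to local disk, then open() the local copy as usual
s3.download_file(bucket, key, '/tmp/input.dat')
with open('/tmp/input.dat', 'rb') as dat_file:
    print(dat_file.read(100))

# Option 2: read the object's StreamingBody directly, no local file needed
response = s3.get_object(Bucket=bucket, Key=key)
print(response['Body'].read(100))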

Writing a new file to a Google Cloud Storage bucket from a Google Cloud Function (Python)

I am trying to write a new file (not upload an existing file) to a Google Cloud Storage bucket from inside a Python Google Cloud Function.
I tried using google-cloud-storage, but it does not have an "open" attribute on the bucket.
I tried to use the App Engine library GoogleAppEngineCloudStorageClient, but the function cannot deploy with that dependency.
I tried to use gcs-client, but I cannot pass the credentials inside the function, as it requires a JSON file.
Any ideas would be much appreciated.
Thanks.
from google.cloud import storage
import io

# bucket name
bucket_name = "my_bucket_name"

# Get the bucket that the file will be uploaded to.
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

# Create a new blob and upload the file's content.
my_file = bucket.blob('media/teste_file01.txt')

# create an in-memory file
output = io.StringIO("This is a test \n")

# upload from string
my_file.upload_from_string(output.read(), content_type="text/plain")
output.close()

# list created files
blobs = storage_client.list_blobs(bucket)
for blob in blobs:
    print(blob.name)

# Make the blob publicly viewable.
my_file.make_public()
You can now write files directly to Google Cloud Storage. It is no longer necessary to create a file locally and then upload it.
You can use the blob.open() as follows:
from google.cloud import storage

def write_file(lines):
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob.txt')
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
You have to create your file locally and then push it to GCS. You can't create a file dynamically in GCS by using open.
For this, you can write to the /tmp directory, which is an in-memory file system. Because of that, you will never be able to create a file bigger than the memory allowed to your function minus the memory footprint of your code. With a 2 GB function, you can expect a maximum file size of about 1.5 GB.
Note: GCS is not a file system, and you shouldn't use it like one.
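For reference, a minimal sketch of that local-file approach (the bucket name and paths are placeholders):

import os
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket-name')   # placeholder

# /tmp is writable inside Cloud Functions (backed by the function's memory)
local_path = '/tmp/new-file.txt'
with open(local_path, 'w') as f:
    f.write('This is a test\n')

# Push the local file to GCS, then free the memory-backed storage
bucket.blob('path/to/new-file.txt').upload_from_filename(local_path)
os.remove(local_path)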
EDIT 1
Things have changed since my answer:
It's now possible to write to any directory in the container (not only /tmp).
You can stream-write a file to GCS, just as you can receive it in streaming mode on Cloud Run. Here is a sample of stream-writing to GCS.
Note: stream writing deactivates checksum validation, so you won't have an integrity check at the end of the stream write.

How to upload downloaded file to s3 bucket using Lambda function

I saw different questions/answers but could not find one that worked for me. Since I am really new to AWS, I need your help. I am trying to download a gzip file, load it into a JSON file, and then upload it to an S3 bucket using a Lambda function. I wrote the code to download the file and convert it to JSON, but I'm having a problem uploading it to the S3 bucket. Assume that the file is ready as x.json. What should I do then?
I know it is a really basic question, but help is still needed :)
This code will upload to Amazon S3:
import boto3

s3_client = boto3.client('s3', region_name='us-west-2')  # Change as appropriate
s3_client.upload_file('/tmp/foo.json', 'my-bucket', 'folder/foo.json')
Some tips:
In Lambda functions you can only write to /tmp/
There is a limit of 512 MB of storage in /tmp/
At the end of your function, delete the files (zip, json, etc) because the container can be reused and you don't want to run out of disk space
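To tie these tips together, here is a hypothetical end-to-end handler; the source URL, bucket, and key are placeholders, not values from the question:

import os
import gzip
import json
import urllib.request
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    source_url = 'https://example.com/data.json.gz'  # placeholder
    bucket = 'my-bucket'                             # placeholder
    local_path = '/tmp/x.json'

    # Download the gzip file and decode its contents as JSON
    with urllib.request.urlopen(source_url) as resp:
        data = json.loads(gzip.decompress(resp.read()))

    # Lambda functions can only write under /tmp/
    with open(local_path, 'w') as f:
        json.dump(data, f)

    s3_client.upload_file(local_path, bucket, 'folder/x.json')

    # Delete the temp file so a reused container doesn't run out of space
    os.remove(local_path)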
If your Lambda function has the proper permission to write to S3, then simply use the boto3 package, which is the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Be aware that if the Lambda function is located inside a VPC, it cannot access the public internet, which includes the S3 API endpoints. In that case you may need a NAT gateway so the Lambda function can reach them.

Amazon S3 and Cloudfront - Publish file uploaded as hashed filename

Technologies:
Python3
Boto3
AWS
I have a project built using Python3 and Boto3 to communicate with a bucket in Amazon S3 service.
The process is that a user posts images to the service; these images are uploaded to an S3 bucket and can be served through Amazon CloudFront using a hashed file name instead of the real file name.
Example:
(S3) Upload key: /category-folder/png/image.png
(CloudFront) Serve: http://d2949o5mkkp72v.cloudfront.net/d824USNsdkmx824
I want a file uploaded to S3 to appear under a hashed name when served from CloudFront.
Does anyone know of a way to make S3 or CloudFront automatically publish a file under a hashed name?
To meet my needs, I created the fields required to maintain the keys (to make them unique, both in S3 and in my MongoDB).
Fields:
original_file_name = my_file_name
file_category = my_images, children, fun
file_type = image, video, application
key = uniqueID
With the fields above, one can check whether a key exists by simply searching for the key, the new file name, the category, and the type; if it exists in the database, then the file exists.
To generate the unique id:
def get_key(self):
    from uuid import uuid1
    return uuid1().hex[:20]
This limits the ID to a length of 20 characters.
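To illustrate the approach above, a rough sketch of uploading under the hashed key and building the CloudFront URL (the bucket name and file values are placeholders; the distribution domain is the one from the question):

from uuid import uuid1
import boto3

def get_key():
    # Same scheme as above: 20 hex characters taken from a UUID
    return uuid1().hex[:20]

s3 = boto3.client('s3')

bucket = 'my-bucket'                                  # placeholder
cloudfront_domain = 'd2949o5mkkp72v.cloudfront.net'   # from the question
original_file_name = 'image.png'                      # placeholder
key = get_key()

# Upload under the hashed key instead of the original file name
s3.upload_file(original_file_name, bucket, key, ExtraArgs={'ContentType': 'image/png'})

# Record original_file_name -> key in your database (e.g. MongoDB), then serve:
print(f'https://{cloudfront_domain}/{key}')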
