How to delete a particular S3 prefix lifecycle rule using Python 3

I am trying to delete the lifecycle rule for a particular prefix using Python boto3.
I have tried the code below, but it deletes the entire bucket lifecycle configuration.
import boto3
client = boto3.client('s3')
response = client.delete_bucket_lifecycle(Bucket='my_bucket_name')
I want to delete only the lifecycle rule for a particular prefix.

The delete_bucket_policy() API call will delete a Bucket Policy, which is used to grant access to an Amazon S3 bucket.
It seems that you actually wish to delete a Lifecycle policy, which can be done with the delete_bucket_lifecycle() API call.
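If the goal is to remove only the rule for one prefix, note that there is no API call that deletes a single lifecycle rule; the whole configuration has to be read, modified and written back. A minimal sketch of that read-modify-write approach using get_bucket_lifecycle_configuration() and put_bucket_lifecycle_configuration() (the bucket name and prefix below are placeholders):
import boto3

client = boto3.client('s3')
bucket = 'my_bucket_name'        # placeholder
prefix_to_remove = 'logs/'       # placeholder: prefix whose rule should be dropped

# Read the current lifecycle configuration.
config = client.get_bucket_lifecycle_configuration(Bucket=bucket)

# Keep every rule except the one scoped to the given prefix.
# Newer rules carry the prefix in 'Filter', older ones in a top-level 'Prefix'.
remaining = [
    rule for rule in config['Rules']
    if rule.get('Filter', {}).get('Prefix', rule.get('Prefix')) != prefix_to_remove
]

if remaining:
    # Write back the remaining rules.
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={'Rules': remaining},
    )
else:
    # No rules left, so remove the whole configuration.
    client.delete_bucket_lifecycle(Bucket=bucket)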

What is the required permission to get s3 bucket creation date using boto3?

I'm trying to check if a bucket exists on s3 and have been following this link: https://stackoverflow.com/a/49817544/19505278
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')
if bucket.creation_date:
    print("The bucket exists")
else:
    print("The bucket does not exist")
However, I'm unable to get this to work due to a potential missing permission.
I was able to try this on a different s3 bucket and can verify this works. However, the s3 bucket I'm working with does not and is likely due to missing permissions. Unfortunately, I do not have access to the working bucket's permissions.
Is there a permission that I need to enable to retrieve bucket metadata?
Here is how you would typically test for the existence of an S3 bucket:
import boto3
from botocore.exceptions import ClientError

Bucket = "my-bucket"

s3 = boto3.client("s3")

try:
    response = s3.head_bucket(Bucket=Bucket)
    print("The bucket exists")
except ClientError as e:
    if e.response["Error"]["Code"] == "404":
        print("No such bucket")
    elif e.response["Error"]["Code"] == "403":
        print("Access denied")
    else:
        print("Unexpected error:", e)
If you think there is a permission issue, you might want to check the Amazon S3 documentation on permissions. If you simply want to be able to check the existence of all buckets, s3:ListAllMyBuckets works nicely.
As for the code, you usually want to keep it lightweight by using head_bucket for buckets, head_object for objects, and so on. jarmod above provided sample code.
As for the question on client vs resource: the client is close to the metal, i.e. the actual back-end API powering the service. The resource is higher level; it builds meaningful objects out of the client responses. Both use botocore underneath. There are sometimes slight differences when requesting something, because a resource already has knowledge of the underlying object.
For example, if you first create a Bucket resource object, you can call methods that are meaningful for that bucket without specifying the bucket name again.
import boto3

resource = boto3.resource('s3')
bucket = resource.Bucket('some_bucket_name')

# You can do things with this bucket object directly, e.g. create it
# without supplying the bucket name again.
bucket.create()

# With the client, the story is different: there are no resource objects,
# so you need to supply the bucket name on every call.
client = boto3.client('s3')
client.create_bucket(Bucket='some_bucket_name')

Give direct access of local files to Document AI

I know there is a way to call Document AI from a Python environment on a local system. In that process, the local file needs to be uploaded to a GCS bucket so that Document AI can access it from there. Is there any way to give Document AI direct access to local files (i.e., without uploading the file to a GCS bucket) using Python? [Note that it is a mandatory requirement for me to run the Python code on the local system, not in GCP.]
Document AI cannot "open" files by itself from your local filesystem.
If you don't want to (or cannot) upload the documents to a bucket, you can send them inline as part of the REST API request. But in that case you cannot use batch processing: you must process the files one by one and wait for each response.
The relevant REST API documentation is here: https://cloud.google.com/document-ai/docs/reference/rest/v1/projects.locations.processors/process
In the Python quickstart documentation you will find this sample code, which reads a file and sends it inline as part of the request:
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processors/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"

# Read the file into memory
with open(file_path, "rb") as image:
    image_content = image.read()

document = {"content": image_content, "mime_type": "application/pdf"}

# Configure the process request
request = {"name": name, "raw_document": document}

result = client.process_document(request=request)
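The snippet above assumes that a client object already exists. For completeness, a minimal sketch of the client setup it relies on, assuming the google-cloud-documentai package is installed (location is a placeholder such as "us" or "eu"):
from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # pip install google-cloud-documentai

location = "us"  # placeholder: the region of your processor

# Document AI uses a location-specific endpoint, e.g. us-documentai.googleapis.com
opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
client = documentai.DocumentProcessorServiceClient(client_options=opts)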

How to upload downloaded file to s3 bucket using Lambda function

I have looked at different questions and answers but could not find one that worked for me. Since I am really new to AWS, I need your help. I am trying to download a gzip file, load it into a JSON file, and then upload it to an S3 bucket using a Lambda function. I wrote the code to download the file and convert it to JSON, but I am having problems uploading it to the S3 bucket. Assume that the file is ready as x.json. What should I do then?
I know it is a really basic question, but help is still needed :)
This code will upload to Amazon S3:
import boto3
s3_client = boto3.client('s3', region_name='us-west-2') # Change as appropriate
s3_client.upload_file('/tmp/foo.json', 'my-bucket', 'folder/foo.json')
Some tips:
In Lambda functions you can only write to /tmp/
There is a limit of 512MB
At the end of your function, delete the files (zip, json, etc) because the container can be reused and you don't want to run out of disk space
If your Lambda has the proper permissions to write files to S3, then simply use the boto3 package, which is the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Be aware that if the Lambda is located inside a VPC, it cannot access the public internet, and therefore cannot reach the S3 API endpoints used by boto3. In that case you may need a NAT gateway to give the Lambda a route to the public internet.
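For the original question (download a gzip file, convert it to JSON, then upload it to S3 from a Lambda function), a minimal sketch that combines the steps above; the source URL, bucket and key are placeholders:
import gzip
import json
import os
import urllib.request

import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    source_url = 'https://example.com/data.json.gz'   # placeholder
    bucket = 'my-bucket'                               # placeholder
    key = 'folder/x.json'                              # placeholder

    gz_path = '/tmp/data.json.gz'
    json_path = '/tmp/x.json'

    # Download the gzip file into /tmp/, the only writable path in Lambda.
    urllib.request.urlretrieve(source_url, gz_path)

    # Decompress and re-serialize as JSON.
    with gzip.open(gz_path, 'rt') as f_in:
        data = json.load(f_in)
    with open(json_path, 'w') as f_out:
        json.dump(data, f_out)

    # Upload the result to S3.
    s3_client.upload_file(json_path, bucket, key)

    # Clean up /tmp/ so a reused container does not run out of space.
    os.remove(gz_path)
    os.remove(json_path)

    return {'status': 'uploaded', 'key': key}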

Long polling AWS S3 to check if item exists?

The context here is simple, there's a lambda (lambda1) that creates a file asynchronously and then uploads it to S3.
Then, another lambda (lambda2) receives the soon-to-exist file name and needs to keep checking S3 until the file exists.
I don't think S3 triggers will work because lambda2 is invoked by a client request
1) Do I get charged for this kind of request between lambda and S3? I will be polling it until the object exists
2) What other way could I achieve this that doesn't incur charges?
3) What method do I use to check if a file exists in S3? (just try to get it and check status code?)
This looks like you should be using an S3 objectCreated trigger on the Lambda. That way, whenever an object gets created, it will trigger your Lambda function automatically with the file metadata.
See the AWS documentation for information on configuring an S3 event trigger for a Lambda function.
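If polling really is required (question 3), boto3 also ships with a built-in waiter that repeatedly calls head_object until the object appears; a minimal sketch, with the bucket and key names as placeholders:
import boto3

s3 = boto3.client('s3')

# Polls every 5 seconds, up to 20 attempts, then raises a WaiterError
# if the object never shows up.
waiter = s3.get_waiter('object_exists')
waiter.wait(
    Bucket='my-bucket',            # placeholder
    Key='path/to/expected-file',   # placeholder
    WaiterConfig={'Delay': 5, 'MaxAttempts': 20},
)
print('The file exists')
Regarding question 1: each underlying head_object call is a normal S3 request, so polling does incur (small) request charges.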
Let me make sure I understand correctly.
The client calls Lambda1. Lambda1 creates a file asynchronously and uploads it to S3.
The call to Lambda1 returns as soon as Lambda1 has started its async processing.
The client then calls Lambda2 to pull the file from S3 that Lambda1 is going to push there.
Why not just wait for Lambda1 to create the file and return it to the client? Otherwise this is going to be an expensive file exchange.

Identify external AWS S3 Buckets

Looking for some help writing some code that will pull down all bucket names and identify which ones are externally visible (open to the internet for read or write). I've read over the documentation for boto3 s3 and cannot find any commands that will allow me to make this query... should I be looking under IAM?
So far, I am only able to print the bucket names... I would like to report name + its internet presence. The goal is to identify which s3 buckets are visible from the internet so we can periodically review the data/objects within them.
print("############# S3 Bucket Dump ############")
s3 = boto3.resource('s3')
f = open('s3buckets.txt', 'w')
for bucket in s3.buckets.all():
print(bucket.name, file=f)
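There is no single boto3 call that reports "internet visibility", but as a sketch of one way to approximate it, you can combine get_bucket_policy_status() and get_bucket_acl() for each bucket; buckets with a public policy or a grant to the AllUsers/AuthenticatedUsers groups are the ones to review:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')
client = boto3.client('s3')

PUBLIC_GROUPS = (
    'http://acs.amazonaws.com/groups/global/AllUsers',
    'http://acs.amazonaws.com/groups/global/AuthenticatedUsers',
)

for bucket in s3.buckets.all():
    public = False

    # Does the bucket policy make the bucket public?
    try:
        status = client.get_bucket_policy_status(Bucket=bucket.name)
        public = status['PolicyStatus']['IsPublic']
    except ClientError:
        pass  # no bucket policy, or no permission to read it

    # Does the ACL grant access to the public groups?
    try:
        acl = client.get_bucket_acl(Bucket=bucket.name)
        for grant in acl['Grants']:
            if grant.get('Grantee', {}).get('URI') in PUBLIC_GROUPS:
                public = True
    except ClientError:
        pass

    print(bucket.name, 'PUBLIC' if public else 'private')
Note that account- or bucket-level Block Public Access settings (see get_public_access_block()) can override both checks, so treat the output as a starting point for review rather than a definitive audit.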
