migrating from boto2 to 3 - python-3.x

I have this code that uses boto2 that I need to port to boto3, and frankly I got a little lost in the boto3 docs:
connection = boto.connect_s3(host=hostname,
                             aws_access_key_id=access_key,
                             aws_secret_access_key=secret_key,
                             is_secure=False,
                             calling_format=boto.s3.connection.OrdinaryCallingFormat())
s3_bucket = connection.get_bucket(bucket_name)
I also need to make this work with other object stores that aren't AWS S3.

import boto3

s3 = boto3.client('s3',
                  aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key,
                  endpoint_url=hostname,  # must include a scheme, e.g. 'http://objectstore.example.com:8080'
                  use_ssl=False)
# boto3's client has no get_bucket(); head_bucket() verifies the bucket exists and is accessible
response = s3.head_bucket(Bucket=bucket_name)
See the boto3 client docs and the S3 service docs.
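If you want something closer to boto2's bucket object (iterating keys, downloading files), the resource API may be a better fit. A minimal sketch, assuming the same hostname, credentials and bucket_name as above:

import boto3

# The resource API mirrors boto2's Bucket object more closely than the low-level client
s3 = boto3.resource('s3',
                    aws_access_key_id=access_key,
                    aws_secret_access_key=secret_key,
                    endpoint_url=hostname,  # full URL with scheme
                    use_ssl=False)
bucket = s3.Bucket(bucket_name)

# Iterate over keys, much like boto2's bucket.list()
for obj in bucket.objects.all():
    print(obj.key, obj.size)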

boto3 and boto are incompatible. Most of the names are NOT backward compatible.
You MUST read the boto3 documentation to recreate the script. The good news is that the boto3 documentation is better than boto's, though not superb (many tricky parameter examples are not provided).
If you have apps using the old functions, you should create wrapper code for them to make the switch transparent.
That way, you instantiate any object store connection through the wrapper, then instantiate the various buckets using different connectors. Here is the idea; a sketch of the wrapper itself follows the examples below.
# AWS
# object_wrapper is your bucket wrapper that all the applications will call
from object_wrapper import object_bucket
from boto3lib.s3 import s3_connector
connector = s3_connector()
bucket = object_bucket(BucketName="xyz", Connector=connector)

# say you use boto2 to connect to the Google object store
from object_wrapper import object_bucket
from boto2lib.s3 import s3_connector
connector = s3_connector()
bucket = object_bucket(BucketName="xyz", Connector=connector)

# say for Azure
from object_wrapper import object_bucket
from azure.storage.blob import BlockBlobService
connector = BlockBlobService(......)
bucket = object_bucket(BucketName="xyz", Connector=connector)
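The wrapper itself is not shown above; a minimal sketch of what object_wrapper.object_bucket might look like (the module, class and connector interface are illustrative assumptions, not a real library):

# object_wrapper.py -- hypothetical facade so application code never touches boto2/boto3/Azure directly
class object_bucket:
    def __init__(self, BucketName, Connector):
        self._name = BucketName
        self._connector = Connector  # any connector exposing list_objects/upload/download

    def list_objects(self):
        # Delegate to whichever SDK the connector wraps
        return self._connector.list_objects(self._name)

    def upload(self, key, local_path):
        return self._connector.upload(self._name, key, local_path)

    def download(self, key, local_path):
        return self._connector.download(self._name, key, local_path)

Each *_connector then implements list_objects/upload/download on top of its own SDK, so swapping object stores only means swapping the connector.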

Related

How to filter Google Cloud Instances using google-compute-engine python client?

I am using
Package Version
------------------------ ---------
google-api-core 2.0.1
google-auth 2.0.2
google-cloud-compute 0.5.0
google-compute-engine 2.8.13
to retrieve Google Cloud instance data. I was referring to the docs to get an aggregated list of instances. I wasn't able to filter instances based on the tags of the compute VM instances. Is there a particular way to do it using Python, as there is no particular mention of it in the documentation?
Please include your code.
You should be able to apply a filter to AggregatedListInstancesRequest, and you should be able to specify labels; see Filtering searches using labels. I'm confident (!?) the API is consistent. A sketch of this with the google-cloud-compute client follows.
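A minimal sketch of that approach (the project ID and label key/value are assumptions; note this filters on labels, not network tags):

from google.cloud import compute_v1

client = compute_v1.InstancesClient()
request = compute_v1.AggregatedListInstancesRequest(
    project="my-project-id",        # assumed project ID
    filter='labels.env=prod',       # assumed label; network tags cannot be filtered this way
)

# aggregated_list yields (zone, InstancesScopedList) pairs
for zone, scoped_list in client.aggregated_list(request=request):
    for instance in scoped_list.instances:
        print(zone, instance.name)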
The documentation refers to the filter query parameter. I didn't manage to get it to work with tags, but you can still list all the instances and filter them directly.
In the example below I'm looking for the gke-nginx-1-cluster-846d7440-node tag.
from googleapiclient import discovery
from google.oauth2 import service_account

scopes = ['https://www.googleapis.com/auth/compute']
sa_file = 'alon-lavian-0474ca3b9309.json'

credentials = service_account.Credentials.from_service_account_file(sa_file, scopes=scopes)
service = discovery.build('compute', 'v1', credentials=credentials)

project = 'alon-lavian'
zone = 'us-central1-a'

request = service.instances().list(project=project, zone=zone)
while request is not None:
    response = request.execute()
    for instance in response['items']:
        if 'gke-nginx-1-cluster-846d7440-node' in instance['tags']['items']:
            print(instance)
    request = service.instances().list_next(previous_request=request, previous_response=response)

How to load file from custom hosted Minio s3 bucket into pandas using s3 URL format?

I have a Minio server hosted locally.
I need to read a file from a Minio S3 bucket into pandas using an S3 URL like "s3://dataset/wine-quality.csv" in a Jupyter notebook.
Using the boto3 library I am able to download the file:
import boto3

s3 = boto3.resource('s3',
                    endpoint_url='http://localhost:9000',  # endpoint needs a scheme
                    aws_access_key_id='id',
                    aws_secret_access_key='password')
s3.Bucket('dataset').download_file('wine-quality.csv', '/tmp/wine-quality.csv')
But when I try using pandas,
data = pd.read_csv("s3://dataset/wine-quality.csv")
I'm getting a client error: Forbidden 403.
I know that pandas internally uses the boto3 library (correct me if I am wrong).
PS: pandas read_csv has one more param:
storage_options={
    "key": AWS_ACCESS_KEY_ID,
    "secret": AWS_SECRET_ACCESS_KEY,
    "token": AWS_SESSION_TOKEN,
}
But I couldn't find any configuration for passing a custom Minio host URL for pandas to read.
pandas v1.2 onwards allows you to pass storage_options, which get passed down to fsspec; see the docs here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?highlight=s3fs#reading-writing-remote-files.
To pass in a custom URL, you need to specify it through client_kwargs in storage_options:
df = pd.read_csv(
    "s3://dataset/wine-quality.csv",
    storage_options={
        "key": AWS_ACCESS_KEY_ID,
        "secret": AWS_SECRET_ACCESS_KEY,
        "token": AWS_SESSION_TOKEN,
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},  # include the scheme
    },
)
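If the 403 persists, it can help to test the same credentials and endpoint with s3fs directly (the library fsspec uses for s3:// URLs). A minimal sketch, assuming the same Minio credentials as above:

import s3fs

# Verify the Minio endpoint and credentials outside of pandas
fs = s3fs.S3FileSystem(
    key=AWS_ACCESS_KEY_ID,
    secret=AWS_SECRET_ACCESS_KEY,
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)
print(fs.ls("dataset"))  # should list wine-quality.csv if access is configured correctly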

How to upload downloaded file to s3 bucket using Lambda function

I saw different questions/answers but I could not find one that worked for me. Since I am really new to AWS, I need your help. I am trying to download a gzip file, load it into a JSON file, then upload it to an S3 bucket using a Lambda function. I wrote the code to download the file and convert it to JSON, but I am having a problem uploading it to the S3 bucket. Assume that the file is ready as x.json. What should I do then?
I know it is a really basic question, but help is still needed :)
This code will upload to Amazon S3:
import boto3

s3_client = boto3.client('s3', region_name='us-west-2')  # Change as appropriate
s3_client.upload_file('/tmp/foo.json', 'my-bucket', 'folder/foo.json')
Some tips:
In Lambda functions you can only write to /tmp/
There is a limit of 512MB
At the end of your function, delete the files (zip, json, etc) because the container can be reused and you don't want to run out of disk space
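Putting those tips together, a minimal sketch of the whole flow (the source URL, bucket and key names are assumptions):

import gzip
import json
import os
import urllib.request

import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    source_url = 'https://example.com/data.json.gz'  # assumed source
    local_path = '/tmp/x.json'                       # Lambda can only write under /tmp/

    # Download the gzip file and write it out as JSON
    with urllib.request.urlopen(source_url) as resp:
        data = json.loads(gzip.decompress(resp.read()))
    with open(local_path, 'w') as f:
        json.dump(data, f)

    # Upload the result to S3 (assumed bucket and key)
    s3_client.upload_file(local_path, 'my-bucket', 'folder/x.json')

    # Clean up /tmp so a reused container does not run out of disk space
    os.remove(local_path)
    return {'status': 'uploaded'}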
If your Lambda function has the proper permission to write a file into S3, then simply use the boto3 package, which is the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Be aware that if the Lambda function sits inside a VPC, then it cannot reach the public internet, and therefore cannot reach the S3 API endpoints that boto3 calls. In that case you may need a NAT gateway to give the Lambda function a route to the public internet.

How to pass video file from S3 bucket to opencv VideoCapture?

I'm working on an AWS Lambda function in Python that reads videos uploaded to an S3 bucket and extracts a few frames from them. I already have the script for extracting the frames with OpenCV, but I don't know what parameter I should pass to cv2.VideoCapture since the file is only accessible through the S3 bucket.
I've tried passing the video as an S3 object with s3.get_object() as well as with s3.download_fileobj(), but none of this seemed to work.
I've also tried passing just the key of the video file in S3, but it didn't work either (I didn't expect this to work, but I was hopeless).
Code I have now:
import boto3
import cv2
import io

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket_name = "my_bucket"
    video_key = "videos/video.mp4"
    vidcap = cv2.VideoCapture(s3.get_object(Bucket=bucket_name, Key=video_key))
    success, image = vidcap.read()
I've also tried with:
vidcap = cv2.VideoCapture(s3.download_fileobj(Bucket=bucket_name, Key=video_key, Fileobj=io.BytesIO()))
But with no luck either
I'm getting success = False and image = None. I expect success to be True and image to be a NumPy array so that I can read it.
A presigned URL for the S3 object can be used:
url = s3_client.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': bucket, 'Key': key}
)
vidcap = cv2.VideoCapture(url)
OpenCV is expecting to access a file on the local disk.
You would need to download the file from Amazon S3, then reference that local file, as sketched below.
Please note that AWS Lambda only provides 512 MB of disk space by default, and only in the /tmp/ directory.
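A minimal sketch of that download-then-read approach inside the handler (the bucket and key names are assumptions):

import boto3
import cv2

s3 = boto3.client("s3")

def lambda_handler(event, context):
    bucket_name = "my_bucket"        # assumed bucket
    video_key = "videos/video.mp4"   # assumed key
    local_path = "/tmp/video.mp4"    # Lambda's only writable directory

    # Download the object to local disk, then let OpenCV open the file
    s3.download_file(bucket_name, video_key, local_path)
    vidcap = cv2.VideoCapture(local_path)

    success, image = vidcap.read()
    return {"frame_read": bool(success)}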
You can try to create an AWS CloudFront distribution for the S3 bucket. Here is the tutorial link: Use CloudFront to serve HTTPS requests S3

Unable to create s3 bucket using boto3

I'm trying to create an AWS bucket from Python 3 using boto3. create_bucket() is the method I use. Still, I get the error botocore.errorfactory.BucketAlreadyExists.
MY CODE:
import boto3

ACCESS_KEY = 'theaccesskey'
SECRET_KEY = 'thesecretkey'

S3 = boto3.client('s3',
                  aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

response = S3.create_bucket(Bucket='mynewbucket',
                            CreateBucketConfiguration={'LocationConstraint': 'ap-south-1'})
ERROR:
botocore.errorfactory.BucketAlreadyExists: An error occurred (BucketAlreadyExists)
when calling the CreateBucket operation: The requested bucket name is not available.
The bucket namespace is shared by all users of the system.
Please select a different name and try again.
However, the bucket does not exist in my account, and it still failed to create the bucket.
EDIT
I found the reason from the link and have also posted it in the answers in order to help someone.
I got it after reading a few articles online. The bucket name must be globally unique; once it satisfies that condition, it works as I expect. A short sketch of how to handle this follows.
I share this to help anyone who wonders just like me.
Reference
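Since bucket names are shared across all AWS accounts, a minimal sketch of one way to handle this: add a unique suffix and catch the name-clash exceptions (the base name and region reuse the values from the question):

import uuid

import boto3

S3 = boto3.client('s3',
                  aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

# Append a random suffix so the name is globally unique (base name is illustrative)
bucket_name = 'mynewbucket-' + uuid.uuid4().hex[:8]

try:
    S3.create_bucket(Bucket=bucket_name,
                     CreateBucketConfiguration={'LocationConstraint': 'ap-south-1'})
except (S3.exceptions.BucketAlreadyExists, S3.exceptions.BucketAlreadyOwnedByYou):
    # Someone (possibly you) already owns a bucket with this name -- pick another
    print(f'Bucket name {bucket_name} is taken, try a different one')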
