How do I create a Presigned URL to download a file from an S3 Bucket using Boto3? - python-3.x

I have to download a file from my S3 bucket onto my server for some processing. The bucket does not support direct connections and requires a pre-signed URL.
The Boto3 docs talk about using a presigned URL to upload, but do not mention the same for download.

import boto3

s3_client = boto3.client('s3')
BUCKET = 'my-bucket'
OBJECT = 'foo.jpg'

url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': BUCKET, 'Key': OBJECT},
    ExpiresIn=300
)
print(url)
For another example, see: Presigned URLs — Boto 3 documentation
You can also generate a pre-signed URL using the AWS CLI:
aws s3 presign s3://my-bucket/foo.jpg --expires-in 300
See: presign — AWS CLI Command Reference

Just to add to John's answer above, and to save time for anyone poking around, the documentation does cover how to download as well as upload using a presigned URL:
How to download a file:
import requests  # To install: pip install requests

url = create_presigned_url('BUCKET_NAME', 'OBJECT_NAME')
if url is not None:
    response = requests.get(url)
Python Presigned URLs documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html
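For reference, the create_presigned_url helper used in the snippet above is defined in that guide; a minimal sketch along those lines (not copied verbatim from the documentation):

import logging

import boto3
from botocore.exceptions import ClientError

def create_presigned_url(bucket_name, object_name, expiration=3600):
    """Generate a presigned URL to share an S3 object; return None on error."""
    s3_client = boto3.client('s3')
    try:
        url = s3_client.generate_presigned_url(
            'get_object',
            Params={'Bucket': bucket_name, 'Key': object_name},
            ExpiresIn=expiration
        )
    except ClientError as e:
        logging.error(e)
        return None
    # The URL is valid for `expiration` seconds and can be used by any HTTP client
    return url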

Related

How to load file from custom hosted Minio s3 bucket into pandas using s3 URL format?

I have a Minio server hosted locally.
I need to read a file from the Minio S3 bucket with pandas, using an S3 URL like "s3://dataset/wine-quality.csv", in a Jupyter notebook.
Using the boto3 library I am able to download the file:
import boto3

s3 = boto3.resource(
    's3',
    endpoint_url='http://localhost:9000',
    aws_access_key_id='id',
    aws_secret_access_key='password'
)
s3.Bucket('dataset').download_file('wine-quality.csv', '/tmp/wine-quality.csv')
But when I try using pandas,
data = pd.read_csv("s3://dataset/wine-quality.csv")
I get a client error: 403 Forbidden.
I understand that pandas internally uses the boto3 library (correct me if I'm wrong).
PS: pandas read_csv has one more parameter, storage_options={"key": AWS_ACCESS_KEY_ID, "secret": AWS_SECRET_ACCESS_KEY, "token": AWS_SESSION_TOKEN}, but I couldn't find any option for passing a custom Minio host URL for pandas to read from.
Pandas v1.2 onwards allows you to pass storage_options, which get passed down to fsspec; see the docs here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?highlight=s3fs#reading-writing-remote-files.
To pass in a custom URL, you need to specify it through client_kwargs in storage_options:
df = pd.read_csv(
    "s3://dataset/wine-quality.csv",
    storage_options={
        "key": AWS_ACCESS_KEY_ID,
        "secret": AWS_SECRET_ACCESS_KEY,
        "token": AWS_SESSION_TOKEN,
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},
    },
)
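If the read still returns 403, it can help to test the same options against s3fs directly (the fsspec backend pandas uses for s3:// URLs); a minimal sketch, assuming s3fs is installed and using the same placeholder credentials as above:

import s3fs  # pip install s3fs

# Same placeholder credentials/endpoint as in the read_csv call above.
fs = s3fs.S3FileSystem(
    key=AWS_ACCESS_KEY_ID,
    secret=AWS_SECRET_ACCESS_KEY,
    token=AWS_SESSION_TOKEN,
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)
# Should list wine-quality.csv if the endpoint and credentials are correct.
print(fs.ls("dataset"))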

How to upload downloaded file to s3 bucket using Lambda function

I saw different questions/answers but I could not find one that worked for me. Since I am really new to AWS, I need your help. I am trying to download a gzip file, load it into a JSON file, and then upload it to an S3 bucket using a Lambda function. I wrote the code to download the file and convert it to JSON, but I'm having trouble uploading it to the S3 bucket. Assume the file is ready as x.json. What should I do then?
I know it is really basic question but still help needed :)
This code will upload to Amazon S3:
import boto3

s3_client = boto3.client('s3', region_name='us-west-2')  # Change as appropriate
s3_client.upload_file('/tmp/foo.json', 'my-bucket', 'folder/foo.json')
Some tips:
In Lambda functions you can only write to /tmp/
There is a limit of 512MB
At the end of your function, delete the files (zip, json, etc) because the container can be reused and you don't want to run out of disk space
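Putting the upload call and the tips together, a minimal handler sketch for the gzip-to-JSON flow described in the question (the source URL, bucket, and key below are placeholders, not values from the question):

import gzip
import json
import os
import urllib.request

import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Hypothetical source URL and destination bucket/key -- replace with your own.
    source_url = 'https://example.com/data.json.gz'
    bucket = 'my-bucket'
    key = 'folder/x.json'

    gz_path = '/tmp/data.json.gz'
    json_path = '/tmp/x.json'

    # Lambda can only write under /tmp/
    urllib.request.urlretrieve(source_url, gz_path)

    # Decompress the gzip file and write it back out as JSON
    with gzip.open(gz_path, 'rt') as f:
        data = json.load(f)
    with open(json_path, 'w') as f:
        json.dump(data, f)

    s3_client.upload_file(json_path, bucket, key)

    # Clean up /tmp/ because the container may be reused
    for path in (gz_path, json_path):
        os.remove(path)

    return {'status': 'uploaded', 'key': key}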
If your Lambda function has the proper permissions to write to S3, then simply use the boto3 package, which is the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Be aware that if the Lambda function is located inside a VPC, it cannot access the public internet, which includes the S3 API endpoints used by boto3. In that case you may need a NAT gateway to give the function a route to the public internet.

How to pass video file from S3 bucket to opencv VideoCapture?

I'm working on an AWS Lambda function in Python that reads videos uploaded to an S3 bucket and extracts a few frames from them. I already have the script for extracting the frames with OpenCV, but I don't know what parameter I should pass to cv2.VideoCapture since the file is only accessible through the S3 bucket.
I've tried passing the video as an S3 object with s3.get_object() as well as with s3.download_fileobj, but none of this seemed to work.
I've also tried passing just the key of the video file in S3, but it didn't work either (I didn't expect this to work, but I was hopeless).
Code I have now:
import boto3
import cv2
import io

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket_name = "my_bucket"
    video_key = "videos/video.mp4"
    vidcap = cv2.VideoCapture(s3.get_object(Bucket=bucket_name, Key=video_key))
    success, image = vidcap.read()
I've also tried with:
vidcap = cv2.VideoCapture(s3.download_fileobj(Bucket=bucket_name, Key=video_key, Fileobj=io.BytesIO()))
But with no luck either.
I'm getting success = False and image = None. I expect success to be True and image to be a numpy array so that I can read it.
A presigned URL for the S3 object can be used:
url = s3_client.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': bucket, 'Key': key}
)
vidcap = cv2.VideoCapture(url)
OpenCV expects to access a file on the local disk.
You would need to download the file from Amazon S3, then reference that local file.
Please note that AWS Lambda only provides 512 MB of disk space, and only in the /tmp/ directory.
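A minimal sketch of that approach, reusing the bucket and key names from the question and assuming the video fits within the /tmp/ space:

import boto3
import cv2

s3 = boto3.client("s3")

def lambda_handler(event, context):
    bucket_name = "my_bucket"          # bucket/key taken from the question
    video_key = "videos/video.mp4"
    local_path = "/tmp/video.mp4"      # Lambda can only write under /tmp/

    # Download the object to local disk, then let OpenCV open the file path
    s3.download_file(bucket_name, video_key, local_path)
    vidcap = cv2.VideoCapture(local_path)
    success, image = vidcap.read()
    return {"frame_read": success}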
You can try creating an AWS CloudFront distribution for the S3 bucket. Here is the tutorial link: Use CloudFront to serve HTTPS requests S3

upload a file directly into S3 using python

I want to download a file received from an HTTP URL directly into an Amazon S3 bucket, instead of onto the local system.
I run Python on a 64-bit Windows OS.
I tried providing the Amazon S3 bucket URL as the second argument of Python's urlretrieve function during the file extract:
urllib.request.urlretrieve(url, amazon_s3_bucket_url)
I expected it to upload the file directly to S3; however, it fails with a FileNotFoundError, which, after some thought, makes sense.
It appears that you want to run a command on a Windows computer (either local or running on Amazon EC2) that will copy the contents of a page identified by a URL directly onto Amazon S3.
This is not possible. There is no API call for Amazon S3 that retrieves content from a different location.
You will need to download the file from the Internet and then upload it to Amazon S3. The code would look something like:
import boto3
import urllib.request
urllib.request.urlretrieve('http://example.com/hello.txt', '/tmp/hello.txt')
s3 = boto3.client('s3')
s3.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
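If you would rather skip the intermediate temp file, one variant is to buffer the response in memory and hand it to upload_fileobj; a sketch, assuming the requests package is available and the file is small enough to hold in memory (copy_url_to_s3 is just a hypothetical helper name):

import io

import boto3
import requests  # pip install requests

s3 = boto3.client('s3')

def copy_url_to_s3(source_url, bucket, key):
    """Fetch a URL and upload the bytes to S3 without writing a local file."""
    response = requests.get(source_url)
    response.raise_for_status()
    # upload_fileobj expects a file-like object; wrap the downloaded bytes
    s3.upload_fileobj(io.BytesIO(response.content), bucket, key)

copy_url_to_s3('http://example.com/hello.txt', 'mybucket', 'hello.txt')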

How to get the URL to a Google Cloud Storage file using gcloud-node?

Using the gcloud Node library, how do I get the URL for a file within a Cloud Storage bucket?
Consider the following instantiation of a file object:
let bucket = gcs.bucket(`aBucket`)
let cloudFile = bucket.file(`aFile`)
I would like to get the URL for downloading cloudFile.
You can use a variety of request URIs, including https://storage.googleapis.com/<bucket>/<object>.
If the file is public, you can use the corresponding method:
cloudFile.publicUrl()
