Download file from AWS S3 using Python - python-3.x

I am trying to download a file from an Amazon S3 bucket to my local machine using the code below, but I get an error saying "Unable to locate credentials".
Given below is the code I have written:
from boto3.session import Session
import boto3
ACCESS_KEY = 'ABC'
SECRET_KEY = 'XYZ'
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('bucket_name')
for s3_file in your_bucket.objects.all():
    print(s3_file.key) # prints the contents of bucket
s3 = boto3.client ('s3')
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')
Could anyone help me with this?

You are not using the session you created to download the file; you are using a separate S3 client that was created without credentials. Either download through the bucket you got from the session, or pass the credentials to the client:
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')

From an example in the official documentation, the correct format is:
import boto3
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
You can also use a file-like object opened in binary mode.
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
with open('FILE_NAME', 'wb') as f:
    s3.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', f)
    f.seek(0)
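If the file is only needed in memory, the same download_fileobj call can write into a BytesIO buffer instead of a file on disk. This is a minimal sketch: download_to_memory is a hypothetical helper name, and FakeS3Client is only a stand-in so the snippet runs without AWS access. With real credentials you would pass the client from boto3.client('s3', ...) instead.

```python
import io


def download_to_memory(s3_client, bucket, key):
    """Return the object's bytes using the client's download_fileobj call."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(bucket, key, buffer)
    # After the download the position is at the end of the buffer,
    # so getvalue() is used rather than read() to get everything.
    return buffer.getvalue()


class FakeS3Client:
    """Hypothetical stand-in for boto3.client('s3'), for illustration only."""

    def download_fileobj(self, Bucket, Key, Fileobj):
        Fileobj.write(b"fake bytes for " + Key.encode())


print(download_to_memory(FakeS3Client(), "BUCKET_NAME", "OBJECT_NAME"))
```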
The code in question uses s3 = boto3.client ('s3'), which does not provide any credentials.
The format for authenticating a client is shown here:
import boto3
client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
# Or via the Session
session = boto3.Session(
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
And lastly, you can also re-use the authenticated session you created to get the bucket, and then download the file from the bucket.
from boto3.session import Session
import boto3
ACCESS_KEY = 'ABC'
SECRET_KEY = 'XYZ'
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
# session is authenticated and can access the resource in question
# (the parentheses let the method chain span several lines)
(session.resource('s3')
    .Bucket('bucket_name')
    .download_file('k.png', '/Users/username/Desktop/k.png'))
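Rather than hardcoding ACCESS_KEY and SECRET_KEY in the script, the same keyword arguments can be collected from the standard AWS environment variables. A rough sketch, where session_kwargs_from_env is a hypothetical helper name and the values passed in the demo are placeholders:

```python
import os


def session_kwargs_from_env(env=None):
    """Build keyword arguments for boto3.Session from environment variables."""
    env = os.environ if env is None else env
    mapping = {
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        # Only present when temporary credentials are in use.
        "aws_session_token": "AWS_SESSION_TOKEN",
    }
    # Skip variables that are missing or empty.
    return {kwarg: env[var] for kwarg, var in mapping.items() if env.get(var)}


kwargs = session_kwargs_from_env(
    {"AWS_ACCESS_KEY_ID": "ABC", "AWS_SECRET_ACCESS_KEY": "XYZ"}
)
print(sorted(kwargs))
# With boto3 installed: session = boto3.Session(**kwargs)
```

boto3 also reads these variables on its own when no credentials are passed explicitly, so a helper like this is mostly useful when you want to check up front that they are set.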

For others trying to download files from AWS S3 looking for a more user-friendly solution with other industrial-strength features, check out https://github.com/d6t/d6tpipe. It abstracts the S3 functions into a simpler interface. It also supports directory sync, uploading files, permissions and many other things you need to sync files from S3 (and ftp).
import d6tpipe
api = d6tpipe.api.APILocal() # keep permissions locally for security
settings = {
    'name': 'my-files',
    'protocol': 's3',
    'location': 'bucket-name',
    'readCredentials': {
        'aws_access_key_id': 'AAA',
        'aws_secret_access_key': 'BBB'
    }
}
d6tpipe.api.create_pipe_with_remote(api, settings)
pipe = d6tpipe.Pipe(api, 'my-files')
pipe.scan_remote() # show all files
pipe.pull_preview() # preview
pipe.pull(['k.png']) # download single file
pipe.pull() # download all files
pipe.files() # show files
file=open(pipe.dirpath/'k.png') # access file

You can set up an AWS profile with the AWS CLI to avoid putting your credentials in the file. First add your profile:
aws configure --profile account1
Then in your code add:
aws_session = boto3.Session(profile_name="account1")
s3_client = aws_session.client('s3')
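To check which profile names exist before passing one as profile_name, the credentials file can be inspected with the standard library, since it uses INI syntax with one section per profile. This is only a sketch: list_profiles is a hypothetical helper, and the sample text stands in for the contents of ~/.aws/credentials. (In ~/.aws/config, named profiles are written as [profile account1] instead.)

```python
import configparser


def list_profiles(credentials_text):
    """Return the profile names found in credentials-file text."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return parser.sections()


# Placeholder content shaped like ~/.aws/credentials
sample = """
[default]
aws_access_key_id = AAA
aws_secret_access_key = BBB

[account1]
aws_access_key_id = CCC
aws_secret_access_key = DDD
"""

print(list_profiles(sample))
```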

Filename:
Can be any name; the file will be downloaded under that name.
It can be placed in any existing local directory.
Key:
The S3 path of the object, ending with the file name.
It does not start with a slash.
Session():
Automatically picks up credentials from ~/.aws/config or ~/.aws/credentials.
If none are found there, you need to pass them explicitly.
from boto3.session import Session
import boto3
# Let's use Amazon S3
s3 = boto3.resource("s3")
# Print out bucket names to check you have accessibility
# for bucket in s3.buckets.all():
#     print(bucket.name)
session = Session()
# OR
session = Session(aws_access_key_id="AKIAYJN2LNOU",
                  aws_secret_access_key="wMyT0SxEOsoeiHYVO3v9Gc",
                  region_name="eu-west-1")
session.resource('s3').Bucket('bucket-logs').download_file(
    Key="logs/20221122_0_5ee03da676ac566336e2279decfc77b3.gz",
    Filename="/tmp/Local_file_name.gz")
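The notes on Key and Filename above can be turned into a small helper that derives the local file name from the key. This is a sketch under my own assumptions: local_filename_for is a hypothetical name, and rejecting keys with a leading slash simply follows the note above rather than any boto3 rule.

```python
import posixpath
from pathlib import Path


def local_filename_for(key, directory):
    """Map an S3 key to a path inside an existing local directory."""
    if key.startswith("/"):
        raise ValueError("key is expected without a leading slash")
    # S3 keys use forward slashes regardless of the local OS.
    name = posixpath.basename(key)
    if not name:
        raise ValueError("key does not end with a file name")
    return str(Path(directory) / name)


print(local_filename_for("logs/2022/app.gz", "/tmp"))
```

The result can then be passed as Filename together with the original key as Key.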

Related

cloud function read storage object data without using the storage client

I have created a simple cloud function with trigger: google.cloud.storage.object.v1.finalized
When a file (.xlsx) is uploaded to my bucket, I want to read its content.
I am using the following method for this:
import functions_framework
@functions_framework.cloud_event
def process_data(cloud_event):
    print(f"Data: {cloud_event.data}")
I am able to print cloud_event.data, but how do I get the actual file that was uploaded?
One way I can do this is by using the storage client as follows:
from google.cloud import storage
import functions_framework
def get_file(object_name, bucket_name, download_path):
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(object_name)
    blob.download_to_filename(download_path)
@functions_framework.cloud_event
def process_data(cloud_event):
    print(f"Data: {cloud_event.data}")
    object_name = cloud_event.data['name']
    bucket_name = cloud_event.data['bucket']
    download_path = "/tmp/"
    get_file(object_name, bucket_name, download_path)
But is there a way to get the actual contents of the file without using the Cloud Storage client?

python aws botocore.response.streamingbody to json

I am using boto3 to access files from S3.
The objective is to read the files and convert them to JSON.
The issue is that none of the files have a file extension (no .csv, .json, etc.), although the data in each file is structured like JSON.
client = boto3.client(
    's3',
    aws_access_key_id = 'AKEY',
    aws_secret_access_key = 'ASAKEY',
    region_name = 'us-east-1'
)
obj = client.get_object(
    Bucket = 'bucketname',
    Key = '*filename without extension*'
)
obj['Body'] returns a <botocore.response.StreamingBody> object.
Is it possible to read the data within it?
The extension does not matter. Assuming your file contains valid JSON, you can load it with:
my_json = json.loads(obj['Body'].read())
The response is a dictionary, and its 'Body' entry holds a StreamingBody. Read the body and parse it as shown here.
For more information, see the Boto S3 Get Object documentation.
import json
import boto3
client = boto3.client('s3')
response = client.get_object(
    Bucket='<<bucket_name_here>>',
    Key='<<file key from aws management console (S3 Info) >>'
)
jsonContent = json.loads(response['Body'].read())
print(jsonContent)
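The parsing step can be tried without S3, because all it needs is an object with a read() method that returns bytes. In this sketch io.BytesIO stands in for the StreamingBody, load_json_body is a hypothetical helper name, and the fallback to one JSON document per line is my own assumption about what "structured like JSON" might mean for some files.

```python
import io
import json


def load_json_body(body):
    """Parse JSON from any object whose read() returns bytes."""
    text = body.read().decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumption: the file may hold one JSON document per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


fake_body = io.BytesIO(b'{"id": 1, "tags": ["a", "b"]}')
print(load_json_body(fake_body))
```

With a real response, the call would be load_json_body(response['Body']). Note that a body can only be read once.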

How to change storage class of object in s3 bucket using boto3?

I am trying to change the storage class of an object in S3 from Standard to IA.
This is similar to this thread, but I would like to do it using boto3 and a Lambda trigger.
Thanks.
You can use the copy_object method:
You can use the CopyObject action to change the storage class of an object that is already stored in Amazon S3 using the StorageClass parameter.
For example:
import boto3
s3 = boto3.client('s3')
bucket_name = '<your bucket-name>'
object_key = '<your-object-key>'
r = s3.copy_object(
    CopySource=f"{bucket_name}/{object_key}",
    Bucket=bucket_name,
    Key=object_key,
    StorageClass='STANDARD_IA')
print(r)
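Since the question mentions a Lambda trigger, here is a rough sketch of how the copy_object call above might be driven by an S3 event notification. The helper name change_storage_class and the RecordingClient stand-in are hypothetical; the event layout (Records, s3, bucket, object) is the usual S3 notification shape, where object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def change_storage_class(s3_client, event, storage_class="STANDARD_IA"):
    """Copy each object named in an S3 event onto itself with a new storage class."""
    changed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        s3_client.copy_object(
            CopySource={"Bucket": bucket, "Key": key},
            Bucket=bucket,
            Key=key,
            StorageClass=storage_class,
        )
        changed.append((bucket, key))
    return changed


def lambda_handler(event, context):
    import boto3  # imported here so the helper can be tried without AWS

    return change_storage_class(boto3.client("s3"), event)


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting S3."""

    def __init__(self):
        self.calls = []

    def copy_object(self, **kwargs):
        self.calls.append(kwargs)


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "logs/my+file.txt"}}}
    ]
}
print(change_storage_class(RecordingClient(), sample_event))
```

One thing to watch: a copy creates an object too, so if the trigger listens to all object-created events the function may invoke itself again. Restricting the trigger to upload events is one way around that.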

Unable to Create S3 Bucket(in specific Region) using AWS Python Boto3

I am trying to create a bucket using AWS Python boto3.
Here is my code:
import boto3
response = S3_CLIENT.create_bucket(
    Bucket='symbols3arg',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)
print(response)
I am getting the error below:
botocore.exceptions.ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
This happens when the region you configured with aws configure differs from the region you request in LocationConstraint, and the S3 client is created without specifying a region.
Suppose my AWS config looks like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODEXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json
and my Python script for creating the bucket is:
import logging
import boto3
from botocore.exceptions import ClientError
def create_bucket(bucket_name, region=None):
    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3')
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
create_bucket("test-bucket-in-region","us-west-1")
This will throw the error below:
ERROR:root:An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The us-west-1 location constraint is incompatible for the region specific endpoint this request was sent to.
To solve this issue, specify the region when creating the S3 client. Here is a working example that creates the bucket in a different region, regardless of what aws configure set:
import logging
import boto3
from botocore.exceptions import ClientError
def create_bucket(bucket_name, region=None):
    """Create an S3 bucket in a specified region
    If a region is not specified, the bucket is created in the S3 default
    region (us-east-1).
    :param bucket_name: Bucket to create
    :param region: String region to create bucket in, e.g., 'us-west-2'
    :return: True if bucket created, else False
    """
    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
create_bucket("my-working-bucket","us-west-1")
Reference: create-an-amazon-s3-bucket
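The branching on region in the function above can also be kept in one place by building the create_bucket arguments separately. This is a minimal sketch with a hypothetical helper name, create_bucket_kwargs. It assumes, as far as I know, that us-east-1 is the one region where LocationConstraint should be left out.

```python
def create_bucket_kwargs(bucket_name, region=None):
    """Build the keyword arguments for create_bucket for a given region."""
    kwargs = {"Bucket": bucket_name}
    # Assumption: the default region (us-east-1) takes no LocationConstraint.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("my-working-bucket", "us-west-1"))
# With boto3 installed, the client should be created in the same region:
# boto3.client('s3', region_name=region).create_bucket(**kwargs)
```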
Send the command to S3 in the same region:
import boto3
s3_client = boto3.client('s3', region_name='eu-west-1')
response = s3_client.create_bucket(
    Bucket='symbols3arg',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)
You can try the following code.
import boto3
client = boto3.client('s3',region_name="aws_region_code")
response = client.create_bucket(
    Bucket='string'
)
Hope this helps.

Writing data to google cloud storage using python

I cannot find a way to write a data set from my local machine to Google Cloud Storage using Python. I have researched a lot but didn't find any clue regarding this. Need help, thanks.
Quick example, using the google-cloud Python library:
from google.cloud import storage
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))
More examples are in this GitHub repo: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/storage/cloud-client
When we want to write a string to a GCS bucket blob, the only change necessary is using blob.upload_from_string(your_string) rather than blob.upload_from_filename(source_file_name):
from google.cloud import storage
def write_to_cloud(your_string):
    client = storage.Client()
    bucket = client.get_bucket('bucket123456789')
    blob = bucket.blob('PIM.txt')
    blob.upload_from_string(your_string)
The earlier answers are missing what I find the easiest way: the open() method.
You can use blob.open() as follows:
from google.cloud import storage
def write_file(lines):
    # lines: any iterable of strings to write
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob-name.txt')
    ## Use bucket.get_blob('path/to/existing-blob-name.txt') to write to existing blobs
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
credentials = GoogleCredentials.get_application_default()
service = discovery.build('storage', 'v1', credentials=credentials)
filename = 'file.csv'
bucket = 'Your bucket name here'
body = {'name': 'file.csv'}
req = service.objects().insert(bucket=bucket, body=body, media_body=filename)
resp = req.execute()
from google.cloud import storage
def write_to_cloud(buffer):
    client = storage.Client()
    bucket = client.get_bucket('bucket123456789')
    blob = bucket.blob('PIM.txt')
    blob.upload_from_file(buffer)
While Brandon's answer indeed gets the file to Google cloud, it does this by uploading the file, as opposed to writing the file. This means that the file needs to exist on your disk before you upload it to the cloud.
My proposed solution uses an "in-memory" payload (the buffer parameter) which is then written to cloud. To write the content you need to use upload_from_file instead of upload_from_filename, everything else being the same.
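To make the "in-memory" payload concrete, here is a sketch of building such a buffer from rows of data with the standard library and handing it to upload_from_file. The helper names rows_to_buffer and write_rows_to_cloud are hypothetical, and FakeBlob is only a stand-in so the snippet runs without Google Cloud credentials; a real blob would come from bucket.blob(...) as in the code above.

```python
import csv
import io


def rows_to_buffer(rows):
    """Serialize rows as CSV into a binary buffer positioned at the start."""
    text = io.StringIO(newline="")
    csv.writer(text).writerows(rows)
    # A new BytesIO starts at position 0, so it is ready to be read.
    return io.BytesIO(text.getvalue().encode("utf-8"))


def write_rows_to_cloud(blob, rows):
    """Upload rows as a CSV object without touching the local disk."""
    blob.upload_from_file(rows_to_buffer(rows), content_type="text/csv")


class FakeBlob:
    """Hypothetical stand-in for a google.cloud.storage blob."""

    def upload_from_file(self, file_obj, content_type=None):
        self.data = file_obj.read()
        self.content_type = content_type


blob = FakeBlob()
write_rows_to_cloud(blob, [["name", "value"], ["x", 1]])
print(blob.data)
```

If you reuse a buffer that has already been written to or read from, call buffer.seek(0) before uploading, otherwise the upload starts from the current position.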

Resources