I am using boto3 to acccess files from S3,
The objective is to read the files and convert it to JSON
But the issue is none of the files have any file extension (no .csv,.json etc),although the data in the file is structured like JSON
client = boto3.client(
's3',
aws_access_key_id = 'AKEY',
aws_secret_access_key = 'ASAKEY',
region_name = 'us-east-1'
)
obj = client.get_object(
Bucket = 'bucketname',
Key = '*filename without extension*'
)
obj['Body'] returns a <botocore.response.StreamingBody> object
is it possible to find out the data within it?
The extension does not matter. Assuming your file contains valid json, you can get it:
my_json = json.loads(obj['Body'].read())
The response is a dictionary object.
Response returns StreamingBody in 'Body' attribute. So here is the solution.
Find more information here.
Boto S3 Get Object
client = boto3.client('s3')
response = client.get_object(
Bucket='<<bucket_name_here>>',
Key='<<file key from aws mangement console (S3 Info) >>'
)
jsonContent = json.loads(response['Body'].read())
print(jsonContent)
Related
I am trying to read the parquet file which is in s3 using pandas.
Below is the code
import boto3
import pandas as pd
key = 'key'
secret = 'secret'
s3_client = boto3.client(
's3',
aws_access_key_id = key,
aws_secret_access_key = secret,
region_name = 'region_name'
)
print(s3_client)
AWS_S3_BUCKET='bucket_name'
filePath='data/wine_dataset'
response = s3_client.get_object(Bucket=AWS_S3_BUCKET, Key=filePath)
status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
if status == 200:
print(f"Successful S3 get_object response. Status - {status}")
books_df = pd.read_parquet(response.get("Body"))
print(books_df)
else:
print(f"Unsuccessful S3 get_object response. Status - {status}")
I am getting the below error
NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
But when I read the same s3 path using pyspark it worked
path= 's3a://bucket_name/data/wine_dataset'
df = spark.read.parquet(path)
I am not sure why it is not working using pandas. Can anyone help me on this?
i am trying to upload file in aws s3 bucket via boto 3
but instead of file the following is being uploaded <_io.TextIOWrapper name='excel.csv' mode='a' encoding='UTF-8'>
def write_csv(data):
with open('excel.csv', 'a') as file:
writer = csv.writer(file)
writer.writerow([data['account_id'],
data['country'],
data['end_date'],
data['start_date']])
uploadtos3(str(file))
def uploadtos3(file):
key = 'xxxx'
seckey = 'xxxx'
s3 = boto3.resource( 's3',
aws_access_key_id = key,
aws_secret_access_key = seckey)
upload_file_bucket = 'apiuploadtest'
s3.Object(upload_file_bucket,str(file)).put(Body = str(file))
how to upload the file correctly?
Body in the put method of Object is:
Body (bytes or seekable file-like object) -- Object data.
Therefore, the following should be tried (fixed indentation and removed str):
def write_csv(data):
with open('excel.csv', 'a') as file:
writer = csv.writer(file)
writer.writerow([data['account_id'],
data['country'],
data['end_date'],
data['start_date']])
uploadtos3(file)
def uploadtos3(file):
key = 'xxxx'
seckey = 'xxxx'
s3 = boto3.resource('s3',
aws_access_key_id = key,
aws_secret_access_key = seckey)
upload_file_bucket = 'apiuploadtest'
s3.Object(upload_file_bucket, <key-name-on-s3>).put(Body = file)
By the way, its not a good practice to hardcode any AWS credentials in your source code.
Im trying to make a api which will upload video to s3 . I all ready managed to upload the video in s3, but the problem is the video file is not working . i checked content-type of video file, and it's binary/octet-stream instead on video/mp4 . So i set content-type to "video/mp4" while calling put_object api, but it still not working.
I use Lambda function for putting the video to s3 . here is my lambda code -
import json
import base64
import boto3
def lambda_handler(event, context):
bucket_name = 'ad-live-streaming'
s3_client = boto3.client('s3')
file_content = event['content']
merchantId = event['merchantId']
catelogId = event['catelogId']
file_name = event['fileName']
file_path = '{}/{}/{}.mp4'.format(merchantId, catelogId, file_name)
s3_response = s3_client.put_object(Bucket=bucket_name, Key=file_path, Body=file_content, ContentType='video/mp4')
return {
'statusCode': 200,
"merchantId":merchantId,
"catelogId":catelogId,
"file_name":file_name,
}
Any idea how to solve this issue ?
Based on the example in Upload binary files to S3 using AWS API Gateway with AWS Lambda | by Omer Hanetz | The Startup | Medium, it appears that you need to decode the file from base64:
file_content = base64.b64decode(event['content'])
In an AWS lambda, I am using boto3 to put a string into an S3 file:
import boto3
s3 = boto3.client('s3')
data = s3.get_object(Bucket=XXX, Key=YYY)
data.put('Body', 'hello')
I am told this:
[ERROR] AttributeError: 'dict' object has no attribute 'put'
The same happens with data.put('hello') which is the method recommended by the top answers at How to write a file or data to an S3 object using boto3 and with data.put_object: 'dict' object has no attribute 'put_object'.
What am I doing wrong?
On the opposite, reading works great (with data.get('Body').read().decode('utf-8')).
put_object is a method of the s3 object, not the data object.
Here is a full working example with Python 3.7:
import json
import boto3
s3 = boto3.client('s3')
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
bucket = 'mybucket'
key = 'id.txt'
id = None
# Write id to S3
s3.put_object(Body='Hello!', Bucket=bucket, Key=key)
# Read id from S3
data = s3.get_object(Bucket=bucket, Key=key)
id = data.get('Body').read().decode('utf-8')
logger.info("Id:" + id)
return {
'statusCode': 200,
'body': json.dumps('Id:' + id)
}
I am trying to download a file from Amazon S3 bucket to my local using the below code but I get an error saying "Unable to locate credentials"
Given below is the code I have written:
from boto3.session import Session
import boto3
ACCESS_KEY = 'ABC'
SECRET_KEY = 'XYZ'
session = Session(aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('bucket_name')
for s3_file in your_bucket.objects.all():
print(s3_file.key) # prints the contents of bucket
s3 = boto3.client ('s3')
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')
Could anyone help me on this?
You are not using the session you created to download the file, you're using s3 client you created. If you want to use the client you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')
From an example in the official documentation, the correct format is:
import boto3
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
You can also use a file-like object opened in binary mode.
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
with open('FILE_NAME', 'wb') as f:
s3.download_fileobj('BUCKET_NAME', 'OBJECT_NAME', f)
f.seek(0)
The code in question uses s3 = boto3.client ('s3'), which does not provide any credentials.
The format for authenticating a client is shown here:
import boto3
client = boto3.client(
's3',
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY,
aws_session_token=SESSION_TOKEN,
)
# Or via the Session
session = boto3.Session(
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY,
aws_session_token=SESSION_TOKEN,
)
And lastly you can also re-use the authenticated session you created to get the bucket, and then download then file from the bucket.
from boto3.session import Session
import boto3
ACCESS_KEY = 'ABC'
SECRET_KEY = 'XYZ'
session = Session(aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
# session is authenticated and can access the resource in question
session.resource('s3')
.Bucket('bucket_name')
.download_file('k.png','/Users/username/Desktop/k.png')
For others trying to download files from AWS S3 looking for a more user-friendly solution with other industrial-strength features, check out https://github.com/d6t/d6tpipe. It abstracts the S3 functions into a simpler interface. It also supports directory sync, uploading files, permissions and many other things you need to sync files from S3 (and ftp).
import d6tpipe
api = d6tpipe.api.APILocal() # keep permissions locally for security
settings = \
{
'name': 'my-files',
'protocol': 's3',
'location': 'bucket-name',
'readCredentials' : {
'aws_access_key_id': 'AAA',
'aws_secret_access_key': 'BBB'
}
}
d6tpipe.api.create_pipe_with_remote(api, settings)
pipe = d6tpipe.Pipe(api, 'my-files')
pipe.scan_remote() # show all files
pipe.pull_preview() # preview
pipe.pull(['k.png']) # download single file
pipe.pull() # download all files
pipe.files() # show files
file=open(pipe.dirpath/'k.png') # access file
You can setup your AWS profile with awscli to avoid introduce your credentials in the file. First add your profile:
aws configure --profile account1
Then in your code add:
aws_session = boto3.Session(profile_name="account1")
s3_client = aws_session.client('s3')
FileName:
can be any name; with that name; file will be downloaded.
It can be added to any existing local directory.
Key:
Is the S3 file path along with the file name in the end.
It does not start with a backslash.
Session()
It automatically picks the credentials from ~/.aws/config OR ~/.aws/credentials
If not you need to explicitly pass that.
from boto3.session import Session
import boto3
# Let's use Amazon S3
s3 = boto3.resource("s3")
# Print out bucket names to check you have accessibility
# for bucket in s3.buckets.all():
# print(bucket.name)
session = Session()
OR
session = Session(aws_access_key_id="AKIAYJN2LNOU",
aws_secret_access_key="wMyT0SxEOsoeiHYVO3v9Gc",
region_name="eu-west-1")
session.resource('s3').Bucket('bucket-logs').download_file(Key="logs/20221122_0_5ee03da676ac566336e2279decfc77b3.gz", Filename="/tmp/Local_file_name.gz")