How to read parquet file from s3 using pandas - python-3.x

I am trying to read a parquet file stored in S3 using pandas.
Below is the code:
import boto3
import pandas as pd
key = 'key'
secret = 'secret'
s3_client = boto3.client(
    's3',
    aws_access_key_id = key,
    aws_secret_access_key = secret,
    region_name = 'region_name'
)
print(s3_client)
AWS_S3_BUCKET='bucket_name'
filePath='data/wine_dataset'
response = s3_client.get_object(Bucket=AWS_S3_BUCKET, Key=filePath)
status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
if status == 200:
    print(f"Successful S3 get_object response. Status - {status}")
    books_df = pd.read_parquet(response.get("Body"))
    print(books_df)
else:
    print(f"Unsuccessful S3 get_object response. Status - {status}")
I am getting the below error
NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
But when I read the same s3 path using pyspark it worked
path= 's3a://bucket_name/data/wine_dataset'
df = spark.read.parquet(path)
I am not sure why it is not working using pandas. Can anyone help me on this?
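One likely cause, assuming the dataset was written by Spark: spark.read.parquet accepts a directory, and Spark normally writes a dataset as a folder of part files, so data/wine_dataset would be a key prefix rather than a single object, and get_object raises NoSuchKey for it. The sketch below lists the objects under the prefix and reads every parquet part into one DataFrame. The helper names list_parquet_keys and read_parquet_prefix are hypothetical, the client is passed in by the caller, and pandas needs a parquet engine such as pyarrow installed.

```python
import io


def list_parquet_keys(s3_client, bucket, prefix):
    """Return the keys under `prefix` that end in .parquet, in listing order."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
    return keys


def read_parquet_prefix(s3_client, bucket, prefix):
    """Download every parquet part under `prefix` and concatenate them."""
    import pandas as pd  # imported here so the listing helper works without pandas

    keys = list_parquet_keys(s3_client, bucket, prefix)
    if not keys:
        raise FileNotFoundError(f"no .parquet objects under s3://{bucket}/{prefix}")
    frames = []
    for key in keys:
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Parquet readers seek within the file, so wrap the bytes in BytesIO
        # instead of passing the non-seekable StreamingBody directly.
        frames.append(pd.read_parquet(io.BytesIO(body)))
    return pd.concat(frames, ignore_index=True)
```

With the names from the question, the call would look like read_parquet_prefix(s3_client, 'bucket_name', 'data/wine_dataset/'). The trailing slash keeps the listing from matching sibling prefixes that merely start with the same characters.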

Related

python aws botocore.response.streamingbody to json

I am using boto3 to access files from S3.
The objective is to read the files and convert them to JSON.
The issue is that none of the files have a file extension (no .csv, .json, etc.), although the data in each file is structured like JSON.
client = boto3.client(
    's3',
    aws_access_key_id = 'AKEY',
    aws_secret_access_key = 'ASAKEY',
    region_name = 'us-east-1'
)
obj = client.get_object(
    Bucket = 'bucketname',
    Key = '*filename without extension*'
)
obj['Body'] returns a <botocore.response.StreamingBody> object.
Is it possible to read the data within it?
The extension does not matter. Assuming your file contains valid json, you can get it:
my_json = json.loads(obj['Body'].read())
The response is a dictionary object.
Its 'Body' entry holds a StreamingBody, so the solution is to read that stream and parse the result.
For more information, see the Boto S3 Get Object documentation.
import json
import boto3
client = boto3.client('s3')
response = client.get_object(
    Bucket='<<bucket_name_here>>',
    Key='<<file key from aws management console (S3 Info) >>'
)
# read() returns the object's bytes; json.loads accepts bytes directly
jsonContent = json.loads(response['Body'].read())
print(jsonContent)

boto3 file upload in python

I am trying to upload a file to an AWS S3 bucket via boto3,
but instead of the file contents the following is being uploaded: <_io.TextIOWrapper name='excel.csv' mode='a' encoding='UTF-8'>
def write_csv(data):
    with open('excel.csv', 'a') as file:
        writer = csv.writer(file)
        writer.writerow([data['account_id'],
                         data['country'],
                         data['end_date'],
                         data['start_date']])
        uploadtos3(str(file))
def uploadtos3(file):
    key = 'xxxx'
    seckey = 'xxxx'
    s3 = boto3.resource( 's3',
                         aws_access_key_id = key,
                         aws_secret_access_key = seckey)
    upload_file_bucket = 'apiuploadtest'
    s3.Object(upload_file_bucket,str(file)).put(Body = str(file))
How do I upload the file correctly?
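Body in the put method of Object is:
Body (bytes or seekable file-like object) -- Object data.
str(file) is only the text representation of the file object, which is why that text ended up in S3. A file opened in text append mode ('a') cannot be read back either, so the following should be tried (fixed indentation, removed str, and reopened the file in binary mode for the upload):
import csv
import boto3
def write_csv(data):
    with open('excel.csv', 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow([data['account_id'],
                         data['country'],
                         data['end_date'],
                         data['start_date']])
    # The file is closed and flushed here, so upload it by name.
    uploadtos3('excel.csv')
def uploadtos3(filename):
    key = 'xxxx'
    seckey = 'xxxx'
    s3 = boto3.resource('s3',
                        aws_access_key_id = key,
                        aws_secret_access_key = seckey)
    upload_file_bucket = 'apiuploadtest'
    # Body must be bytes or a binary file-like object; the local file name
    # doubles as the key name on S3 here, change it as needed.
    with open(filename, 'rb') as body:
        s3.Object(upload_file_bucket, filename).put(Body = body)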
By the way, it's not good practice to hardcode AWS credentials in your source code.
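On the credentials point: when no keys are passed, boto3 looks them up itself (environment variables such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the shared ~/.aws/credentials file, or an attached IAM role). A minimal sketch along those lines, which also uploads by file name once the CSV file has been closed. append_row_and_upload is a hypothetical helper, and the client is passed in so that it can be created once by the caller.

```python
import csv


def append_row_and_upload(s3_client, data, filename, bucket, key):
    """Append one row to a local CSV file, then upload the whole file to S3."""
    # newline="" is what the csv module recommends for files it writes to.
    with open(filename, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([data["account_id"], data["country"],
                         data["end_date"], data["start_date"]])
    # The with-block has closed and flushed the file, so the upload sees the
    # new row. upload_file takes a local path, a bucket name and an object key.
    s3_client.upload_file(filename, bucket, key)
```

With the names from the question, the call would be append_row_and_upload(boto3.client('s3'), data, 'excel.csv', 'apiuploadtest', 'excel.csv'), with no keys anywhere in the source.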

Getting Error While Inserting JSON data to DynamoDB using Python

Hi,
I am trying to put JSON data into an AWS DynamoDB table using AWS Lambda, but I am getting the error below. My JSON file is uploaded to an S3 bucket.
Parameter validation failed:
Invalid type for parameter Item, value:
{
"IPList": [
"10.1.0.36",
"10.1.0.27"
],
"TimeStamp": "2020-04-22 11:43:13",
"IPCount": 2,
"LoadBalancerName": "internal-ALB-1447121364.us-west-2.elb.amazonaws.com"
}
, type: <class 'str'>, valid types: <class 'dict'>: ParamValidationError
Below is my Python script:
import boto3
import json
s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']
    json_object = s3_client.get_object(Bucket=bucket,Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    table = dynamodb.Table('test')
    table.put_item(Item=jsonDict)
    return 'Hello'
Below is my json content
"{\"IPList\": [\"10.1.0.36\", \"10.1.0.27\"], \"TimeStamp\": \"2020-04-22 11:43:13\",
\"IPCount\": 2, \"LoadBalancerName\": \"internal-ALB-1447121364.us-west-2.elb.amazonaws.com\"}"
Can someone help me with how to insert this data into DynamoDB?
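json.loads(jsonFileReader) returns a string here, but table.put_item() expects a dict. The file content is double-encoded: it is a JSON string literal (note the outer quotes and the escaped \" characters) whose value is itself JSON text. Decode it a second time, json.loads(json.loads(jsonFileReader)), or fix whatever writes the file so that it calls json.dumps only once. Switching to json.load() does not help, because it parses the same way and only differs in taking a file object.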
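The file shown in the question is a JSON string whose value is JSON text, so one decode yields a str and a second decode yields the dict. A minimal sketch of a tolerant loader, assuming the payload was encoded either once or twice; load_item is a hypothetical helper name. It also parses floats as Decimal, because the boto3 DynamoDB resource rejects Python float values.

```python
import json
from decimal import Decimal


def load_item(raw):
    """Decode JSON that may have been encoded twice and return a dict."""
    # parse_float=Decimal: the boto3 DynamoDB resource does not accept float.
    value = json.loads(raw, parse_float=Decimal)
    if isinstance(value, str):
        # The first pass only removed the outer string quoting; decode again.
        value = json.loads(value, parse_float=Decimal)
    if not isinstance(value, dict):
        raise ValueError(f"expected a JSON object, got {type(value).__name__}")
    return value
```

In the handler from the question this would replace the json.loads line: table.put_item(Item=load_item(json_object['Body'].read())).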

Unable to Create S3 Bucket(in specific Region) using AWS Python Boto3

I am trying to create a bucket using AWS Python boto3.
Here is my code:
import boto3
response = S3_CLIENT.create_bucket(
    Bucket='symbols3arg',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)
print(response)
I am getting the error below:
botocore.exceptions.ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
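This happens when the region the request is sent to differs from the region named in LocationConstraint. The client uses the region set during aws configure unless a different region is specified when the S3 client object is created.
Suppose my AWS config looks like this: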
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODEXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json
and this is my Python script for creating the bucket:
import logging
import boto3
from botocore.exceptions import ClientError
def create_bucket(bucket_name, region=None):
    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3')
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
create_bucket("test-bucket-in-region","us-west-1")
This will throw the below error
ERROR:root:An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The us-west-1 location constraint is incompatible for the region specific endpoint this request was sent to.
To solve this issue, specify the region when creating the S3 client object. Here is a working example that creates the bucket in a different region, regardless of what aws configure set:
import logging
import boto3
from botocore.exceptions import ClientError
def create_bucket(bucket_name, region=None):
    """Create an S3 bucket in a specified region
    If a region is not specified, the bucket is created in the S3 default
    region (us-east-1).
    :param bucket_name: Bucket to create
    :param region: String region to create bucket in, e.g., 'us-west-2'
    :return: True if bucket created, else False
    """
    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True
create_bucket("my-working-bucket","us-west-1")
Source: the Boto3 documentation, section create-an-amazon-s3-bucket.
Send the command to S3 in the same region:
import boto3
s3_client = boto3.client('s3', region_name='eu-west-1')
response = s3_client.create_bucket(
    Bucket='symbols3arg',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)
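You can try the following code.
import boto3
client = boto3.client('s3', region_name="aws_region_code")
# Without CreateBucketConfiguration this call only succeeds in us-east-1;
# for any other region, also pass a LocationConstraint naming that region.
response = client.create_bucket(
    Bucket='string'
)
Hope it helps.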
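The answers above come down to two rules: send the request to the region the bucket should live in, and pass a LocationConstraint naming that region for every region except us-east-1, which is the default and does not accept an explicit constraint. A minimal sketch of those rules; create_bucket_kwargs and create_bucket_in are hypothetical helper names.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for create_bucket in the given region."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Every region except us-east-1 needs an explicit LocationConstraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_in(bucket_name, region):
    """Create the bucket through a client bound to the same region."""
    import boto3  # imported here so the helper above has no dependencies

    s3_client = boto3.client("s3", region_name=region)
    return s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

For the bucket in the question, create_bucket_in('symbols3arg', 'eu-west-1') sends the request to eu-west-1 with a matching constraint, whatever region aws configure set as the default.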

Writing string to S3 with boto3: "'dict' object has no attribute 'put'"

In an AWS lambda, I am using boto3 to put a string into an S3 file:
import boto3
s3 = boto3.client('s3')
data = s3.get_object(Bucket=XXX, Key=YYY)
data.put('Body', 'hello')
I am told this:
[ERROR] AttributeError: 'dict' object has no attribute 'put'
The same happens with data.put('hello') which is the method recommended by the top answers at How to write a file or data to an S3 object using boto3 and with data.put_object: 'dict' object has no attribute 'put_object'.
What am I doing wrong?
By contrast, reading works fine (with data.get('Body').read().decode('utf-8')).
put_object is a method of the s3 object, not the data object.
Here is a full working example with Python 3.7:
import json
import boto3
s3 = boto3.client('s3')
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
    bucket = 'mybucket'
    key = 'id.txt'
    id = None
    # Write id to S3
    s3.put_object(Body='Hello!', Bucket=bucket, Key=key)
    # Read id from S3
    data = s3.get_object(Bucket=bucket, Key=key)
    id = data.get('Body').read().decode('utf-8')
    logger.info("Id:" + id)
    return {
        'statusCode': 200,
        'body': json.dumps('Id:' + id)
    }
