Getting Error While Inserting JSON data to DynamoDB using Python - python-3.x

Hi,
I am trying to put JSON data into an AWS DynamoDB table using AWS Lambda, but I am getting the error below. My JSON file is uploaded to an S3 bucket.
Parameter validation failed:
Invalid type for parameter Item, value:
{
"IPList": [
"10.1.0.36",
"10.1.0.27"
],
"TimeStamp": "2020-04-22 11:43:13",
"IPCount": 2,
"LoadBalancerName": "internal-ALB-1447121364.us-west-2.elb.amazonaws.com"
}
, type: <class 'str'>, valid types: <class 'dict'>: ParamValidationError
Below is my Python script:
import boto3
import json
s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']
    json_object = s3_client.get_object(Bucket=bucket,Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    table = dynamodb.Table('test')
    table.put_item(Item=jsonDict)
    return 'Hello'
Below is my json content
"{\"IPList\": [\"10.1.0.36\", \"10.1.0.27\"], \"TimeStamp\": \"2020-04-22 11:43:13\",
\"IPCount\": 2, \"LoadBalancerName\": \"internal-ALB-1447121364.us-west-2.elb.amazonaws.com\"}"
Can someone help me insert this data into DynamoDB?

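json.loads(jsonFileReader) returns a str here, but table.put_item() expects a dict. The file content is double-encoded: it is a JSON string whose value is itself JSON text, so the first json.loads() only strips the outer quoting. Switching to json.load() would not help, because it parses the same way and only differs in reading from a file object. Either upload the file as a plain JSON object, or decode twice: jsonDict = json.loads(json.loads(jsonFileReader)).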
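The JSON content shown in the question is wrapped in quotes with escaped inner quotes, which suggests the S3 object holds a JSON string containing JSON text. A minimal sketch of handling that, assuming this is really what the file contains (load_json_object is a hypothetical helper name, not part of boto3 or the json module):

```python
import json

def load_json_object(raw):
    """Parse JSON text, unwrapping one extra layer of string encoding if present."""
    value = json.loads(raw)
    if isinstance(value, str):
        # The document was a JSON string that itself holds JSON text.
        value = json.loads(value)
    if not isinstance(value, dict):
        raise TypeError(f"expected a JSON object, got {type(value).__name__}")
    return value

# Sample mirroring the question: a JSON string wrapping the real object.
inner = ('{"IPList": ["10.1.0.36", "10.1.0.27"], '
         '"TimeStamp": "2020-04-22 11:43:13", "IPCount": 2}')
double_encoded = json.dumps(inner)

item = load_json_object(double_encoded)
print(type(item).__name__, item["IPCount"])
```

In the Lambda from the question, the result of load_json_object(jsonFileReader) could then be passed as Item to table.put_item(). The type check is there so a bad file fails with a clear message instead of a ParamValidationError.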

Related

How to read parquet file from s3 using pandas

I am trying to read the parquet file which is in s3 using pandas.
Below is the code
import boto3
import pandas as pd
key = 'key'
secret = 'secret'
s3_client = boto3.client(
's3',
aws_access_key_id = key,
aws_secret_access_key = secret,
region_name = 'region_name'
)
print(s3_client)
AWS_S3_BUCKET='bucket_name'
filePath='data/wine_dataset'
response = s3_client.get_object(Bucket=AWS_S3_BUCKET, Key=filePath)
status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
if status == 200:
    print(f"Successful S3 get_object response. Status - {status}")
    books_df = pd.read_parquet(response.get("Body"))
    print(books_df)
else:
    print(f"Unsuccessful S3 get_object response. Status - {status}")
I am getting the below error
NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
But when I read the same s3 path using pyspark it worked
path= 's3a://bucket_name/data/wine_dataset'
df = spark.read.parquet(path)
I am not sure why it is not working using pandas. Can anyone help me on this?

JSONDecodeError : Extra data: line 2 column 1

I'm trying to read a gzipped JSON file from an S3 bucket and write it to a DynamoDB table using AWS Lambda with Python and boto3. After reading the S3 data, I get this error when I call json.loads.
My code looks something like this:
import json
import gzip
import boto3
from io import BytesIO
s3 = boto3.resource('s3')
dynamodb = boto3.resource('dynamodb')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']
    json_object = s3.Object(bucket, json_file_name)
    n = json_object.get()['Body'].read()
    gzipfile = BytesIO(n)
    gzipfile = gzip.GzipFile(fileobj=gzipfile)
    content = gzipfile.read().decode('utf-8')
    jsonDict = json.loads(content)
    # Write items to dynamo db table
    table = dynamodb.Table('mahbis01-AccountService-LedgerSummary-Duplicate')
    table.put_item(Item=jsonDict)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
When I print content, I see values like this:
{
"Item":{
"SubsId":{
"S":"255_0_908764"
}
}
}{
"Item":{
"SubsId":{
"S":"255_0_908765"
}
}
}{
"Item":{
"SubsId":{
"S":"255_0_908766"
}
}
}{
"Item":{
"SubsId":{
"S":"255_0_908767"
}
}
}
How can I get rid of this and write the data to dynamo db?
Your content is not valid JSON: it is several JSON objects concatenated with no separator between them. Assuming the content string always has this format, you can convert it to valid JSON using:
jsonDict = json.loads('['+content.replace('}{','},{')+']')
which will give you a valid list of dicts:
[{'Item': {'SubsId': {'S': '255_0_908764'}}}, {'Item': {'SubsId': {'S': '255_0_908765'}}}, {'Item': {'SubsId': {'S': '255_0_908766'}}}, {'Item': {'SubsId': {'S': '255_0_908767'}}}]
Then you can iterate over it and process it however you want, e.g.:
for item in jsonDict:
    print(item)
    # or upload to dynamodb
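The string replace works for the sample shown, but it would also rewrite a "}{" that happens to appear inside a string value. An alternative sketch, assuming the file is simply JSON documents written back to back, is to walk the text with json.JSONDecoder.raw_decode from the standard library (iter_concatenated_json is a hypothetical helper name):

```python
import json

def iter_concatenated_json(text):
    """Yield each top-level JSON value from a string of back-to-back documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Made-up sample in the same shape as the question's content; the last value
# contains "}{" inside a string to show it is left alone.
content = (
    '{"Item": {"SubsId": {"S": "255_0_908764"}}}'
    '{"Item": {"SubsId": {"S": "255_0_908765"}}}\n'
    '{"Item": {"SubsId": {"S": "a}{b"}}}'
)
items = list(iter_concatenated_json(content))
for item in items:
    print(item["Item"])
```

Each parsed value wraps the record under an "Item" key in DynamoDB's typed attribute format ({"S": ...}). As far as I know that is the shape the low-level client's put_item takes, whereas Table.put_item on the resource expects plain Python values, so the records may need converting before upload.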

Phone book search on AWS Lambda and S3

I want to make a serverless application in AWS Lambda for phone book searches.
What I've done:
Created a bucket and uploaded a CSV file to it.
Created a role with full access to the bucket.
Created a Lambda function
Created API Gateway with GET and POST methods
The Lambda function contains the following code:
import boto3
import json
s3 = boto3.client('s3')
resp = s3.select_object_content(
    Bucket='namebbacket',
    Key='sample_data.csv',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object s where s.\"Name\" = 'Jane'",
    InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': 'NONE'},
    OutputSerialization = {'CSV': {}},
)
for event in resp['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(statsDetails['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(statsDetails['BytesProcessed'])
        print("Stats details bytesReturned: ")
        print(statsDetails['BytesReturned'])
When I access the Invoke URL, I get the following error:
{errorMessage = Handler 'lambda_handler' missing on module 'lambda_function', errorType = Runtime.HandlerNotFound}
CSV structure: Name, PhoneNumber, City, Occupation
How to solve this problem?
Please refer to the documentation topic on how to write a Lambda function in Python. You are missing the handler. See: AWS Lambda function handler in Python
Welcome to S.O. @smac2020 links you to the right place: AWS Lambda function handler in Python. In short, AWS Lambda needs to know where to find your code, hence the "handler", though a better way to think about it might be "entry point."
Here is a close approximation of your function, refactored for use on AWS Lambda:
import json
import boto3
def function_to_be_called(event, context):
    s3 = boto3.client('s3')
    resp = s3.select_object_content(
        Bucket='stack-exchange',
        Key='48836509/dogs.csv',
        ExpressionType='SQL',
        Expression="SELECT * FROM s3object s where s.\"breen_name\" = 'pug'",
        InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': 'NONE'},
        OutputSerialization = {'CSV': {}},
    )
    # Start empty so the return below still works when nothing matches
    records = ''
    # Use a separate loop name so the handler's own event argument is not shadowed
    for stream_event in resp['Payload']:
        if 'Records' in stream_event:
            records += stream_event['Records']['Payload'].decode('utf-8')
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
        'pugInfo': records
    }
This function produces the following result:
Response
{
"statusCode": 200,
"body": "\"Hello from Lambda!\"",
"currentWorkdingDirectory": "/var/task",
"currentdirlist": [
"lambda_function.py"
],
"pugInfo": "1,pug,toy\r\n"
}
The "entry point" for this function is the function function_to_be_called in a Python file called lambda_function.py. Together these make up the "handler." We can see this in the Console, or by using the API through Boto3:
import boto3
awslambda = boto3.client('lambda')
awslambda.get_function_configuration(FunctionName='s3SelectFunction')
Which returns:
{'CodeSha256': 'mFVVlakisUIIsLstQsJUpeBIeww4QhJjl7wJaXqsJ+Q=',
'CodeSize': 565,
'Description': '',
'FunctionArn': 'arn:aws:lambda:us-east-1:***********:function:s3SelectFunction',
'FunctionName': 's3SelectFunction',
'Handler': 'lambda_function.function_to_be_called',
'LastModified': '2021-03-10T00:57:48.651+0000',
'MemorySize': 128,
'ResponseMetadata': ...
'Version': '$LATEST'}
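Once the handler is in place, the phone book rows still have to be pulled out of the S3 Select event stream. As I understand the API, results can arrive in several Records events and a row may be split between them, so one cautious approach is to join the raw bytes before decoding and parsing. A rough sketch, using the column names from the question's CSV structure (rows_from_select_stream and the sample data are hypothetical):

```python
import csv
import io

def rows_from_select_stream(payload_events, fieldnames):
    """Collect Records payloads from an S3 Select event stream and parse them as CSV rows."""
    chunks = []
    for stream_event in payload_events:
        if 'Records' in stream_event:
            chunks.append(stream_event['Records']['Payload'])
    # Join the raw bytes first so a row or character split across chunks survives.
    text = b''.join(chunks).decode('utf-8')
    reader = csv.DictReader(io.StringIO(text, newline=''), fieldnames=fieldnames)
    return list(reader)

# Hypothetical stand-in for resp['Payload']; a real stream comes from
# s3.select_object_content(...). The row is deliberately split across two chunks.
fake_payload = [
    {'Records': {'Payload': b'Jane,555-01'}},
    {'Records': {'Payload': b'00,Springfield,Engineer\r\n'}},
    {'Stats': {'Details': {'BytesScanned': 120, 'BytesProcessed': 120, 'BytesReturned': 36}}},
    {'End': {}},
]
rows = rows_from_select_stream(fake_payload, ['Name', 'PhoneNumber', 'City', 'Occupation'])
print(rows)
```

Inside the handler, the list returned for resp['Payload'] could be passed to json.dumps to build the response body. The field names are supplied by hand because CSV output from S3 Select, as shown in the answer's pugInfo value, carries no header row.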

Writing string to S3 with boto3: "'dict' object has no attribute 'put'"

In an AWS lambda, I am using boto3 to put a string into an S3 file:
import boto3
s3 = boto3.client('s3')
data = s3.get_object(Bucket=XXX, Key=YYY)
data.put('Body', 'hello')
I am told this:
[ERROR] AttributeError: 'dict' object has no attribute 'put'
The same happens with data.put('hello') which is the method recommended by the top answers at How to write a file or data to an S3 object using boto3 and with data.put_object: 'dict' object has no attribute 'put_object'.
What am I doing wrong?
By contrast, reading works fine (with data.get('Body').read().decode('utf-8')).
put_object is a method of the s3 client, not of data, which is just the dict returned by get_object.
Here is a full working example with Python 3.7:
import json
import logging
import boto3
s3 = boto3.client('s3')
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
    bucket = 'mybucket'
    key = 'id.txt'
    id = None
    # Write id to S3
    s3.put_object(Body='Hello!', Bucket=bucket, Key=key)
    # Read id from S3
    data = s3.get_object(Bucket=bucket, Key=key)
    id = data.get('Body').read().decode('utf-8')
    logger.info("Id:" + id)
    return {
        'statusCode': 200,
        'body': json.dumps('Id:' + id)
    }

AWS Firehose Lambda function invocation gives wrong output structure format

When I insert a data object into an AWS Firehose stream using a put operation, it works fine. A Lambda function is enabled on my Firehose stream, so the function is invoked, but it gives me an output structure error:
"errorMessage":"Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed."
So I have now written my Lambda function like this to produce the correct output structure:
import base64
import json
print('Loading function')
def lambda_handler(event, context):
    output=[]
    print('event'+str(event))
    for record in event['records']:
        payload = base64.b64decode(record['data'])
        print('payload'+str(payload))
        payload=base64.b64encode(payload)
        output_record={
            'recordId':record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps('hello'))
        }
        output.append(output_record)
    return { 'records': output }
Now I am getting the following error when encoding the 'data' field:
"errorMessage": "a bytes-like object is required, not 'str'",
and if I change 'hello' to bytes like b'hello', I get the following error:
"errorMessage": "Object of type bytes is not JSON serializable",
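base64.b64encode() takes bytes and returns bytes, so encode the string before passing it in and decode the result back to str so that the response is JSON serializable:
import json
import base64
def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Incoming data is base64 text; decode it to a str payload
        payload = base64.b64decode(record['data']).decode('utf-8')
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            # str -> bytes -> base64 bytes -> str
            'data': base64.b64encode(payload.encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)
    return {'records': output}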
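The handler above passes each payload through unchanged. As a rough sketch of an actual transformation that keeps the same output structure, the version below parses each record as JSON, adds a field, and marks records it cannot parse as ProcessingFailed. It assumes the records carry JSON objects; the 'processed' field, the transform_record helper, and the sample event are all hypothetical.

```python
import base64
import json

def transform_record(record):
    """Return one Firehose output record for one input record."""
    try:
        payload = json.loads(base64.b64decode(record['data']))
        payload['processed'] = True  # hypothetical transformation
        data = json.dumps(payload) + '\n'
        # str -> bytes -> base64 bytes -> str, so the response is JSON serializable
        encoded = base64.b64encode(data.encode('utf-8')).decode('utf-8')
        return {'recordId': record['recordId'], 'result': 'Ok', 'data': encoded}
    except (ValueError, TypeError):
        # Hand the original data back unchanged and flag the record as failed.
        return {'recordId': record['recordId'], 'result': 'ProcessingFailed',
                'data': record['data']}

def lambda_handler(event, context):
    return {'records': [transform_record(r) for r in event['records']]}

# Made-up event with one valid JSON record and one that is not JSON.
sample_event = {'records': [
    {'recordId': '1', 'data': base64.b64encode(b'{"msg": "hello"}').decode('utf-8')},
    {'recordId': '2', 'data': base64.b64encode(b'not json').decode('utf-8')},
]}
result = lambda_handler(sample_event, None)
print([r['result'] for r in result['records']])
```

Every input recordId appears exactly once in the output with one of the allowed result values, which is what the "Invalid output structure" error from the question is checking for.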

Resources