Using AWS Lambda to read Kafka(MSK) event source - python-3.x

I am trying to read values from a Kafka topic (AWS MSK) using AWS Lambda.
The event record when printed from lambda looks like this:
{'eventSource': 'aws:kafka', 'eventSourceArn': 'arn:aws:kafka:ap-northeast-1:987654321:cluster/mskcluster/79y80c66-813a-4f-af0e-4ea47ba107e6', 'records': {'Transactions-0': [{'topic': 'Transactions', 'partition': 0, 'offset': 4798, 'timestamp': 1603565835915, 'timestampType': 'CREATE_TIME', 'value': 'eyJFdmVudFRpbWUiOiAiMjAyMC0xMC0yNCAxODo1NzoxNS45MTUzMjQiLCAiSVAiOiAiMTgwLjI0MS4xNTkuMjE4IiwgIkFjY291bnROdW1iZXIiOiwiMTQ2ODA4ODYiLCAiVXNlck5hbWUiOi67iQW1iZXIgUm9tYXJvIiwgIkFtb3VudCI6ICI1NTYyIiwgIlRyYW5zYWN0aW9uSUQiOiAiTzI4Qlg3TlBJbWZmSXExWCIsICJDb3VuTHJ5IjogIk9tYW4ifQ=='}]}}
How can I extract the 'topic' and 'value' fields? The value one is base64 encoded.
I get the following error:
NameError: name 'record' is not defined
I am trying the following code:
import json
import base64
def lambda_handler(event, context):
    print(event)
    message = event['records']
    payload=base64.b64decode(record["message"]["value"])
    print("Decoded payload: " + str(payload))
Sample MSK event structure

In your code snippet, the record variable you pass to the decode function is never defined. Also, each record is a flat dict, so the encoded payload is at record["value"], not record["message"]["value"]. An example that iterates over the records:
records = event['records']['Transactions-0']
for record in records:
    payload = base64.b64decode(record["value"])
    print("Topic: " + record["topic"])
    print("Decoded payload: " + str(payload))
Each invocation can contain multiple records per topic partition. If the event has several keys, such as Transactions-1, ..., you can iterate over those as well.
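To handle every topic-partition key instead of a hard-coded 'Transactions-0', a minimal sketch might look like the following. It assumes the message values are UTF-8 text that is usually, but not always, JSON; the helper name iter_msk_records is hypothetical and not part of any AWS library.

```python
import base64
import json


def iter_msk_records(event):
    """Yield (topic, decoded_value) for every record in an MSK event."""
    # event['records'] maps "<topic>-<partition>" keys to lists of records.
    for records in event.get("records", {}).values():
        for record in records:
            raw = base64.b64decode(record["value"])
            # Assumption: the producer wrote UTF-8 text.
            yield record["topic"], raw.decode("utf-8")


def lambda_handler(event, context):
    decoded = []
    for topic, value in iter_msk_records(event):
        try:
            body = json.loads(value)
        except json.JSONDecodeError:
            # Keep the raw text when the payload is not valid JSON.
            body = value
        decoded.append({"topic": topic, "value": body})
    return decoded
```

Falling back to the raw text keeps one malformed message from failing the whole batch; whether that is the right policy depends on how the consumer should treat bad records.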

Related

AWS Lambda - Python : How to pass JSON input to event object in python handler

I have a lambda function with a lambda handler function.
I want to pass a key via the 'event' object. That key can then be processed via this handler function.
For example I want to pass a JSON input to the lambda handler. The JSON input contain a field 'who'.
This is the code in the lambda function:
import json
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from ' + event.who ) # event.who does not exist even though i pass it via JSON
    }
I created a test event and replaced the Event JSON with the following:
{
"who": "It is me!"
}
I am expecting 'who' to be accessible from within the event object inside the lambda_handler.
In Python, we access the values in a dict this way: dict['attribute']. The 'event' object is a dictionary of key-value pairs, so event['who'] fetches the value of the 'who' key. Attribute access such as event.who does not work on a dict.
def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from ' + event['who'] )
    }
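If the key might be absent from the test event, dict.get() avoids a KeyError. This is a small sketch; the fallback text "an anonymous caller" is an assumption chosen only for illustration.

```python
import json


def lambda_handler(event, context):
    # event is a plain dict; .get() returns the default when 'who' is absent.
    who = event.get("who", "an anonymous caller")
    return {
        "statusCode": 200,
        "body": json.dumps("Hello from " + who),
    }
```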

Phone book search on AWS Lambda and S3

I want to make a serverless application in AWS Lambda for phone book searches.
What I've done:
Created a bucket and uploaded a CSV file to it.
Created a role with full access to the bucket.
Created a Lambda function
Created API Gateway with GET and POST methods
The Lambda function contains the following code:
import boto3
import json
s3 = boto3.client('s3')
resp = s3.select_object_content(
    Bucket='namebbacket',
    Key='sample_data.csv',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object s where s.\"Name\" = 'Jane'",
    InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': 'NONE'},
    OutputSerialization = {'CSV': {}},
)
for event in resp['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(statsDetails['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(statsDetails['BytesProcessed'])
        print("Stats details bytesReturned: ")
        print(statsDetails['BytesReturned'])
When I access the Invoke URL, I get the following error:
{errorMessage = Handler 'lambda_handler' missing on module 'lambda_function', errorType = Runtime.HandlerNotFound}
CSV structure: Name, PhoneNumber, City, Occupation
How to solve this problem?
Please refer to this documentation topic to learn how to write a Lambda function in Python. You are missing the handler. See: AWS Lambda function handler in Python
Welcome to S.O. #smac2020 links you to the right place: AWS Lambda function handler in Python. In short, AWS Lambda needs to know where to find your code, hence the "handler", though a better way to think about it might be "entry point".
Here is a close approximation of your function, refactored for use on AWS Lambda:
import json
import boto3
def function_to_be_called(event, context):
    # TODO implement
    s3 = boto3.client('s3')
    resp = s3.select_object_content(
        Bucket='stack-exchange',
        Key='48836509/dogs.csv',
        ExpressionType='SQL',
        Expression="SELECT * FROM s3object s where s.\"breen_name\" = 'pug'",
        InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': 'NONE'},
        OutputSerialization = {'CSV': {}},
    )
    for event in resp['Payload']:
        if 'Records' in event:
            records = event['Records']['Payload'].decode('utf-8')
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
        'pugInfo': records
    }
This function produces the following result:
Response
{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\"",
  "currentWorkdingDirectory": "/var/task",
  "currentdirlist": [
    "lambda_function.py"
  ],
  "pugInfo": "1,pug,toy\r\n"
}
The "entry point" for this function is the function function_to_be_called in a Python file called lambda_function.py. Together these are the "handler". We can see this in the Console, or by using the API through Boto3:
import boto3
awslambda = boto3.client('lambda')
awslambda.get_function_configuration(FunctionName='s3SelectFunction')
Which returns:
{'CodeSha256': 'mFVVlakisUIIsLstQsJUpeBIeww4QhJjl7wJaXqsJ+Q=',
'CodeSize': 565,
'Description': '',
'FunctionArn': 'arn:aws:lambda:us-east-1:***********:function:s3SelectFunction',
'FunctionName': 's3SelectFunction',
'Handler': 'lambda_function.function_to_be_called',
'LastModified': '2021-03-10T00:57:48.651+0000',
'MemorySize': 128,
'ResponseMetadata': ...
'Version': '$LATEST'}
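To make the "module.function" shape of the Handler value concrete, here is a rough illustration of how such a string can be split and resolved. It is not the Lambda runtime's actual implementation; resolve_handler is a hypothetical helper, and a standard-library function stands in for a real handler.

```python
import importlib


def resolve_handler(handler):
    """Split 'module.function' and return the callable it names."""
    # Everything before the last dot is the module, the rest is the function.
    module_name, function_name = handler.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Demo with a standard-library function standing in for a Lambda handler.
func = resolve_handler("os.path.basename")
print(func("/var/task/lambda_function.py"))
```

With the configuration shown above, the equivalent string would be 'lambda_function.function_to_be_called', which is why a file without a function of the configured name produces Runtime.HandlerNotFound.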

How to correctly call queryStringParameters for AWS Lambda + API Gateway?

I'm following a tutorial on setting up AWS API Gateway with a Lambda Function to create a restful API. I have the following code:
import json
def lambda_handler(event, context):
    # 1. Parse query string parameters
    transactionId = event['queryStringParameters']['transactionid']
    transactionType = event['queryStringParameters']['type']
    transactionAmounts = event['queryStringParameters']['amount']
    # 2. Construct the body of the response object
    transactionResponse = {}
    # returning values originally passed in then add separate field at the bottom
    transactionResponse['transactionid'] = transactionId
    transactionResponse['type'] = transactionType
    transactionResponse['amount'] = transactionAmounts
    transactionResponse['message'] = 'hello from lambda land'
    # 3. Construct http response object
    responseObject = {}
    responseObject['StatusCode'] = 200
    responseObject['headers'] = {}
    responseObject['headers']['Content-Type'] = 'application/json'
    responseObject['body'] = json.dumps(transactionResponse)
    # 4. Return the response object
    return responseObject
When I link the API Gateway to this function and try to call it using query parameters I get the error:
{
"message":"Internal server error"
}
When I test the lambda function it returns the error:
{
  "errorMessage": "'transactionid'",
  "errorType": "KeyError",
  "stackTrace": [
    " File \"/var/task/lambda_function.py\", line 5, in lambda_handler\n transactionId = event['queryStringParameters']['transactionid']\n"
  ]
}
Does anybody have any idea of what's going on here/how to get it to work?
I recommend adding a couple of diagnostics, as follows:
import json
def lambda_handler(event, context):
    print('event:', json.dumps(event))
    print('queryStringParameters:', json.dumps(event['queryStringParameters']))
    transactionId = event['queryStringParameters']['transactionid']
    transactionType = event['queryStringParameters']['type']
    transactionAmounts = event['queryStringParameters']['amount']
    # remainder of code ...
That way you can see what is in event and event['queryStringParameters'] to be sure that it matches what you expected to see. These will be logged in CloudWatch Logs (and you can see them in the AWS Lambda console if you are testing events using the console).
In your case, it turns out that your test event included transactionId when your code expected to see transactionid (different spelling). Hence the KeyError exception.
Just remove ['queryStringParameters']. The print event line shows that the event is only an array, not a key-value pair. I happen to be following the same tutorial. I'm still on the API Gateway part, so I'll update once mine is completed.
When you test from the Lambda function, there is no queryStringParameters in the event, but it is there when called from API Gateway. You can also test from API Gateway, where queryStringParameters is required to get the values passed.
The problem is not your code. It is the Lambda function integration setting. Please do not enable the Lambda function integration setting. You can still attach the Lambda function without it. Leave this unchecked.
It's because of the typo in responseObject['StatusCode'] = 200.
'StatusCode' should be 'statusCode'.
I got the same issue, and it was that.
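Pulling the points from these answers together, a defensive version of the handler could be sketched as below. It assumes the API Gateway Lambda proxy event format, where queryStringParameters can be missing or None; the 400 response for missing parameters is an illustrative choice, not something the tutorial prescribes.

```python
import json


def lambda_handler(event, context):
    # queryStringParameters is missing or None when no query string is sent.
    params = event.get("queryStringParameters") or {}

    required = ("transactionid", "type", "amount")
    missing = [name for name in required if name not in params]
    if missing:
        # Assumption: report bad input as 400 instead of raising KeyError.
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "missing parameters", "missing": missing}),
        }

    response = {name: params[name] for name in required}
    response["message"] = "hello from lambda land"
    return {
        "statusCode": 200,  # lowercase 's', unlike 'StatusCode' in the question
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(response),
    }
```

A console test event for this sketch needs a top-level "queryStringParameters" object whose keys are spelled exactly as the code expects, for example "transactionid" rather than "transactionId".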

Getting Error While Inserting JSON data to DynamoDB using Python

Hi,
I am trying to put JSON data into an AWS DynamoDB table using AWS Lambda, but I am getting the error below. My JSON file is uploaded to an S3 bucket.
Parameter validation failed:
Invalid type for parameter Item, value:
{
"IPList": [
"10.1.0.36",
"10.1.0.27"
],
"TimeStamp": "2020-04-22 11:43:13",
"IPCount": 2,
"LoadBalancerName": "internal-ALB-1447121364.us-west-2.elb.amazonaws.com"
}
, type: <class 'str'>, valid types: <class 'dict'>: ParamValidationError
Below is my Python script:
import boto3
import json
s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    json_file_name = event['Records'][0]['s3']['object']['key']
    json_object = s3_client.get_object(Bucket=bucket,Key=json_file_name)
    jsonFileReader = json_object['Body'].read()
    jsonDict = json.loads(jsonFileReader)
    table = dynamodb.Table('test')
    table.put_item(Item=jsonDict)
    return 'Hello'
Below is my json content
"{\"IPList\": [\"10.1.0.36\", \"10.1.0.27\"], \"TimeStamp\": \"2020-04-22 11:43:13\",
\"IPCount\": 2, \"LoadBalancerName\": \"internal-ALB-1447121364.us-west-2.elb.amazonaws.com\"}"
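Can someone help me with how to insert this data into DynamoDB?
json.loads(jsonFileReader) returns a string here, but table.put_item() expects a dict. The file content is itself a JSON-encoded string (note the outer quotes and the escaped inner quotes), so a single parse only removes the outer layer. Switching to json.load() would not help, because it parses the same content. Either decode twice with json.loads(json.loads(jsonFileReader)), or fix the producer so the file is written with a single json.dumps().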
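The file content shown in the question has outer quotes and escaped inner quotes, which is what a JSON object looks like after being serialized twice. The sketch below reproduces that situation with a hypothetical stand-in for the S3 body and shows one way to unwrap it; load_item is an illustrative helper, not a boto3 function.

```python
import json

# Hypothetical stand-in for the S3 object body from the question.
original = {"IPList": ["10.1.0.36", "10.1.0.27"], "IPCount": 2}
double_encoded = json.dumps(json.dumps(original))


def load_item(body):
    """Parse a JSON body, unwrapping one extra layer of string encoding."""
    item = json.loads(body)
    if isinstance(item, str):
        # The first parse produced a string, so the data was encoded twice.
        item = json.loads(item)
    if not isinstance(item, dict):
        raise ValueError("expected a JSON object, got " + type(item).__name__)
    return item
```

The result of load_item(jsonFileReader) could then be passed as Item to table.put_item(). Fixing the writer so that it serializes only once is usually the cleaner long-term option.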

AWS Firehose Lambda function invocation gives wrong output structure format

When I insert a data object into an AWS Firehose stream using a put operation, it works fine. A Lambda function is enabled on my Firehose stream, so it is invoked, but it gives me an output structure error:
"errorMessage":"Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed."
So I rewrote my Lambda function like this to produce the correct output structure:
import base64
import json
print('Loading function')
def lambda_handler(event, context):
    output=[]
    print('event'+str(event))
    for record in event['records']:
        payload = base64.b64decode(record['data'])
        print('payload'+str(payload))
        payload=base64.b64encode(payload)
        output_record={
            'recordId':record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps('hello'))
        }
        output.append(output_record)
    return { 'records': output }
Now I am getting the following error when encoding the 'data' field:
"errorMessage": "a bytes-like object is required, not 'str'",
and if I change 'hello' to bytes, like b'hello', then I get the following error:
"errorMessage": "Object of type bytes is not JSON serializable",
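base64.b64encode() takes bytes and returns bytes, so encode the string before the call and decode the result afterwards:
import base64
def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Incoming data is base64 text; decode it to a str payload.
        payload = base64.b64decode(record['data']).decode('utf-8')
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            # str -> bytes -> base64 bytes -> str, so the response is JSON serializable.
            'data': base64.b64encode(payload.encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)
    return {'records': output}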
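Both errors in the question come from mixing str and bytes around base64.b64encode(). To send a new value such as 'hello' rather than echoing the input, a small helper along these lines could be used; encode_for_firehose is a hypothetical name chosen for this sketch.

```python
import base64
import json


def encode_for_firehose(obj):
    """Serialize obj to JSON and return it as a base64 text string."""
    raw = json.dumps(obj).encode("utf-8")   # str -> bytes
    encoded = base64.b64encode(raw)         # bytes -> base64 bytes
    return encoded.decode("utf-8")          # bytes -> str, JSON serializable
```

In the handler from the question, the 'data' field would then be set to encode_for_firehose('hello').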

Resources