How to create a constant object for aws dynamodb in python? - python-3.x

I want to know how I can check whether the boto3 client object has already been instantiated. I have the code below for creating the object for AWS DynamoDB:
import boto3

def aws_dynamodb():
    return boto3.resource("dynamodb")

def get_db_con():
    dynamo_conn = aws_dynamodb()
    return dynamo_conn
Now get_db_con() returns the connection to DynamoDB, but I want to make sure that get_db_con is not creating the client object via aws_dynamodb() every time it is called.
For example:
def aws_db_table(table):
    con = get_db_con()
    return con.Table(table)
account_table=aws_db_table("my_ac_table")
audit_table=aws_db_table("audit_table")
So here, whenever I call aws_db_table, it should not create the client via aws_dynamodb() every time.
How can I check whether aws_dynamodb() has already been instantiated, so that a new client object is not created on each call? Creating the client object every time is costly.
Note: I want to run the code in a Lambda function.
Please help us with this.
Thanks

I usually go with a low-tech solution with a global variable that looks something like this:
import boto3

# Underscore prefix to indicate this is something private
_TABLE_RESOURCE = None

def get_table_resource():
    global _TABLE_RESOURCE
    if _TABLE_RESOURCE is None:
        _TABLE_RESOURCE = boto3.resource("dynamodb").Table("my_table")
    return _TABLE_RESOURCE

def handler(event, context):
    table = get_table_resource()
    # ...
Global variables persist across Lambda invocations within the same execution environment, which is why this works.
Another option would be to use the lru_cache from functools, which uses memoization.
from functools import lru_cache

import boto3

@lru_cache(maxsize=128)
def get_table_resource():
    return boto3.resource("dynamodb").Table("my_table")

def handler(event, context):
    table = get_table_resource()
    # ...
For those not familiar with memoization, the first solution is probably easier to read and understand.
(Note: I wrote the code from memory, there may be bugs)
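If you want to verify the memoization yourself, lru_cache exposes cache statistics; a quick sketch (the counts assume a fresh process):
get_table_resource()
get_table_resource()
# First call is a miss (the resource gets created), the second is a hit.
print(get_table_resource.cache_info())  # e.g. CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)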

Related

How to log the return value of a POST method after returning the response?

I'm working on my first ever REST API, so apologies in advance if I've missed something basic. I have a function that takes a JSON request from another server, processes it (makes a prediction based on the data), and returns another JSON with the results. I'd like to keep a log on the server's local disk of all requests to this endpoint along with their results, for evaluation purposes and for retraining the model. However, for the purposes of minimising the latency of returning the result to the user, I'd like to return the response data first, and then write it to the local disk. It's not obvious to me how to do this properly, as the FastAPI paradigm necessitates that the result of a POST method is the return value of the decorated function, so anything I want to do with the data has to be done before it is returned.
Below is a minimal working example of what I think is my closest attempt at getting it right so far, using a custom object with a log decorator - my idea was just to assign the result to the log object as a class attribute, then use another method to write it to disk, but I can't figure out how to make sure that that function gets called after get_data every time.
import json
from functools import wraps

import uvicorn
from fastapi import FastAPI, Request
from pydantic import BaseModel

class Blob(BaseModel):
    id: int
    x: float

def crunch_numbers(data: Blob) -> dict:
    # does some stuff
    return {'foo': 'bar'}

class PostResponseLogger:
    def __init__(self) -> None:
        self.post_result = None

    def log(self, func, *args, **kwargs):
        @wraps(func)
        def func_to_log(*args, **kwargs):
            post_result = func(*args, **kwargs)
            self.post_result = post_result
            # how can this be done outside of this function ???
            self.write_data()
            return post_result
        return func_to_log

    def write_data(self):
        if self.post_result:
            with open('output.json', 'w') as f:
                json.dump(self.post_result, f)

def main():
    app = FastAPI()
    logger = PostResponseLogger()

    @app.post('/get_data/')
    @logger.log
    def get_data(input_json: dict, request: Request):
        result = crunch_numbers(input_json)
        return result

    uvicorn.run(app=app)

if __name__ == '__main__':
    main()
Basically, my question boils down to: "is there a way, in the PostResponseLogger class, to automatically call self.write_data after every call to self.log?", but if I'm using the wrong approach altogether, any other suggestions are also welcome.
You could have a Background Task for that purpose. A background task "will run only once the response has been sent" (see Starlette documentation). "This is useful for operations that need to happen after a request, but that the client doesn't really have to be waiting for the operation to complete before receiving the response" (see FastAPI documentation).
You can define a task function to run in the background for writing the log data, as shown below:
def write_log_data():
    logger.write_data()
Then, import BackgroundTasks and define a parameter in your endpoint with a type declaration of BackgroundTasks. Inside of your endpoint, pass your task function (i.e., write_log_data, as defined above) to the background_tasks object with the method .add_task():
from fastapi import BackgroundTasks

@app.post('/get_data/')
@logger.log
def get_data(input_json: dict, request: Request, background_tasks: BackgroundTasks):
    result = crunch_numbers(input_json)
    background_tasks.add_task(write_log_data)
    return result
The same principle could be applied if a middleware was used to capture and log the response data, as described in this answer, or a custom APIRoute class, as demonstrated in this answer.
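For illustration, a minimal sketch of that middleware idea (my own sketch, not the linked answer's code, assuming the app and Request from the question's code): read the body out of the streaming response, log it, then re-wrap it, since the body iterator can only be consumed once:
from starlette.responses import Response

@app.middleware("http")
async def log_response_body(request: Request, call_next):
    response = await call_next(request)
    # Drain the streaming body so it can be logged, then rebuild the response.
    body = b"".join([chunk async for chunk in response.body_iterator])
    # ... write `body` to the log here ...
    return Response(content=body, status_code=response.status_code,
                    headers=dict(response.headers), media_type=response.media_type)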
For future reference, if you (or anyone) ever need to use async/await syntax, and run into concurrency issues (such as the event loop getting blocked) while performing some heavy background computation, please have a look at this answer, which explains the difference between defining an endpoint or a background task function with async def and def (briefly, async def endpoints/background tasks will run in the event loop, whereas def functions will run in an external threadpool that is then awaited), as well as provides solutions when it comes to running blocking I/O-bound or CPU-bound operations in such functions.
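As a rough illustration of that distinction (a sketch, not the linked answer's code):
import asyncio
import time

async def async_background_task():
    await asyncio.sleep(1)  # non-blocking; safe to run inside the event loop

def sync_background_task():
    time.sleep(1)  # blocking; FastAPI runs def tasks in an external threadpool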

Python mocking using MOTO for SSM

Taken from this answer:
Python mock AWS SSM
I now have this code:
test_2.py
from unittest import TestCase

import boto3
import pytest
from moto import mock_ssm

@pytest.yield_fixture
def s3ssm():
    with mock_ssm():
        ssm = boto3.client("ssm")
        yield ssm

@mock_ssm
class MyTest(TestCase):
    def setUp(self):
        ssm = boto3.client("ssm")
        ssm.put_parameter(
            Name="/mypath/password",
            Description="A test parameter",
            Value="this is it!",
            Type="SecureString",
        )

    def test_param_getting(self):
        import real_code

        resp = real_code.get_variable("/mypath/password")
        assert resp["Parameter"]["Value"] == "this is it!"
and this is my code to test (or a cut down example):
real_code.py
import boto3

class ParamTest:
    def __init__(self) -> None:
        self.client = boto3.client("ssm")
        pass

    def get_parameters(self, param_name):
        print(self.client.describe_parameters())
        return self.client.get_parameters_by_path(Path=param_name)

def get_variable(param_name):
    p = ParamTest()
    param_details = p.get_parameters(param_name)
    return param_details
I have tried a number of solutions, and switched between pytest and unittest quite a few times!
Each time I run the code it doesn't reach out to AWS, so it seems something is affecting the boto3 client, but it doesn't return the parameter. If I edit real_code.py to not have a class inside it, the test passes.
Is it not possible to patch the client inside the class in the real_code.py file? I'm trying to do this without editing the real_code.py file at all if possible.
Thanks,
get_parameters_by_path returns all parameters whose names are prefixed with the supplied path.
When providing /mypath, it would return /mypath/password.
But when providing /mypath/password, as in your example, it will only return parameters that look like this: /mypath/password/..
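For instance, a quick sketch (assuming client is the boto3 SSM client from the question) of fetching everything under the /mypath hierarchy:
response = client.get_parameters_by_path(Path="/mypath", WithDecryption=True)
for param in response["Parameters"]:
    print(param["Name"], param["Value"])  # e.g. /mypath/password this is it!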
If you are only looking to retrieve a single parameter, the get_parameter call would be more suitable:
class ParamTest:
    def __init__(self) -> None:
        self.client = boto3.client("ssm")
        pass

    def get_parameters(self, param_name):
        # Decrypt the value, as it is stored as a SecureString
        return self.client.get_parameter(Name=param_name, WithDecryption=True)
Edit: Note that Moto behaves the same as AWS in this.
From https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ssm.html#SSM.Client.get_parameters_by_path:
[The Path parameter is t]he hierarchy for the parameter. [...] The hierarchy is the parameter name except the last part of the parameter. For the API call to succeed, the last part of the parameter name can't be in the path.

Adding type-hinting to functions that return boto3 objects?

How do I add type-hinting to my functions that return various boto3 resources? I'd like to get automatic completion/checking on my return values in IDEs like PyCharm. Boto3 does some factory creation magic, so I can't figure out how to declare the types correctly.
import boto3

ec2 = boto3.Session().resource('ec2')
a = ec2.Image('asdf')
a.__class__  # => boto3.resources.factory.ec2.Image
But boto3.resources.factory.ec2.Image doesn't seem to be a class that's recognized by Python, so I can't use it for a type hint.
The docs show that the return type is EC2.Image. But is there a way to import that type as a regular Python type?
UPDATE 2021
As mentioned by @eega, I no longer maintain the package. I'd recommend checking out boto3-stubs. It's a much more mature version of boto3_type_annotations.
Original Answer
I made a package that can help with this, boto3_type_annotations. It's available with or without documentation as well. Example usage below. There's also a gif at my github showing it in action using PyCharm.
import boto3
from boto3_type_annotations.s3 import Client, ServiceResource
from boto3_type_annotations.s3.waiter import BucketExists
from boto3_type_annotations.s3.paginator import ListObjectsV2

# With type annotations
client: Client = boto3.client('s3')
client.create_bucket(Bucket='foo')  # Not only does your IDE know the name of this method,
                                    # it knows the type of the `Bucket` argument too!
                                    # It also knows that `Bucket` is required, but `ACL` isn't!

# Waiters and paginators are defined also...
waiter: BucketExists = client.get_waiter('bucket_exists')
waiter.wait(Bucket='foo')
paginator: ListObjectsV2 = client.get_paginator('list_objects_v2')
response = paginator.paginate(Bucket='foo')

# Along with service resources.
resource: ServiceResource = boto3.resource('s3')
bucket = resource.Bucket('bar')
bucket.create()

# With type comments
client = boto3.client('s3')  # type: Client
response = client.get_object(Bucket='foo', Key='bar')

# In docstrings
class Foo:
    def __init__(self, client):
        """
        :param client: It's an S3 Client and the IDE is gonna know what it is!
        :type client: Client
        """
        self.client = client

    def bar(self):
        """
        :rtype: Client
        """
        self.client.delete_object(Bucket='foo', Key='bar')
        return self.client
The boto3_type_annotations mentioned by Allie Fitter is deprecated, but he links to an alternative: https://pypi.org/project/boto3-stubs/
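A quick sketch of how boto3-stubs is typically used (assuming it is installed with the s3 extra, e.g. pip install 'boto3-stubs[s3]', which provides the mypy_boto3_s3 module):
import boto3
from mypy_boto3_s3 import S3Client

client: S3Client = boto3.client("s3")
client.create_bucket(Bucket="foo")  # the IDE and type checker now know this signature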

How to have more than one handler in AWS Lambda Function?

I have a very large Python file that consists of multiple defined functions. If you're familiar with AWS Lambda, when you create a Lambda function you specify a handler, which is a function in the code that AWS Lambda can invoke when the service executes the code, as represented below in the my_handler.py file:
def handler_name(event, context):
    ...
    return some_value
Link Source: https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
However, as I mentioned above, I have multiple defined functions in my_handler.py that have their own events and contexts. Therefore, this will result in an error. Are there any ways around this in python3.6?
Your single handler function will need to be responsible for parsing the incoming event, and determining the appropriate route to take. For example, let's say your other functions are called helper1 and helper2. Your Lambda handler function will inspect the incoming event and then, based on one of the fields in the incoming event (ie. let's call it EventType), call either helper1 or helper2, passing in both the event and context objects.
def handler_name(event, context):
    if event['EventType'] == 'helper1':
        helper1(event, context)
    elif event['EventType'] == 'helper2':
        helper2(event, context)

def helper1(event, context):
    pass

def helper2(event, context):
    pass
This is only pseudo-code, and I haven't tested it myself, but it should get the concept across.
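For instance, a hypothetical test invocation of the sketch above (the EventType value is whatever your callers agree on):
event = {"EventType": "helper1"}
handler_name(event, None)  # dispatches to helper1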
Little late to the game but thought it wouldn't hurt to share. Best practices suggest that one separate the handler from the Lambda's core logic. Not only is it okay to add additional definitions, it can lead to more legible code and reduce waste--e.g. multiple API calls to S3. So, although it can get out of hand, I disagree with some of those critiques to your initial question. It's effective to use your handler as a logical interface to the additional functions that will accomplish your various work.
In Data Architecture & Engineering land it's often less costly and more efficient to work in this manner. Particularly if you are building out ETL pipelines, following service-oriented architectural patterns. Admittedly, I'm a bit of a maverick and some may find this unruly/egregious, but I've gone so far as to build classes into my Lambdas for various reasons--e.g. centralized, data-lake-ish S3 buckets that accommodate a variety of file types, reduce unnecessary requests, etc.--and I stand by it.
Here's an example of one of my handler files from a CDK example project I put on the hub awhile back. Hopefully it'll give you some useful ideas, or at the very least not feel alone in wanting to beef up your Lambdas.
import requests
import json
from requests.exceptions import Timeout
from requests.exceptions import HTTPError
from botocore.exceptions import ClientError
from base64 import b64decode  # needed by get_secret below
from datetime import date
import csv
import os
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

class Asteroids:
    """Client to NASA API and execution interface to branch data processing by file type.

    Notes:
        This class doesn't look like a normal class. It is a simple example of how one might
        workaround AWS Lambda's limitations of class use in handlers. It also allows for
        better organization of code to simplify this example. If one planned to add
        other NASA endpoints or process larger amounts of Asteroid data for both .csv and .json formats,
        asteroids_json and asteroids_csv should be modularized and divided into separate lambdas
        where stepfunction orchestration is implemented for a more comprehensive workflow.
        However, for the sake of this demo I'm keeping it lean and easy.
    """

    def execute(self, format):
        """Serves as Interface to assign class attributes and execute class methods

        Raises:
            Exception: If file format is not of .json or .csv file types.

        Notes:
            Have fun!
        """
        self.file_format = format
        self.today = date.today().strftime('%Y-%m-%d')
        # method call below used when Secrets Manager integrated. See get_secret.__doc__ for more.
        # self.api_key = get_secret('nasa_api_key')
        self.api_key = os.environ["NASA_KEY"]
        self.endpoint = f"https://api.nasa.gov/neo/rest/v1/feed?start_date={self.today}&end_date={self.today}&api_key={self.api_key}"
        self.response_object = self.nasa_client(self.endpoint)
        self.processed_response = self.process_asteroids(self.response_object)
        if self.file_format == "json":
            self.asteroids_json(self.processed_response)
        elif self.file_format == "csv":
            self.asteroids_csv(self.processed_response)
        else:
            raise Exception("FILE FORMAT NOT RECOGNIZED")
        self.write_to_s3()

    def nasa_client(self, endpoint):
        """Client component for API call to NASA endpoint.

        Args:
            endpoint (str): Parameterized url for API call.

        Raises:
            Timeout: If connection not made in 5s and/or data not retrieved in 15s.
            HTTPError & Exception: Self-explanatory

        Notes:
            See Cloudwatch logs for debugging.
        """
        try:
            response = requests.get(endpoint, timeout=(5, 15))
        except Timeout as timeout:
            print(f"NASA GET request timed out: {timeout}")
        except HTTPError as http_err:
            print(f"HTTP error occurred: {http_err}")
        except Exception as err:
            print(f'Other error occurred: {err}')
        else:
            return json.loads(response.content)

    def process_asteroids(self, payload):
        """Process old, and create new, data object with content from response.

        Args:
            payload (b'str'): Binary string of asteroid data to be processed.
        """
        near_earth_objects = payload["near_earth_objects"][f"{self.today}"]
        asteroids = []
        for neo in near_earth_objects:
            asteroid_object = {
                "id": neo['id'],
                "name": neo['name'],
                "hazard_potential": neo['is_potentially_hazardous_asteroid'],
                "est_diameter_min_ft": neo['estimated_diameter']['feet']['estimated_diameter_min'],
                "est_diameter_max_ft": neo['estimated_diameter']['feet']['estimated_diameter_max'],
                "miss_distance_miles": [item['miss_distance']['miles'] for item in neo['close_approach_data']],
                "close_approach_exact_time": [item['close_approach_date_full'] for item in neo['close_approach_data']]
            }
            asteroids.append(asteroid_object)
        return asteroids

    def asteroids_json(self, payload):
        """Creates json object from payload content then writes to .json file.

        Args:
            payload (b'str'): Binary string of asteroid data to be processed.
        """
        json_file = open(f"/tmp/asteroids_{self.today}.json", 'w')
        json_file.write(json.dumps(payload, indent=4))
        json_file.close()

    def asteroids_csv(self, payload):
        """Creates .csv object from payload content then writes to .csv file.
        """
        csv_file = open(f"/tmp/asteroids_{self.today}.csv", 'w', newline='\n')
        fields = list(payload[0].keys())
        writer = csv.DictWriter(csv_file, fieldnames=fields)
        writer.writeheader()
        writer.writerows(payload)
        csv_file.close()

    def get_secret(self):
        """Gets secret from AWS Secrets Manager

        Notes:
            Have yet to integrate into the CDK. Leaving as example code.
        """
        secret_name = os.environ['TOKEN_SECRET_NAME']
        region_name = os.environ['REGION']
        session = boto3.session.Session()
        client = session.client(service_name='secretsmanager', region_name=region_name)
        try:
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
        except ClientError as e:
            raise e
        else:
            if 'SecretString' in get_secret_value_response:
                secret = get_secret_value_response['SecretString']
            else:
                secret = b64decode(get_secret_value_response['SecretBinary'])
            return secret

    def write_to_s3(self):
        """Uploads both .json and .csv files to s3
        """
        s3 = boto3.client('s3')
        s3.upload_file(f"/tmp/asteroids_{self.today}.{self.file_format}", os.environ['S3_BUCKET'], f"asteroid_data/asteroids_{self.today}.{self.file_format}")

def handler(event, context):
    """Instantiates class and triggers execution method.

    Args:
        event (dict): Lists a custom dict that determines interface control flow--i.e. `csv` or `json`.
        context (obj): Provides methods and properties that contain invocation, function and
            execution environment information.
            *Not used herein.
    """
    asteroids = Asteroids()
    asteroids.execute(event)

python 3: mock a method of the boto3 S3 client

I want to unit test some code that calls a method of the boto3 s3 client.
I can't use moto because this particular method (put_bucket_lifecycle_configuration) is not yet implemented in moto.
I want to mock the S3 client and assure that this method was called with specific parameters.
The code I want to test looks something like this:
# sut.py
import boto3

class S3Bucket(object):
    def __init__(self, name, lifecycle_config):
        self.name = name
        self.lifecycle_config = lifecycle_config

    def create(self):
        client = boto3.client("s3")
        client.create_bucket(Bucket=self.name)
        rules = ...  # some code that computes rules from self.lifecycle_config
        # I want to test that `rules` is correct in the following call:
        client.put_bucket_lifecycle_configuration(Bucket=self.name,
            LifecycleConfiguration={"Rules": rules})

def create_a_bucket(name):
    lifecycle_policy = ...  # a dict with a bunch of key/value pairs
    bucket = S3Bucket(name, lifecycle_policy)
    bucket.create()
    return bucket
In my test, I'd like to call create_a_bucket() (though instantiating an S3Bucket directly is also an option) and make sure that the call to put_bucket_lifecycle_configuration was made with the correct parameters.
I have messed around with unittest.mock and botocore.stub.Stubber but have not managed to crack this. Unless otherwise urged, I am not posting my attempts since they have not been successful so far.
I am open to suggestions on restructuring the code I'm trying to test in order to make it easier to test.
Got the test to work with the following, where ... is the remainder of the arguments that are expected to be passed to s3.put_bucket_lifecycle_configuration().
# test.py
from unittest.mock import patch
import unittest

import sut

class MyTestCase(unittest.TestCase):
    @patch("sut.boto3")
    def test_lifecycle_config(self, cli):
        s3 = cli.client.return_value
        sut.create_a_bucket("foo")
        s3.put_bucket_lifecycle_configuration.assert_called_once_with(Bucket="foo", ...)

if __name__ == '__main__':
    unittest.main()
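If you'd rather exercise a real client, botocore.stub.Stubber (which the question mentions) can also verify expected parameters. A minimal sketch, assuming S3Bucket were restructured to accept an injected client instead of creating one inside create():
from botocore.stub import Stubber
import boto3

client = boto3.client("s3", region_name="us-east-1")
stubber = Stubber(client)
# Stubber raises if the actual call parameters don't match the expected ones.
stubber.add_response("create_bucket", {}, {"Bucket": "foo"})
stubber.add_response("put_bucket_lifecycle_configuration", {},
                     {"Bucket": "foo", "LifecycleConfiguration": {"Rules": []}})
with stubber:
    # With injection, S3Bucket("foo", ..., client=client).create() would run here.
    client.create_bucket(Bucket="foo")
    client.put_bucket_lifecycle_configuration(
        Bucket="foo", LifecycleConfiguration={"Rules": []})
stubber.assert_no_pending_responses()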
