How to have more than one handler in AWS Lambda Function? - python-3.x

I have a very large python file that consists of multiple defined functions. If you're familiar with AWS Lambda, when you create a lambda function, you specify a handler, which is a function in the code that AWS Lambda can invoke when service executes my code, which is represented below in my_handler.py file:
def handler_name(event, context):
...
return some_value
Link Source: https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
However, as I mentioned above, I have multiple defined functions in my_handler.py that have their own events and contexts. Therefore, this will result in an error. Are there any ways around this in python3.6?

Your single handler function will need to be responsible for parsing the incoming event, and determining the appropriate route to take. For example, let's say your other functions are called helper1 and helper2. Your Lambda handler function will inspect the incoming event and then, based on one of the fields in the incoming event (ie. let's call it EventType), call either helper1 or helper2, passing in both the event and context objects.
def handler_name(event, context):
if event['EventType'] == 'helper1':
helper1(event, context)
elif event['EventType'] == 'helper2':
helper2(event, context)
def helper1(event, context):
pass
def helper2(event, context):
pass
This is only pseudo-code, and I haven't tested it myself, but it should get the concept across.

Little late to the game but thought it wouldn't hurt to share. Best practices suggest that one separate the handler from the Lambda's core logic. Not only is it okay to add additional definitions, it can lead to more legible code and reduce waste--e.g. multiple API calls to S3. So, although it can get out of hand, I disagree with some of those critiques to your initial question. It's effective to use your handler as a logical interface to the additional functions that will accomplish your various work. In Data Architecture & Engineering land it's often less-costly and more efficient to work in this manner. Particularly if you are building out ETL pipelines, following service-oriented architectural patterns. Admittedly, I'm a bit of a Maverick and some may find this unruly/egregious but I've gone so far as to build classes into my Lambdas for various reasons--e.g. centralized, data-lake-ish S3 buckets that accommodate a variety of file types, reduce unnecessary requests, etc...--and I stand by it. Here's an example of one of my handler files from a CDK example project I put on the hub awhile back. Hopefully it'll give you some useful ideas, or at the very least not feel alone in wanting to beef up your Lambdas.
import requests
import json
from requests.exceptions import Timeout
from requests.exceptions import HTTPError
from botocore.exceptions import ClientError
from datetime import date
import csv
import os
import boto3
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
class Asteroids:
"""Client to NASA API and execution interface to branch data processing by file type.
Notes:
This class doesn't look like a normal class. It is a simple example of how one might
workaround AWS Lambda's limitations of class use in handlers. It also allows for
better organization of code to simplify this example. If one planned to add
other NASA endpoints or process larger amounts of Asteroid data for both .csv and .json formats,
asteroids_json and asteroids_csv should be modularized and divided into separate lambdas
where stepfunction orchestration is implemented for a more comprehensive workflow.
However, for the sake of this demo I'm keeping it lean and easy.
"""
def execute(self, format):
"""Serves as Interface to assign class attributes and execute class methods
Raises:
Exception: If file format is not of .json or .csv file types.
Notes:
Have fun!
"""
self.file_format=format
self.today=date.today().strftime('%Y-%m-%d')
# method call below used when Secrets Manager integrated. See get_secret.__doc__ for more.
# self.api_key=get_secret('nasa_api_key')
self.api_key=os.environ["NASA_KEY"]
self.endpoint=f"https://api.nasa.gov/neo/rest/v1/feed?start_date={self.today}&end_date={self.today}&api_key={self.api_key}"
self.response_object=self.nasa_client(self.endpoint)
self.processed_response=self.process_asteroids(self.response_object)
if self.file_format == "json":
self.asteroids_json(self.processed_response)
elif self.file_format == "csv":
self.asteroids_csv(self.processed_response)
else:
raise Exception("FILE FORMAT NOT RECOGNIZED")
self.write_to_s3()
def nasa_client(self, endpoint):
"""Client component for API call to NASA endpoint.
Args:
endpoint (str): Parameterized url for API call.
Raises:
Timeout: If connection not made in 5s and/or data not retrieved in 15s.
HTTPError & Exception: Self-explanatory
Notes:
See Cloudwatch logs for debugging.
"""
try:
response = requests.get(endpoint, timeout=(5, 15))
except Timeout as timeout:
print(f"NASA GET request timed out: {timeout}")
except HTTPError as http_err:
print(f"HTTP error occurred: {http_err}")
except Exception as err:
print(f'Other error occurred: {err}')
else:
return json.loads(response.content)
def process_asteroids(self, payload):
"""Process old, and create new, data object with content from response.
Args:
payload (b'str'): Binary string of asteroid data to be processed.
"""
near_earth_objects = payload["near_earth_objects"][f"{self.today}"]
asteroids = []
for neo in near_earth_objects:
asteroid_object = {
"id" : neo['id'],
"name" : neo['name'],
"hazard_potential" : neo['is_potentially_hazardous_asteroid'],
"est_diameter_min_ft": neo['estimated_diameter']['feet']['estimated_diameter_min'],
"est_diameter_max_ft": neo['estimated_diameter']['feet']['estimated_diameter_max'],
"miss_distance_miles": [item['miss_distance']['miles'] for item in neo['close_approach_data']],
"close_approach_exact_time": [item['close_approach_date_full'] for item in neo['close_approach_data']]
}
asteroids.append(asteroid_object)
return asteroids
def asteroids_json(self, payload):
"""Creates json object from payload content then writes to .json file.
Args:
payload (b'str'): Binary string of asteroid data to be processed.
"""
json_file = open(f"/tmp/asteroids_{self.today}.json",'w')
json_file.write(json.dumps(payload, indent=4))
json_file.close()
def asteroids_csv(self, payload):
"""Creates .csv object from payload content then writes to .csv file.
"""
csv_file=open(f"/tmp/asteroids_{self.today}.csv",'w', newline='\n')
fields=list(payload[0].keys())
writer=csv.DictWriter(csv_file, fieldnames=fields)
writer.writeheader()
writer.writerows(payload)
csv_file.close()
def get_secret(self):
"""Gets secret from AWS Secrets Manager
Notes:
Have yet to integrate into the CDK. Leaving as example code.
"""
secret_name = os.environ['TOKEN_SECRET_NAME']
region_name = os.environ['REGION']
session = boto3.session.Session()
client = session.client(service_name='secretsmanager', region_name=region_name)
try:
get_secret_value_response = client.get_secret_value(SecretId=secret_name)
except ClientError as e:
raise e
else:
if 'SecretString' in get_secret_value_response:
secret = get_secret_value_response['SecretString']
else:
secret = b64decode(get_secret_value_response['SecretBinary'])
return secret
def write_to_s3(self):
"""Uploads both .json and .csv files to s3
"""
s3 = boto3.client('s3')
s3.upload_file(f"/tmp/asteroids_{self.today}.{self.file_format}", os.environ['S3_BUCKET'], f"asteroid_data/asteroids_{self.today}.{self.file_format}")
def handler(event, context):
"""Instantiates class and triggers execution method.
Args:
event (dict): Lists a custom dict that determines interface control flow--i.e. `csv` or `json`.
context (obj): Provides methods and properties that contain invocation, function and
execution environment information.
*Not used herein.
"""
asteroids = Asteroids()
asteroids.execute(event)

Related

How do I use a pytest fixture to mock a child class's inherited methods with classes as properties while maintaining the API contract using autospec?

How it started
I'm testing a class, ClassToTest, that makes API calls using atlassian-python-api. The tests are going to ensure that ClassToTest performs correctly with the data it gets back from the API. Many of the atlassian-python-api API calls use instantiated classes which inherit from the same base class or group of top-level classes.
I'd like to write tests that will expose breaks in the API contract if the wrong data is returned or API calls fail, while also testing the class I wrote to ensure it does the correct things with the data returned from the API. In order to do this, I was hoping to use unittest.mock.patch("path.to.Comment", autospec=True) to copy the API spec into the MagicMock, but I don't believe it's working properly.
For the purposes of the question, ClassToTest is not that important; what I am aiming to solve is how to setup and configure the pytest fixtures in a way that I can use them to mimic the API endpoints that will return the data that ClassToTest will act upon. Ideally I'd like to reuse the fixtures without having patch conflicts. I've included relevant code from ClassToTest for illustrative purposes here:
class_to_test.py:
from atlassian.bitbucket import Cloud
from typing import NamedTuple
# these are hardcoded constants that work with the production API
from src.constants import (
PULL_REQUEST_ID,
REPOSITORY,
WORKSPACE,
)
CommentType = NamedTuple("CommentType", [("top_level", str), ("inline", str)])
class ClassToTest:
def _get_token(self):
"""this returns a token of type(str)"""
def __init__(self, workspace, repository, pull_request_id):
self.active_comments = None
self.environment = sys.argv[1]
self.comment_text = CommentType(
top_level=r"top_level_comment text", inline=r"inline_comment text"
)
self.cloud = Cloud(token=self._get_token(), cloud=True)
self.workspace = self.cloud.workspaces.get(workspace)
self.repository = self.cloud.repositories.get(workspace, repository)
self.pull_request = self.repository.pullrequests.get(id=pull_request_id)
def _get_active_comments(self):
"""Returns a list of active (non-deleted) comments"""
return [
c for c in self.pull_request.comments() if c.data["deleted"] is False
]
# a few more methods here
def main():
instance = ClassToTest(WORKSPACE, REPOSITORY, PULL_REQUEST_ID)
# result = instance.method() for each method I need to call.
# do things with each result
if __name__ == "__main__":
main()
The class has methods that retrieve comments from the API (_get_active_comments, above), act on the retrieved comments, retrieve pull requests, and so on. What I am trying to test is that the class methods act correctly on the data received from the API, so I need to accurately mock data returned from API calls.
How it's going
I started with a unittest.Testcase style test class and wanted the flexibility of pytest fixtures (and autospec), but removed Testcase entirely when I discovered that pytest fixtures don't really work with it. I'm currently using a pytest class and conftest.py as follows:
/test/test_class_to_test.py:
import pytest
from unittest.mock import patch
from src.class_to_test import ClassToTest
#pytest.mark.usefixtures("mocked_comment", "mocked_user")
class TestClassToTest:
# We mock Cloud here as ClassToTest calls it in __init__ to authenticate with the API
# _get_token retrieves an access token for the API; since we don't need it, we can mock it
#patch("src.test_class_to_test.Cloud", autospec=True)
#patch.object(ClassToTest, "_get_token").
def setup_method(self, method, mock_get_token, mock_cloud):
mock_get_token.return_value = "token"
self.checker = ClassToTest("WORKSPACE", "REPOSITORY", 1)
def teardown_method(self, method):
pass
def test_has_top_level_and_inline_comments(self, mocked_comment, mocked_pull_request):
mock_top_comment = mocked_comment(raw="some text to search for later")
assert isinstance(mock_top_comment.data, dict)
assert mock_top_comment.data["raw"] == "some text to search for later"
# the assert below this line is failing
assert mock_top_comment.user.account_id == 1234
conftest.py:
import pytest
from unittest.mock import patch, PropertyMock
from atlassian.bitbucket.cloud.common.comments import Comment
from atlassian.bitbucket.cloud.common.users import User
#pytest.fixture()
def mocked_user(request):
def _mocked_user(account_id=1234):
user_patcher = patch(
f"atlassian.bitbucket.cloud.common.users.User", spec_set=True, autospec=True
)
MockUser = user_patcher.start()
data = {"type": "user", "account_id": account_id}
url = "user_url"
user = MockUser(data=data, url=url)
# setup mocked properties
mock_id = PropertyMock(return_value=account_id)
type(user).id = mock_id
mockdata = PropertyMock(return_value=data)
type(user).data = mockdata
request.addfinalizer(user_patcher.stop)
return user
return _mocked_user
#pytest.fixture()
def mocked_comment(request, mocked_user):
def _mocked_comment(raw="", inline=None, deleted=False, user_id=1234):
comment_patcher = patch(
f"atlassian.bitbucket.cloud.common.comments.Comment", spec_set=True, autospec=True
)
MockComment = comment_patcher.start()
data = {
"type": "pullrequest_comment",
"user": mocked_user(user_id),
"raw": raw,
"deleted": deleted,
}
if inline:
data["inline"] = {"from": None, "to": 1, "path": "src/code_issues.py"}
data["raw"] = "this is an inline comment"
comment = MockComment(data)
# setup mocked properties
mockdata = PropertyMock(return_value=data)
type(comment).data = mockdata
# mockuser = PropertyMock(return_value=mocked_user(user_id))
# type(comment).user = mockuser
request.addfinalizer(comment_patcher.stop)
return comment
return _mocked_comment
The problem I am encountering is that the assert mock_top_comment.user.account_id == 1234 line fails when running the test, with the following error:
> assert mock_top_comment.user.account_id == 1234
E AssertionError: assert <MagicMock name='Comment().user.account_id' id='4399290192'> == 1234
E + where <MagicMock name='Comment().user.account_id' id='4399290192'> = <MagicMock name='Comment().user' id='4399634736'>.account_id
E + where <MagicMock name='Comment().user' id='4399634736'> = <NonCallableMagicMock name='Comment()' spec_set='Comment' id='4399234928'>.user
How do I get the mock User class to attach to the mock Comment class in the same way that the real API makes it work? Is there something about autospec that I'm missing, or should I be abandoning unittest.mock.patch entirely and using something else?
Extra credit (EDIT: in retrospect, this may be the most important part)
I'm using mocked_comment as a pytest fixture factory and want to reuse it multiple times in the same test (for example to create multiple mocked Comments returned in a list). So far, each time I've tried to do that, I've been met with the following error:
def test_has_top_level_and_inline_comments(self, mocked_comment, mocked_pull_request):
mock_top_comment = mocked_comment(raw="Some comment text")
> mock_inline_comment = mocked_comment(inline=True)
...
test/conftest.py:30: in _mocked_comment
MockComment = comment_patcher.start()
/opt/homebrew/Cellar/python#3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/mock.py:1585: in start
result = self.__enter__()
...
> raise InvalidSpecError(
f'Cannot autospec attr {self.attribute!r} from target '
f'{target_name!r} as it has already been mocked out. '
f'[target={self.target!r}, attr={autospec!r}]')
E unittest.mock.InvalidSpecError: Cannot autospec attr 'Comment' from target 'atlassian.bitbucket.cloud.common.comments' as it has already been mocked out. [target=<module 'atlassian.bitbucket.cloud.common.comments' from '/opt/homebrew/lib/python3.10/site-packages/atlassian/bitbucket/cloud/common/comments.py'>, attr=<MagicMock name='Comment' spec_set='Comment' id='4398964912'>]
I thought the whole point of a pytest fixture factory was to be reusable, but I believe that using an autospec mock complicates things quite a bit. I don't want to have to hand copy every detail from the API spec into the tests, as that will have to be changed if anything in the API changes. Is there a solution for this that involves automatically and dynamically creating the necessary classes in the mocked API with the correct return values for properties?
One thing I'm considering is separating the testing into two parts: API contract, and ClassToTest testing. In this way I can write the tests for ClassToTest without relying on the API and they will pass as long as I manipulate the received data correctly. Any changes to the API will get caught by the separate contract testing tests. Then I can use non-factory fixtures with static data for testing ClassToTest.
For now though, I'm out of ideas on how to proceed with this. What should I do here? Probably the most important thing to address is how to properly link the User instance with the Comment instance in the fixtures so that my method calls in test work the same way as they do in production. Bonus points if we can figure out how to dynamically patch multiple fixtures in a single test.
I've started looking at this answer, but given the number of interconnected classes and properties, I'm not sure it will work without writing out a ton of fixtures. After following the directions and applying them to the User mock inside the Comment mock, I started getting the error in the Extra Credit section above, where autospec couldn't be used as it has already been mocked out.

Multiprocessing.Pool: can not iterate over IMapIterator object in AWS Batch because of PicklingError

I need to request huge bulk of data from an API endpoint and I want to use multiprocessing (vs multithreading, company framework limitations)
I have a multiprocessing.Pool with predefined concurrency CONCURRENCY in a class called Batcher. The class looks like this:
class Batcher:
def __init__(self, concurrency: int = 8):
self.concurrency = concurrency
def _interprete_response_to_succ_or_err(self, resp: requests.Response) -> str:
if isinstance(resp, str):
if "Error:" in resp:
return "dlq"
else:
return "err"
if isinstance(resp, requests.Response):
if resp.status_code == 200:
return "succ"
else:
return "err"
def _fetch_dat_data(self, id: str) -> requests.Response:
try:
resp = requests.get(API_ENDPOINT)
return resp
except Exception as e:
return f"ID {id} -> Error: {str(e)}"
def _dispatch_batch(self, batch: list) -> dict:
pool = MPool(self.concurrency)
results = pool.imap(self._fetch_dat_data, batch)
pool.close()
pool.join()
return results
def _run_batch(self, id):
return self._dispatch_batch(id)
def start(self, id_list: list):
""" In real class, this function will create smaller
batches from bigger chunks of data """
results = self._run_batch(id_list)
print(
[
res.text
for res in results
if self._interprete_response_to_succ_or_err(res) == "succ"
]
)
this class is called in file like this
if __name__ == "__main__":
"""
the source of ids is a csv file with single column in s3 that contains list
of columns with single id per line
"""
id_list = boto3_get_object_body(my_file_name).decode().split("\n") # custom function, works
batcher = Batcher()
batcher.start(id_list)
This script is a part of AWS Batch Job that is triggered via CLI. the same function runs perfectly on my local machine with same environment as in AWS Batch. It throws
_pickle.PicklingError: Can't pickle <class 'boto3.resources.factory.s3.ServiceResource'>: attribute lookup s3.ServiceResource on boto3.resources.factory failed
in the line where I try to iterate over IMapIterator object results that is generated by pool.imap()
Relevant Traceback:
for res in results
File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
put(task)
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 211, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'boto3.resources.factory.s3.ServiceResource'>: attribute lookup s3.ServiceResource on boto3.resources.factory failed
I am wondering if I am missing something blatantly obvious or this issue is related to EC2 Instance spun on by batch job at this point and appreciate any kind of lead to root cause analysis.
This error happens because multiprocessing could not import the relevant datatype for duplicating data or calling the target function in the new process it started. This usually happens when the object necessary for the target function to run is created someplace the child process do not know about (for example, a class created inside the if __name__ ==... block in main module), or if the object's __qualname__ property has been fiddled with (you might see this using something similar to functools.wraps or monkey-patching in general)
Therefore, to actually "fix" this, you need to dig in your code and see if the above is true. A good place to start is with the class that is raising the issue (in this case it's boto3.resources.factory.s3.ServiceResource), can you import this in the main module before the if __name__... block runs?
However, most of the times, you can get away with by simply reducing the data required to start the target function (less data = less chances for faults occuring). In this case, the target function you are calling in the pool is an instance method. To start this function in a new process, multiprocessing would need to pickle all the instance attributes, which might have their own instance attributes, and so on. Not only does this add overhead, it could also be possible that the problem lies in a particular instance attribute. Therefore, just as a good practice, if your target function can run independently but is currently an instance method, change it a to staticmethod instead.
In this case, this would mean changing _fetch_dat_data to a staticmethod, and submitting it to the pool using type(self)._fetch_dat_data instead.

How to log the return value of a POST method after returning the response?

I'm working on my first ever REST API, so apologies in advance if I've missed something basic. I have a function that takes a JSON request from another server, processes it (makes a prediction based on the data), and returns another JSON with the results. I'd like to keep a log on the server's local disk of all requests to this endpoint along with their results, for evaluation purposes and for retraining the model. However, for the purposes of minimising the latency of returning the result to the user, I'd like to return the response data first, and then write it to the local disk. It's not obvious to me how to do this properly, as the FastAPI paradigm necessitates that the result of a POST method is the return value of the decorated function, so anything I want to do with the data has to be done before it is returned.
Below is a minimal working example of what I think is my closest attempt at getting it right so far, using a custom object with a log decorator - my idea was just to assign the result to the log object as a class attribute, then use another method to write it to disk, but I can't figure out how to make sure that that function gets called after get_data every time.
import json
import uvicorn
from fastapi import FastAPI, Request
from functools import wraps
from pydantic import BaseModel
class Blob(BaseModel):
id: int
x: float
def crunch_numbers(data: Blob) -> dict:
# does some stuff
return {'foo': 'bar'}
class PostResponseLogger:
def __init__(self) -> None:
self.post_result = None
def log(self, func, *args, **kwargs):
#wraps(func)
def func_to_log(*args, **kwargs):
post_result = func(*args, **kwargs)
self.post_result = post_result
# how can this be done outside of this function ???
self.write_data()
return post_result
return func_to_log
def write_data(self):
if self.post_result:
with open('output.json', 'w') as f:
json.dump(self.post_result, f)
def main():
app = FastAPI()
logger = PostResponseLogger()
#app.post('/get_data/')
#logger.log
def get_data(input_json: dict, request: Request):
result = crunch_numbers(input_json)
return result
uvicorn.run(app=app)
if __name__ == '__main__':
main()
Basically, my question boils down to: "is there a way, in the PostResponseLogger class, to automatically call self.write_data after every call to self.log?", but if I'm using the wrong approach altogether, any other suggestions are also welcome.
You could have a Background Task for that purpose. A background task "will run only once the response has been sent" (see Starlette documentation). "This is useful for operations that need to happen after a request, but that the client doesn't really have to be waiting for the operation to complete before receiving the response" (see FastAPI documentation).
You can define a task function to run in the background for writing the log data, as shown below:
def write_log_data():
logger.write_data()
Then, import BackgroundTasks and define a parameter in your endpoint with a type declaration of BackgroundTasks. Inside of your endpoint, pass your task function (i.e., write_log_data, as defined above) to the background_tasks object with the method .add_task():
from fastapi import BackgroundTasks
#app.post('/get_data/')
#logger.log
def get_data(input_json: dict, request: Request, background_tasks: BackgroundTasks):
result = crunch_numbers(input_json)
background_tasks.add_task(write_log_data)
return result
The same principle could be applied if a middleware was used to capture and log the response data, as described in this answer, or a custom APIRoute class, as demonstrated in this answer.
For future reference, if you (or anyone) ever need to use async/await syntax, and run into concurrency issues (such as the event loop getting blocked) while performing some heavy background computation, please have a look at this answer, which explains the difference between defining an endpoint or a background task function with async def and def (briefly, async def endpoints/background tasks will run in the event loop, whereas def functions will run in an external threadpool that is then awaited), as well as provides solutions when it comes to running blocking I/O-bound or CPU-bound operations in such functions.

How to create constent object for aws dynamodb in python?

I want to know how can check if the boto3 client object already instantiated or not. i have below code for creating the object for aws dynamodb
import boto3
def aws_dynamodb():
return boto3.resource("dynamodb")
def get_db_con():
dynamo_conn=aws_dynamodb()
return dynamo_conn
Now the above 'get_db_con()' return the connection to the dynamodb. but i want to make sure that 'get_db_con' not creating the client object from the 'aws_dynamodb()' everytime when its been called.
for eg:
def aws_db_table(table):
con=get_db_con()
return con.table(table)
account_table=aws_db_table("my_ac_table")
audit_table=aws_db_table("audit_table")
so here whenever i call the 'aws_db_table', it should not create the client for 'aws_dynamodb()' for everytime.
so how i can check it if the aws_dynamodb() is already instantiated or not creating new client object. Because creating client object everytime is costly..
Note: I want to run the code in Lambda function
Please help us on this..
Thanks
I usually go with a low-tech solution with a global variable that looks something like this:
import boto3
# Underscore-prefix to indicate this is something private
_TABLE_RESOURCE = None
def get_table_resource():
global _TABLE_RESOURCE
if _TABLE_RESOURCE is None:
_TABLE_RESOURCE = boto3.resource("dynamodb").Table("my_table")
return _TABLE_RESOURCE
def handler(event, context):
table = get_table_resource()
# ...
Global variables are persisted across Lambda executions, that's why this works.
Another option would be to use the lru_cache from functools, which uses memoization.
from functools import lru_cache
import boto3
#lru_cache(maxsize=128)
def get_table_resource():
return boto3.resource("dynamodb").Table("my_table")
def handler(event, context):
table = get_table_resource()
# ...
For those not familiar with memoization the first solution is probably easier to read + understand.
(Note: I wrote the code from memory, there may be bugs)

python 3: mock a method of the boto3 S3 client

I want to unit test some code that calls a method of the boto3 s3 client.
I can't use moto because this particular method (put_bucket_lifecycle_configuration) is not yet implemented in moto.
I want to mock the S3 client and assure that this method was called with specific parameters.
The code I want to test looks something like this:
# sut.py
import boto3
class S3Bucket(object):
def __init__(self, name, lifecycle_config):
self.name = name
self.lifecycle_config = lifecycle_config
def create(self):
client = boto3.client("s3")
client.create_bucket(Bucket=self.name)
rules = # some code that computes rules from self.lifecycle_config
# I want to test that `rules` is correct in the following call:
client.put_bucket_lifecycle_configuration(Bucket=self.name, \
LifecycleConfiguration={"Rules": rules})
def create_a_bucket(name):
lifecycle_policy = # a dict with a bunch of key/value pairs
bucket = S3Bucket(name, lifecycle_policy)
bucket.create()
return bucket
In my test, I'd like to call create_a_bucket() (though instantiating an S3Bucket directly is also an option) and make sure that the call to put_bucket_lifecycle_configuration was made with the correct parameters.
I have messed around with unittest.mock and botocore.stub.Stubber but have not managed to crack this. Unless otherwise urged, I am not posting my attempts since they have not been successful so far.
I am open to suggestions on restructuring the code I'm trying to test in order to make it easier to test.
Got the test to work with the following, where ... is the remainder of the arguments that are expected to be passed to s3.put_bucket_lifecycle_configuration().
# test.py
from unittest.mock import patch
import unittest
import sut
class MyTestCase(unittest.TestCase):
#patch("sut.boto3")
def test_lifecycle_config(self, cli):
s3 = cli.client.return_value
sut.create_a_bucket("foo")
 s3.put_bucket_lifecycle_configuration.assert_called_once_with(Bucket="foo", ...)
if __name__ == '__main__':
unittest.main()

Resources