I want to unit test some code that calls a method of the boto3 s3 client.
I can't use moto because this particular method (put_bucket_lifecycle_configuration) is not yet implemented in moto.
I want to mock the S3 client and assure that this method was called with specific parameters.
The code I want to test looks something like this:
# sut.py
import boto3
class S3Bucket(object):
def __init__(self, name, lifecycle_config):
self.name = name
self.lifecycle_config = lifecycle_config
def create(self):
client = boto3.client("s3")
client.create_bucket(Bucket=self.name)
rules = # some code that computes rules from self.lifecycle_config
# I want to test that `rules` is correct in the following call:
client.put_bucket_lifecycle_configuration(Bucket=self.name, \
LifecycleConfiguration={"Rules": rules})
def create_a_bucket(name):
lifecycle_policy = # a dict with a bunch of key/value pairs
bucket = S3Bucket(name, lifecycle_policy)
bucket.create()
return bucket
In my test, I'd like to call create_a_bucket() (though instantiating an S3Bucket directly is also an option) and make sure that the call to put_bucket_lifecycle_configuration was made with the correct parameters.
I have messed around with unittest.mock and botocore.stub.Stubber but have not managed to crack this. Unless otherwise urged, I am not posting my attempts since they have not been successful so far.
I am open to suggestions on restructuring the code I'm trying to test in order to make it easier to test.
Got the test to work with the following, where ... is the remainder of the arguments that are expected to be passed to s3.put_bucket_lifecycle_configuration().
# test.py
from unittest.mock import patch
import unittest
import sut
class MyTestCase(unittest.TestCase):
#patch("sut.boto3")
def test_lifecycle_config(self, cli):
s3 = cli.client.return_value
sut.create_a_bucket("foo")
s3.put_bucket_lifecycle_configuration.assert_called_once_with(Bucket="foo", ...)
if __name__ == '__main__':
unittest.main()
Related
How it started
I'm testing a class, ClassToTest, that makes API calls using atlassian-python-api. The tests are going to ensure that ClassToTest performs correctly with the data it gets back from the API. Many of the atlassian-python-api API calls use instantiated classes which inherit from the same base class or group of top-level classes.
I'd like to write tests that will expose breaks in the API contract if the wrong data is returned or API calls fail, while also testing the class I wrote to ensure it does the correct things with the data returned from the API. In order to do this, I was hoping to use unittest.mock.patch("path.to.Comment", autospec=True) to copy the API spec into the MagicMock, but I don't believe it's working properly.
For the purposes of the question, ClassToTest is not that important; what I am aiming to solve is how to setup and configure the pytest fixtures in a way that I can use them to mimic the API endpoints that will return the data that ClassToTest will act upon. Ideally I'd like to reuse the fixtures without having patch conflicts. I've included relevant code from ClassToTest for illustrative purposes here:
class_to_test.py:
from atlassian.bitbucket import Cloud
from typing import NamedTuple
# these are hardcoded constants that work with the production API
from src.constants import (
PULL_REQUEST_ID,
REPOSITORY,
WORKSPACE,
)
CommentType = NamedTuple("CommentType", [("top_level", str), ("inline", str)])
class ClassToTest:
def _get_token(self):
"""this returns a token of type(str)"""
def __init__(self, workspace, repository, pull_request_id):
self.active_comments = None
self.environment = sys.argv[1]
self.comment_text = CommentType(
top_level=r"top_level_comment text", inline=r"inline_comment text"
)
self.cloud = Cloud(token=self._get_token(), cloud=True)
self.workspace = self.cloud.workspaces.get(workspace)
self.repository = self.cloud.repositories.get(workspace, repository)
self.pull_request = self.repository.pullrequests.get(id=pull_request_id)
def _get_active_comments(self):
"""Returns a list of active (non-deleted) comments"""
return [
c for c in self.pull_request.comments() if c.data["deleted"] is False
]
# a few more methods here
def main():
instance = ClassToTest(WORKSPACE, REPOSITORY, PULL_REQUEST_ID)
# result = instance.method() for each method I need to call.
# do things with each result
if __name__ == "__main__":
main()
The class has methods that retrieve comments from the API (_get_active_comments, above), act on the retrieved comments, retrieve pull requests, and so on. What I am trying to test is that the class methods act correctly on the data received from the API, so I need to accurately mock data returned from API calls.
How it's going
I started with a unittest.Testcase style test class and wanted the flexibility of pytest fixtures (and autospec), but removed Testcase entirely when I discovered that pytest fixtures don't really work with it. I'm currently using a pytest class and conftest.py as follows:
/test/test_class_to_test.py:
import pytest
from unittest.mock import patch
from src.class_to_test import ClassToTest
#pytest.mark.usefixtures("mocked_comment", "mocked_user")
class TestClassToTest:
# We mock Cloud here as ClassToTest calls it in __init__ to authenticate with the API
# _get_token retrieves an access token for the API; since we don't need it, we can mock it
#patch("src.test_class_to_test.Cloud", autospec=True)
#patch.object(ClassToTest, "_get_token").
def setup_method(self, method, mock_get_token, mock_cloud):
mock_get_token.return_value = "token"
self.checker = ClassToTest("WORKSPACE", "REPOSITORY", 1)
def teardown_method(self, method):
pass
def test_has_top_level_and_inline_comments(self, mocked_comment, mocked_pull_request):
mock_top_comment = mocked_comment(raw="some text to search for later")
assert isinstance(mock_top_comment.data, dict)
assert mock_top_comment.data["raw"] == "some text to search for later"
# the assert below this line is failing
assert mock_top_comment.user.account_id == 1234
conftest.py:
import pytest
from unittest.mock import patch, PropertyMock
from atlassian.bitbucket.cloud.common.comments import Comment
from atlassian.bitbucket.cloud.common.users import User
#pytest.fixture()
def mocked_user(request):
def _mocked_user(account_id=1234):
user_patcher = patch(
f"atlassian.bitbucket.cloud.common.users.User", spec_set=True, autospec=True
)
MockUser = user_patcher.start()
data = {"type": "user", "account_id": account_id}
url = "user_url"
user = MockUser(data=data, url=url)
# setup mocked properties
mock_id = PropertyMock(return_value=account_id)
type(user).id = mock_id
mockdata = PropertyMock(return_value=data)
type(user).data = mockdata
request.addfinalizer(user_patcher.stop)
return user
return _mocked_user
#pytest.fixture()
def mocked_comment(request, mocked_user):
def _mocked_comment(raw="", inline=None, deleted=False, user_id=1234):
comment_patcher = patch(
f"atlassian.bitbucket.cloud.common.comments.Comment", spec_set=True, autospec=True
)
MockComment = comment_patcher.start()
data = {
"type": "pullrequest_comment",
"user": mocked_user(user_id),
"raw": raw,
"deleted": deleted,
}
if inline:
data["inline"] = {"from": None, "to": 1, "path": "src/code_issues.py"}
data["raw"] = "this is an inline comment"
comment = MockComment(data)
# setup mocked properties
mockdata = PropertyMock(return_value=data)
type(comment).data = mockdata
# mockuser = PropertyMock(return_value=mocked_user(user_id))
# type(comment).user = mockuser
request.addfinalizer(comment_patcher.stop)
return comment
return _mocked_comment
The problem I am encountering is that the assert mock_top_comment.user.account_id == 1234 line fails when running the test, with the following error:
> assert mock_top_comment.user.account_id == 1234
E AssertionError: assert <MagicMock name='Comment().user.account_id' id='4399290192'> == 1234
E + where <MagicMock name='Comment().user.account_id' id='4399290192'> = <MagicMock name='Comment().user' id='4399634736'>.account_id
E + where <MagicMock name='Comment().user' id='4399634736'> = <NonCallableMagicMock name='Comment()' spec_set='Comment' id='4399234928'>.user
How do I get the mock User class to attach to the mock Comment class in the same way that the real API makes it work? Is there something about autospec that I'm missing, or should I be abandoning unittest.mock.patch entirely and using something else?
Extra credit (EDIT: in retrospect, this may be the most important part)
I'm using mocked_comment as a pytest fixture factory and want to reuse it multiple times in the same test (for example to create multiple mocked Comments returned in a list). So far, each time I've tried to do that, I've been met with the following error:
def test_has_top_level_and_inline_comments(self, mocked_comment, mocked_pull_request):
mock_top_comment = mocked_comment(raw="Some comment text")
> mock_inline_comment = mocked_comment(inline=True)
...
test/conftest.py:30: in _mocked_comment
MockComment = comment_patcher.start()
/opt/homebrew/Cellar/python#3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/mock.py:1585: in start
result = self.__enter__()
...
> raise InvalidSpecError(
f'Cannot autospec attr {self.attribute!r} from target '
f'{target_name!r} as it has already been mocked out. '
f'[target={self.target!r}, attr={autospec!r}]')
E unittest.mock.InvalidSpecError: Cannot autospec attr 'Comment' from target 'atlassian.bitbucket.cloud.common.comments' as it has already been mocked out. [target=<module 'atlassian.bitbucket.cloud.common.comments' from '/opt/homebrew/lib/python3.10/site-packages/atlassian/bitbucket/cloud/common/comments.py'>, attr=<MagicMock name='Comment' spec_set='Comment' id='4398964912'>]
I thought the whole point of a pytest fixture factory was to be reusable, but I believe that using an autospec mock complicates things quite a bit. I don't want to have to hand copy every detail from the API spec into the tests, as that will have to be changed if anything in the API changes. Is there a solution for this that involves automatically and dynamically creating the necessary classes in the mocked API with the correct return values for properties?
One thing I'm considering is separating the testing into two parts: API contract, and ClassToTest testing. In this way I can write the tests for ClassToTest without relying on the API and they will pass as long as I manipulate the received data correctly. Any changes to the API will get caught by the separate contract testing tests. Then I can use non-factory fixtures with static data for testing ClassToTest.
For now though, I'm out of ideas on how to proceed with this. What should I do here? Probably the most important thing to address is how to properly link the User instance with the Comment instance in the fixtures so that my method calls in test work the same way as they do in production. Bonus points if we can figure out how to dynamically patch multiple fixtures in a single test.
I've started looking at this answer, but given the number of interconnected classes and properties, I'm not sure it will work without writing out a ton of fixtures. After following the directions and applying them to the User mock inside the Comment mock, I started getting the error in the Extra Credit section above, where autospec couldn't be used as it has already been mocked out.
Taken from this answer:
Python mock AWS SSM
I now have this code:
test_2.py
from unittest import TestCase
import boto3
import pytest
from moto import mock_ssm
#pytest.yield_fixture
def s3ssm():
with mock_ssm():
ssm = boto3.client("ssm")
yield ssm
#mock_ssm
class MyTest(TestCase):
def setUp(self):
ssm = boto3.client("ssm")
ssm.put_parameter(
Name="/mypath/password",
Description="A test parameter",
Value="this is it!",
Type="SecureString",
)
def test_param_getting(self):
import real_code
resp = real_code.get_variable("/mypath/password")
assert resp["Parameter"]["Value"] == "this is it!"
and this is my code to test (or a cut down example):
real_code.py
import boto3
class ParamTest:
def __init__(self) -> None:
self.client = boto3.client("ssm")
pass
def get_parameters(self, param_name):
print(self.client.describe_parameters())
return self.client.get_parameters_by_path(Path=param_name)
def get_variable(param_name):
p = ParamTest()
param_details = p.get_parameters(param_name)
return param_details
I have tried a number of solutions, and switched between pytest and unittest quite a few times!
Each time I run the code, it doesn't reach out to AWS so it seems something is affecting the boto3 client, but it doesn't return the parameter. If I edit real_code.py to not have a class inside it the test passes.
Is it not possible to patch the client inside the class in the real_code.py file? I'm trying to do this without editing the real_code.py file at all if possible.
Thanks,
The get_parameters_by_path returns all parameters that are prefixed with the supplied path.
When providing /mypath, it would return /mypath/password.
But when providing /mypath/password, as in your example, it will only return parameters that look like this: /mypath/password/..
If you are only looking to retrieve a single parameter, the get_parameter call would be more suitable:
class ParamTest:
def __init__(self) -> None:
self.client = boto3.client("ssm")
pass
def get_parameters(self, param_name):
# Decrypt the value, as it is stored as a SecureString
return self.client.get_parameter(Name=param_name, WithDecryption=True)
Edit: Note that Moto behaves the same as AWS in this.
From https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ssm.html#SSM.Client.get_parameters_by_path:
[The Path-parameter is t]he hierarchy for the parameter. [...] The hierachy is the parameter name except the last part of the parameter. For the API call to succeeed, the last part of the parameter name can't be in the path.
I want to know how can check if the boto3 client object already instantiated or not. i have below code for creating the object for aws dynamodb
import boto3
def aws_dynamodb():
return boto3.resource("dynamodb")
def get_db_con():
dynamo_conn=aws_dynamodb()
return dynamo_conn
Now the above 'get_db_con()' return the connection to the dynamodb. but i want to make sure that 'get_db_con' not creating the client object from the 'aws_dynamodb()' everytime when its been called.
for eg:
def aws_db_table(table):
con=get_db_con()
return con.table(table)
account_table=aws_db_table("my_ac_table")
audit_table=aws_db_table("audit_table")
so here whenever i call the 'aws_db_table', it should not create the client for 'aws_dynamodb()' for everytime.
so how i can check it if the aws_dynamodb() is already instantiated or not creating new client object. Because creating client object everytime is costly..
Note: I want to run the code in Lambda function
Please help us on this..
Thanks
I usually go with a low-tech solution with a global variable that looks something like this:
import boto3
# Underscore-prefix to indicate this is something private
_TABLE_RESOURCE = None
def get_table_resource():
global _TABLE_RESOURCE
if _TABLE_RESOURCE is None:
_TABLE_RESOURCE = boto3.resource("dynamodb").Table("my_table")
return _TABLE_RESOURCE
def handler(event, context):
table = get_table_resource()
# ...
Global variables are persisted across Lambda executions, that's why this works.
Another option would be to use the lru_cache from functools, which uses memoization.
from functools import lru_cache
import boto3
#lru_cache(maxsize=128)
def get_table_resource():
return boto3.resource("dynamodb").Table("my_table")
def handler(event, context):
table = get_table_resource()
# ...
For those not familiar with memoization the first solution is probably easier to read + understand.
(Note: I wrote the code from memory, there may be bugs)
I have a very large python file that consists of multiple defined functions. If you're familiar with AWS Lambda, when you create a lambda function, you specify a handler, which is a function in the code that AWS Lambda can invoke when service executes my code, which is represented below in my_handler.py file:
def handler_name(event, context):
...
return some_value
Link Source: https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
However, as I mentioned above, I have multiple defined functions in my_handler.py that have their own events and contexts. Therefore, this will result in an error. Are there any ways around this in python3.6?
Your single handler function will need to be responsible for parsing the incoming event, and determining the appropriate route to take. For example, let's say your other functions are called helper1 and helper2. Your Lambda handler function will inspect the incoming event and then, based on one of the fields in the incoming event (ie. let's call it EventType), call either helper1 or helper2, passing in both the event and context objects.
def handler_name(event, context):
if event['EventType'] == 'helper1':
helper1(event, context)
elif event['EventType'] == 'helper2':
helper2(event, context)
def helper1(event, context):
pass
def helper2(event, context):
pass
This is only pseudo-code, and I haven't tested it myself, but it should get the concept across.
Little late to the game but thought it wouldn't hurt to share. Best practices suggest that one separate the handler from the Lambda's core logic. Not only is it okay to add additional definitions, it can lead to more legible code and reduce waste--e.g. multiple API calls to S3. So, although it can get out of hand, I disagree with some of those critiques to your initial question. It's effective to use your handler as a logical interface to the additional functions that will accomplish your various work. In Data Architecture & Engineering land it's often less-costly and more efficient to work in this manner. Particularly if you are building out ETL pipelines, following service-oriented architectural patterns. Admittedly, I'm a bit of a Maverick and some may find this unruly/egregious but I've gone so far as to build classes into my Lambdas for various reasons--e.g. centralized, data-lake-ish S3 buckets that accommodate a variety of file types, reduce unnecessary requests, etc...--and I stand by it. Here's an example of one of my handler files from a CDK example project I put on the hub awhile back. Hopefully it'll give you some useful ideas, or at the very least not feel alone in wanting to beef up your Lambdas.
import requests
import json
from requests.exceptions import Timeout
from requests.exceptions import HTTPError
from botocore.exceptions import ClientError
from datetime import date
import csv
import os
import boto3
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
class Asteroids:
"""Client to NASA API and execution interface to branch data processing by file type.
Notes:
This class doesn't look like a normal class. It is a simple example of how one might
workaround AWS Lambda's limitations of class use in handlers. It also allows for
better organization of code to simplify this example. If one planned to add
other NASA endpoints or process larger amounts of Asteroid data for both .csv and .json formats,
asteroids_json and asteroids_csv should be modularized and divided into separate lambdas
where stepfunction orchestration is implemented for a more comprehensive workflow.
However, for the sake of this demo I'm keeping it lean and easy.
"""
def execute(self, format):
"""Serves as Interface to assign class attributes and execute class methods
Raises:
Exception: If file format is not of .json or .csv file types.
Notes:
Have fun!
"""
self.file_format=format
self.today=date.today().strftime('%Y-%m-%d')
# method call below used when Secrets Manager integrated. See get_secret.__doc__ for more.
# self.api_key=get_secret('nasa_api_key')
self.api_key=os.environ["NASA_KEY"]
self.endpoint=f"https://api.nasa.gov/neo/rest/v1/feed?start_date={self.today}&end_date={self.today}&api_key={self.api_key}"
self.response_object=self.nasa_client(self.endpoint)
self.processed_response=self.process_asteroids(self.response_object)
if self.file_format == "json":
self.asteroids_json(self.processed_response)
elif self.file_format == "csv":
self.asteroids_csv(self.processed_response)
else:
raise Exception("FILE FORMAT NOT RECOGNIZED")
self.write_to_s3()
def nasa_client(self, endpoint):
"""Client component for API call to NASA endpoint.
Args:
endpoint (str): Parameterized url for API call.
Raises:
Timeout: If connection not made in 5s and/or data not retrieved in 15s.
HTTPError & Exception: Self-explanatory
Notes:
See Cloudwatch logs for debugging.
"""
try:
response = requests.get(endpoint, timeout=(5, 15))
except Timeout as timeout:
print(f"NASA GET request timed out: {timeout}")
except HTTPError as http_err:
print(f"HTTP error occurred: {http_err}")
except Exception as err:
print(f'Other error occurred: {err}')
else:
return json.loads(response.content)
def process_asteroids(self, payload):
"""Process old, and create new, data object with content from response.
Args:
payload (b'str'): Binary string of asteroid data to be processed.
"""
near_earth_objects = payload["near_earth_objects"][f"{self.today}"]
asteroids = []
for neo in near_earth_objects:
asteroid_object = {
"id" : neo['id'],
"name" : neo['name'],
"hazard_potential" : neo['is_potentially_hazardous_asteroid'],
"est_diameter_min_ft": neo['estimated_diameter']['feet']['estimated_diameter_min'],
"est_diameter_max_ft": neo['estimated_diameter']['feet']['estimated_diameter_max'],
"miss_distance_miles": [item['miss_distance']['miles'] for item in neo['close_approach_data']],
"close_approach_exact_time": [item['close_approach_date_full'] for item in neo['close_approach_data']]
}
asteroids.append(asteroid_object)
return asteroids
def asteroids_json(self, payload):
"""Creates json object from payload content then writes to .json file.
Args:
payload (b'str'): Binary string of asteroid data to be processed.
"""
json_file = open(f"/tmp/asteroids_{self.today}.json",'w')
json_file.write(json.dumps(payload, indent=4))
json_file.close()
def asteroids_csv(self, payload):
"""Creates .csv object from payload content then writes to .csv file.
"""
csv_file=open(f"/tmp/asteroids_{self.today}.csv",'w', newline='\n')
fields=list(payload[0].keys())
writer=csv.DictWriter(csv_file, fieldnames=fields)
writer.writeheader()
writer.writerows(payload)
csv_file.close()
def get_secret(self):
"""Gets secret from AWS Secrets Manager
Notes:
Have yet to integrate into the CDK. Leaving as example code.
"""
secret_name = os.environ['TOKEN_SECRET_NAME']
region_name = os.environ['REGION']
session = boto3.session.Session()
client = session.client(service_name='secretsmanager', region_name=region_name)
try:
get_secret_value_response = client.get_secret_value(SecretId=secret_name)
except ClientError as e:
raise e
else:
if 'SecretString' in get_secret_value_response:
secret = get_secret_value_response['SecretString']
else:
secret = b64decode(get_secret_value_response['SecretBinary'])
return secret
def write_to_s3(self):
"""Uploads both .json and .csv files to s3
"""
s3 = boto3.client('s3')
s3.upload_file(f"/tmp/asteroids_{self.today}.{self.file_format}", os.environ['S3_BUCKET'], f"asteroid_data/asteroids_{self.today}.{self.file_format}")
def handler(event, context):
"""Instantiates class and triggers execution method.
Args:
event (dict): Lists a custom dict that determines interface control flow--i.e. `csv` or `json`.
context (obj): Provides methods and properties that contain invocation, function and
execution environment information.
*Not used herein.
"""
asteroids = Asteroids()
asteroids.execute(event)
Lets say I the following class;
class CompositionClass(object):
def __init__(self):
self._redis = Redis()
self._binance_client = BinanceClient()
def do_processing(self, data):
self._redis.write(data)
self._binance_client.buy(data.amount_to_buy)
# logic to actually unittest
return process_val
I have other objects which call external API as composition in my ComplexClass. When I am unittesting the logic of do_processing, I do not want to call these expensive API calls. I have checked throughly in SO and Google about unittesting; all examples are simple not that really useful. In my case how can I use unittest.mock to mock these objects?
One way of mocking the Redis and BinanceClient classes is to use the patch decorator in your test class, such as:
from unittest import TestCase
from unittest.mock import patch
from package.module import CompositionClass
class TestCompositionClass(TestCase):
#patch('package.module.BinanceClient')
#patch('package.module.Redis')
def test_do_processing(self, mock_redis, mock_binance):
c = CompositionClass()
data = [...]
c.do_processing(data)
# Perform your assertions
# Check that mocks were called
mock_redis.return_value.write.assert_called_once_with(data)
mock_binance.return_value.buy.assert_called_once_with(data.amount_to_buy)
Note that the path specified to #patch is the path to module containing the CompositionClass and its imports for Redis and BinanceClient. The patching happens in that module, not the module containing the Redis and BinanceClient implementations themselves.
You need to set a value that must be returned by your API call, to this function:
from unittest.mock import MagicMock
class Tester(unittest.TestCase):
def setUp(self):
pass
def test_do_processing(self):
self.API_function = MagicMock(return_value='API_response')
# test logic