Unit Testing a Method That Uses a Context Manager - python-3.x

I have a method I would like to unit test. The method expects a file path, which is then opened - using a context manager - to parse a value, which is returned if present. Simple enough:
@staticmethod
def read_in_target_language(file_path):
    """
    .. note:: Language code attributes/values can occur
       on either the first or the second line of the bilingual file.
    """
    with codecs.open(file_path, 'r', encoding='utf-8') as source:
        line_1, line_2 = next(source), next(source)
        get_line_1 = re.search(
            '(target-language=")(.+?)(")', line_1, re.IGNORECASE)
        get_line_2 = re.search(
            '(target-language=")(.+?)(")', line_2, re.IGNORECASE)
        if get_line_1 is not None:
            return get_line_1.group(2)
        else:
            return get_line_2.group(2)
I want to avoid testing against external files - for obvious reasons - and do not wish to create temp files. In addition, I cannot use StringIO in this case.
How can I mock the file_path object in my unit test case? Ultimately I would need to create a mock path that contains differing values. Any help is gratefully received.

(Disclaimer: I don't speak Python, so I'm likely to err in details)
I suggest that you instead mock codecs. Make the mock's open method return an object with test data to be returned from the read calls. That might involve creating another mock object for the return value; I don't know if there are some stock classes in Python that you could use for that purpose instead.
Then, in order to actually enable testing the logic, add a parameter to read_in_target_language that represents an object that can assume the role of the original codecs object, i.e. dependency injection by argument. For convenience I guess you could default it to codecs.
I'm not sure how far Python's duck typing goes with regards to static vs instance methods, but something like this should give you the general idea:
def read_in_target_language(file_path, opener=codecs):
    ...
    with opener.open(file_path, 'r', encoding='utf-8') as source:
If the above isn't possible you could just add a layer of indirection:
class CodecsOpener:
    ...
    def open(self, file_path, access, encoding):
        return codecs.open(file_path, access, encoding)

class MockOpener:
    ...
    def __init__(self, open_result):
        self.open_result = open_result
    def open(self, file_path, access, encoding):
        return self.open_result

...
def read_in_target_language(file_path, opener=CodecsOpener()):
    ...
    with opener.open(file_path, 'r', encoding='utf-8') as source:
        ...
...
def test():
    readable_data = ...
    opener = MockOpener(readable_data)
    result = <class>.read_in_target_language('whatever', opener)
    <check result>
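As a concrete sketch of the first suggestion (patching `codecs.open` so the context manager yields canned data), a tiny hand-rolled fake file avoids both temp files and StringIO. The method from the question is shown as a module-level function for brevity, and the XLIFF-ish sample lines are invented for illustration:

```python
import codecs
import re
from unittest import mock

class FakeFile:
    """Minimal stand-in for the object codecs.open() returns:
    supports the context manager protocol and next()."""
    def __init__(self, text):
        self._lines = iter(text.splitlines(keepends=True))
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        return False
    def __next__(self):
        return next(self._lines)

def read_in_target_language(file_path):
    with codecs.open(file_path, 'r', encoding='utf-8') as source:
        line_1, line_2 = next(source), next(source)
        get_line_1 = re.search('(target-language=")(.+?)(")', line_1, re.IGNORECASE)
        get_line_2 = re.search('(target-language=")(.+?)(")', line_2, re.IGNORECASE)
        if get_line_1 is not None:
            return get_line_1.group(2)
        return get_line_2.group(2)

# Patch codecs.open so no real file is touched; the path is never used.
fake = FakeFile('<file source-language="en">\n<file target-language="de-DE">\n')
with mock.patch.object(codecs, 'open', return_value=fake):
    result = read_in_target_language('does-not-exist.xlf')
print(result)  # de-DE
```

Varying the text passed to `FakeFile` lets one test the attribute appearing on either line.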

Related

python: multiple functions or abstract classes when dealing with data flow requirement

I have more of a design question, and I am not sure how to handle it. I have a script preprocessing.py where I read a .csv file with a text column that I would like to preprocess by removing punctuation, characters, etc.
What I have done now is write a class with several functions, as follows:
import pandas as pd

class Preprocessing(object):
    def __init__(self, file):
        self.my_data = pd.read_csv(file)
    def remove_punctuation(self):
        self.my_data['text'] = self.my_data['text'].str.replace('#', '')
    def remove_hyphen(self):
        self.my_data['text'] = self.my_data['text'].str.replace('-', '')
    def remove_words(self):
        self.my_data['text'] = self.my_data['text'].str.replace('reference', '')
    def save_data(self):
        self.my_data.to_csv('my_data.csv')

def preprocessing(file_my):
    f = Preprocessing(file_my)
    f.remove_punctuation()
    f.remove_hyphen()
    f.remove_words()
    f.save_data()
    return f

if __name__ == '__main__':
    preprocessing('/path/to/file.csv')
Although it works fine, I would like to be able to expand the code easily and have smaller classes instead of one large class. So I decided to use an abstract class:
import pandas as pd
from abc import ABC, abstractmethod

my_data = pd.read_csv('/Users/kgz/Desktop/german_web_scraping/file.csv')

class Preprocessing(ABC):
    @abstractmethod
    def processor(self):
        pass

class RemovePunctuation(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('#', '')

class RemoveHyphen(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('-', '')

class Removewords(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('reference', '')

final_result = [cls().processor() for cls in Preprocessing.__subclasses__()]
print(final_result)
So now each class is responsible for one task, but there are a few issues I do not know how to handle, since I am new to abstract classes. First, I am reading the file outside the classes, and I am not sure if that is good practice. If not, should I pass it as an argument to the processor function, or have another class that is responsible for reading the data?
Second, having one class with several functions allowed for a flow, so every transformation happened in order (i.e., first punctuation is removed, then hyphens are removed, etc.), but I do not know how to handle this order and dependency with abstract classes.
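No answer is included for this question here, but one common pattern for keeping small single-purpose classes while making the order explicit is a pipeline that holds an ordered list of steps. A minimal, pandas-free sketch (class names invented; the same idea applies to a DataFrame column):

```python
from abc import ABC, abstractmethod

class Step(ABC):
    """One small, single-purpose transformation."""
    @abstractmethod
    def process(self, text: str) -> str: ...

class RemovePunctuation(Step):
    def process(self, text):
        return text.replace('#', '')

class RemoveHyphen(Step):
    def process(self, text):
        return text.replace('-', '')

class Pipeline:
    """Applies steps in the explicit order given, instead of relying on
    Preprocessing.__subclasses__(), whose ordering is incidental."""
    def __init__(self, steps):
        self.steps = steps
    def run(self, text):
        for step in self.steps:
            text = step.process(text)
        return text

result = Pipeline([RemovePunctuation(), RemoveHyphen()]).run('re#f-erence')
print(result)  # reference
```

Reading the data would then live in whatever constructs the pipeline input, not inside the steps, which keeps each step testable in isolation.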

Using LogRecordFactory in python to add custom fields for logging

I am trying to add a custom field to my logging using LogRecordFactory. I am repeatedly calling a class, and every time I do that I want to set the custom_attribute in its __init__ method so the remainder of the code within the class will have this attribute. But I cannot get this to work. I found the following, which works, but it's static:
import logging

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.custom_attribute = "whatever"
    return record

logging.basicConfig(format="%(custom_attribute)s - %(message)s")
logging.setLogRecordFactory(record_factory)
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.debug("test")
This will output correctly:
whatever - test
However, my use case is that the custom_attribute will vary. Every time I call a specific function, I want to change it. So it seems like record_factory needs another parameter passed to it so it can return the correct record with the new value. But I can't figure it out. I have tried adding a parameter to the function, but when I make the call I get:
TypeError: __init__() missing 7 required positional arguments: 'name', 'level', 'pathname', 'lineno', 'msg', 'args', and 'exc_info'
I think this has something to do with the *args and **kwargs, but I don't really know. Also, why are there no parentheses after record_factory when it's passed to logging.setLogRecordFactory? I have never seen a function used like this.
You can try using a closure:
import logging

old_factory = logging.getLogRecordFactory()

def record_factory_factory(context_id):
    def record_factory(*args, **kwargs):
        record = old_factory(*args, **kwargs)
        record.custom_attribute = context_id
        return record
    return record_factory

logging.basicConfig(format="%(custom_attribute)s - %(message)s")
logging.setLogRecordFactory(record_factory_factory("whatever"))
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.debug("test")

logging.setLogRecordFactory(record_factory_factory("whatever2"))
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.debug("test")
result:
$ python3 log_test.py
whatever - test
whatever2 - test
I stumbled upon this question while I was trying to do something similar. This is how I solved it, assuming that you want to add something called xyz to every log line (further explanation below):
import logging
import threading

thread_local = threading.local()

def add_xyz_to_logrecords(xyz):
    factory = logging.getLogRecordFactory()
    if isinstance(factory, XYZLogFactory):
        factory.set_xyz(xyz)
    else:
        logging.setLogRecordFactory(XYZLogFactory(factory, xyz))

class XYZLogFactory:
    def __init__(self, original_factory, xyz):
        self.original_factory = original_factory
        thread_local.xyz = xyz

    def __call__(self, *args, **kwargs):
        record = self.original_factory(*args, **kwargs)
        try:
            record.xyz = thread_local.xyz
        except AttributeError:
            pass
        return record

    def set_xyz(self, xyz):
        thread_local.xyz = xyz
Here I've created a callable class XYZLogFactory that remembers what the current value of xyz is, and also remembers what the original LogRecordFactory was. When called as a function, it creates a record using the original LogRecordFactory, and adds an xyz attribute with the current value.
The thread_local is to make it thread-safe, but for an easier version, you could just use an attribute on the XYZLogFactory:
class XYZLogFactory:
    def __init__(self, original_factory, xyz):
        self.original_factory = original_factory
        self.xyz = xyz

    def __call__(self, *args, **kwargs):
        record = self.original_factory(*args, **kwargs)
        record.xyz = self.xyz
        return record

    def set_xyz(self, xyz):
        self.xyz = xyz
In my very first attempt (not shown here), I did not store the original factory, but stored it implicitly in the new LogRecordFactory using a closure. However, after a while that led to a RecursionError, because it kept calling the previous factory, which called the previous factory, etc.
Regarding your last question: there are no parentheses because the function is not called here. Instead it's passed to logging.setLogRecordFactory, which saves it in a variable somewhere and then calls it someplace else. If you want more information you can google something like 'functions as first-class citizens'.
Easy example:
x = str # Assign to x the function that gives string representation of object
x(1) # outputs the string representation of 1, same as if you'd called str(1)
> '1'
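As an aside not covered in the answers above: when the value genuinely varies per call, the stdlib's `extra` argument attaches attributes to a single record without swapping the record factory at all. A small runnable sketch (logger name and stream capture are only for the demo):

```python
import io
import logging

# Capture output in a StringIO so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(custom_attribute)s - %(message)s"))

logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo self-contained

# Each call supplies its own value; no factory juggling needed.
logger.debug("test", extra={"custom_attribute": "whatever"})
logger.debug("test", extra={"custom_attribute": "whatever2"})
print(stream.getvalue())
# whatever - test
# whatever2 - test
```

The factory-based approaches above remain useful when you cannot touch every logging call site.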

How to Call Multiple Methods in Python Class Without Calling Each One Individually?

I have a class that contains a number of methods:
class PersonalDetails(ManagedObjectABC):
    def __init__(self, personal_details):
        self.personal_details = personal_details
    def set_gender(self):
        self.gender = 'Male'
    def set_age(self):
        self.set_age = 22
etc.
I have many such methods, all beginning with the word `set`. I want to create a new method within this class that will execute all methods that begin with `set`, like this:
def execute_all_settings(self):
    '''
    wrapper for setting all variables that start with set.
    Will skip anything not matching regex '^set'
    '''
    to_execute = [f'''self.{i}()''' for i in dir(self) if re.search('^set', i)]
    print(to_execute)
    [exec(i) for i in to_execute]
However, this reports an error:
NameError: name 'self' is not defined
How can I go about doing this?
more info
The reason I want to do it this way, rather than simply call each method individually, is that new methods may be added in the future, so I want to execute all methods that start with "set", no matter what they are.
Do not use either exec or eval. Instead use getattr.
Also note that set_age is both a method and an attribute, try to avoid that.
import re

class PersonalDetails:
    def __init__(self, personal_details):
        self.personal_details = personal_details

    def set_gender(self):
        self.gender = 'Male'

    def set_age(self):
        self.age = 22

    def execute_all_settings(self):
        '''
        wrapper for setting all variables that start with set.
        Will skip anything not matching regex '^set'
        '''
        to_execute = [i for i in dir(self) if re.search('^set', i)]
        print(to_execute)
        for func_name in to_execute:
            getattr(self, func_name)()

pd = PersonalDetails('')
pd.execute_all_settings()
print(pd.gender)
# ['set_age', 'set_gender']
# Male
This solution will work as long as all the "set" methods either do not expect any arguments (which is the current use-case), or they all expect the same arguments.
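To illustrate the "same arguments" case mentioned above, the getattr loop can simply forward shared keyword arguments (the `default` parameter here is invented for the sketch):

```python
import re

class PersonalDetails:
    def __init__(self, personal_details):
        self.personal_details = personal_details

    def set_gender(self, default='Male'):
        self.gender = default

    def set_age(self, default=22):
        self.age = default

    def execute_all_settings(self, **kwargs):
        # Forward the same kwargs to every set_* method.
        for name in dir(self):
            if re.search('^set', name):
                getattr(self, name)(**kwargs)

pd = PersonalDetails('')
pd.execute_all_settings(default='unknown')
print(pd.gender, pd.age)  # unknown unknown
```

If the methods ever need *different* arguments, a dispatch mapping from method name to argument tuple is usually clearer than inspecting signatures.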

How to have more than one handler in AWS Lambda Function?

I have a very large python file that consists of multiple defined functions. If you're familiar with AWS Lambda, when you create a lambda function you specify a handler, which is a function in the code that AWS Lambda can invoke when the service executes your code, as represented below in my my_handler.py file:
def handler_name(event, context):
    ...
    return some_value
Link Source: https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
However, as I mentioned above, I have multiple defined functions in my_handler.py that have their own events and contexts. Therefore, this will result in an error. Are there any ways around this in python3.6?
Your single handler function will need to be responsible for parsing the incoming event and determining the appropriate route to take. For example, let's say your other functions are called helper1 and helper2. Your Lambda handler function will inspect the incoming event and then, based on one of the fields in the incoming event (i.e., let's call it EventType), call either helper1 or helper2, passing in both the event and context objects.
def handler_name(event, context):
    if event['EventType'] == 'helper1':
        helper1(event, context)
    elif event['EventType'] == 'helper2':
        helper2(event, context)

def helper1(event, context):
    pass

def helper2(event, context):
    pass
This is only pseudo-code, and I haven't tested it myself, but it should get the concept across.
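The if/elif routing above can be made runnable, and a bit more extensible, with a dispatch table so new helpers register themselves. A sketch under the same `EventType` assumption (helper names and return values are invented):

```python
# Registry mapping EventType values to handler functions.
HANDLERS = {}

def route(event_type):
    """Decorator that registers a function under the given EventType."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register

@route('helper1')
def helper1(event, context):
    return 'ran helper1'

@route('helper2')
def helper2(event, context):
    return 'ran helper2'

def handler(event, context):
    # Single Lambda entry point; dispatches on the EventType field.
    return HANDLERS[event['EventType']](event, context)

print(handler({'EventType': 'helper2'}, None))  # ran helper2
```

Adding a new event type is then just another decorated function, with no edits to the handler itself.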
Little late to the game, but thought it wouldn't hurt to share. Best practices suggest that one separate the handler from the Lambda's core logic. Not only is it okay to add additional definitions, it can lead to more legible code and reduce waste--e.g. multiple API calls to S3. So, although it can get out of hand, I disagree with some of those critiques of your initial question. It's effective to use your handler as a logical interface to the additional functions that will accomplish your various work. In Data Architecture & Engineering land it's often less costly and more efficient to work in this manner, particularly if you are building out ETL pipelines following service-oriented architectural patterns.

Admittedly, I'm a bit of a maverick and some may find this unruly/egregious, but I've gone so far as to build classes into my Lambdas for various reasons--e.g. centralized, data-lake-ish S3 buckets that accommodate a variety of file types, reduce unnecessary requests, etc.--and I stand by it. Here's an example of one of my handler files from a CDK example project I put on the hub awhile back. Hopefully it'll give you some useful ideas, or at the very least make you feel less alone in wanting to beef up your Lambdas.
import requests
import json
from requests.exceptions import Timeout
from requests.exceptions import HTTPError
from botocore.exceptions import ClientError
from base64 import b64decode  # needed by get_secret() below
from datetime import date
import csv
import os
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)


class Asteroids:
    """Client to NASA API and execution interface to branch data processing by file type.

    Notes:
        This class doesn't look like a normal class. It is a simple example of how one might
        workaround AWS Lambda's limitations of class use in handlers. It also allows for
        better organization of code to simplify this example. If one planned to add
        other NASA endpoints or process larger amounts of Asteroid data for both .csv and .json formats,
        asteroids_json and asteroids_csv should be modularized and divided into separate lambdas
        where stepfunction orchestration is implemented for a more comprehensive workflow.
        However, for the sake of this demo I'm keeping it lean and easy.
    """

    def execute(self, format):
        """Serves as interface to assign class attributes and execute class methods.

        Raises:
            Exception: If file format is not of .json or .csv file types.
        Notes:
            Have fun!
        """
        self.file_format = format
        self.today = date.today().strftime('%Y-%m-%d')
        # method call below used when Secrets Manager integrated. See get_secret.__doc__ for more.
        # self.api_key = get_secret('nasa_api_key')
        self.api_key = os.environ["NASA_KEY"]
        self.endpoint = f"https://api.nasa.gov/neo/rest/v1/feed?start_date={self.today}&end_date={self.today}&api_key={self.api_key}"
        self.response_object = self.nasa_client(self.endpoint)
        self.processed_response = self.process_asteroids(self.response_object)
        if self.file_format == "json":
            self.asteroids_json(self.processed_response)
        elif self.file_format == "csv":
            self.asteroids_csv(self.processed_response)
        else:
            raise Exception("FILE FORMAT NOT RECOGNIZED")
        self.write_to_s3()

    def nasa_client(self, endpoint):
        """Client component for API call to NASA endpoint.

        Args:
            endpoint (str): Parameterized url for API call.
        Raises:
            Timeout: If connection not made in 5s and/or data not retrieved in 15s.
            HTTPError & Exception: Self-explanatory
        Notes:
            See Cloudwatch logs for debugging.
        """
        try:
            response = requests.get(endpoint, timeout=(5, 15))
        except Timeout as timeout:
            print(f"NASA GET request timed out: {timeout}")
        except HTTPError as http_err:
            print(f"HTTP error occurred: {http_err}")
        except Exception as err:
            print(f'Other error occurred: {err}')
        else:
            return json.loads(response.content)

    def process_asteroids(self, payload):
        """Process old, and create new, data object with content from response.

        Args:
            payload (b'str'): Binary string of asteroid data to be processed.
        """
        near_earth_objects = payload["near_earth_objects"][f"{self.today}"]
        asteroids = []
        for neo in near_earth_objects:
            asteroid_object = {
                "id": neo['id'],
                "name": neo['name'],
                "hazard_potential": neo['is_potentially_hazardous_asteroid'],
                "est_diameter_min_ft": neo['estimated_diameter']['feet']['estimated_diameter_min'],
                "est_diameter_max_ft": neo['estimated_diameter']['feet']['estimated_diameter_max'],
                "miss_distance_miles": [item['miss_distance']['miles'] for item in neo['close_approach_data']],
                "close_approach_exact_time": [item['close_approach_date_full'] for item in neo['close_approach_data']]
            }
            asteroids.append(asteroid_object)
        return asteroids

    def asteroids_json(self, payload):
        """Creates json object from payload content then writes to .json file.

        Args:
            payload (b'str'): Binary string of asteroid data to be processed.
        """
        json_file = open(f"/tmp/asteroids_{self.today}.json", 'w')
        json_file.write(json.dumps(payload, indent=4))
        json_file.close()

    def asteroids_csv(self, payload):
        """Creates .csv object from payload content then writes to .csv file."""
        csv_file = open(f"/tmp/asteroids_{self.today}.csv", 'w', newline='\n')
        fields = list(payload[0].keys())
        writer = csv.DictWriter(csv_file, fieldnames=fields)
        writer.writeheader()
        writer.writerows(payload)
        csv_file.close()

    def get_secret(self):
        """Gets secret from AWS Secrets Manager.

        Notes:
            Have yet to integrate into the CDK. Leaving as example code.
        """
        secret_name = os.environ['TOKEN_SECRET_NAME']
        region_name = os.environ['REGION']
        session = boto3.session.Session()
        client = session.client(service_name='secretsmanager', region_name=region_name)
        try:
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
        except ClientError as e:
            raise e
        else:
            if 'SecretString' in get_secret_value_response:
                secret = get_secret_value_response['SecretString']
            else:
                secret = b64decode(get_secret_value_response['SecretBinary'])
            return secret

    def write_to_s3(self):
        """Uploads both .json and .csv files to s3."""
        s3 = boto3.client('s3')
        s3.upload_file(f"/tmp/asteroids_{self.today}.{self.file_format}", os.environ['S3_BUCKET'], f"asteroid_data/asteroids_{self.today}.{self.file_format}")


def handler(event, context):
    """Instantiates class and triggers execution method.

    Args:
        event (dict): Lists a custom dict that determines interface control flow--i.e. `csv` or `json`.
        context (obj): Provides methods and properties that contain invocation, function and
            execution environment information. *Not used herein.
    """
    asteroids = Asteroids()
    asteroids.execute(event)

Python, mocking and wrapping methods without instantiating objects

I want to mock a method of a class and use wraps, so that it is actually called, but I can inspect the arguments passed to it. I have seen at several places (here for example) that the usual way to do that is as follows (adapted to show my point):
from unittest import TestCase
from unittest.mock import patch

class Potato(object):
    def foo(self, n):
        return self.bar(n)
    def bar(self, n):
        return n + 2

class PotatoTest(TestCase):
    spud = Potato()

    @patch.object(Potato, 'foo', wraps=spud.foo)
    def test_something(self, mock):
        forty_two = self.spud.foo(n=40)
        mock.assert_called_once_with(n=40)
        self.assertEqual(forty_two, 42)
However, this instantiates the class Potato, in order to bind the mock to the instance method spud.foo.
What I need is to mock the method foo in all instances of Potato, wrapping it around the original method. I.e., I need the following:
from unittest import TestCase
from unittest.mock import patch

class Potato(object):
    def foo(self, n):
        return self.bar(n)
    def bar(self, n):
        return n + 2

class PotatoTest(TestCase):
    @patch.object(Potato, 'foo', wraps=Potato.foo)
    def test_something(self, mock):
        self.spud = Potato()
        forty_two = self.spud.foo(n=40)
        mock.assert_called_once_with(n=40)
        self.assertEqual(forty_two, 42)
This of course doesn't work. I get the error:
TypeError: foo() missing 1 required positional argument: 'self'
It works, however, if wraps is not used, so the problem is not in the mock itself but in the way it calls the wrapped function. For example, this works (but of course I had to "fake" the returned value, because now Potato.foo is never actually run):
from unittest import TestCase
from unittest.mock import patch

class Potato(object):
    def foo(self, n):
        return self.bar(n)
    def bar(self, n):
        return n + 2

class PotatoTest(TestCase):
    @patch.object(Potato, 'foo', return_value=42)  # , wraps=Potato.foo
    def test_something(self, mock):
        self.spud = Potato()
        forty_two = self.spud.foo(n=40)
        mock.assert_called_once_with(n=40)
        self.assertEqual(forty_two, 42)
This works, but it does not run the original function, which I need to run because the return value is used elsewhere (and I cannot fake it from the test).
Can it be done?
Note: The actual reason behind my needs is that I'm testing a rest api with webtest. From the tests I perform some wsgi requests to some paths, and my framework instantiates some classes and uses their methods to fulfill the request. I want to capture the parameters sent to those methods to do some asserts about them in my tests.
In short, you can't do this using Mock instances alone.
patch.object creates a Mock for the specified attribute (Potato.foo), i.e. it replaces Potato.foo with a single Mock the moment it is called. Therefore there is no way to pass instances to the Mock, as the mock is created before any instances are. To my knowledge, getting instance information to the Mock at runtime is also very difficult.
To illustrate:
from unittest.mock import MagicMock, patch

class MyMock(MagicMock):
    def __init__(self, *a, **kw):
        super(MyMock, self).__init__(*a, **kw)
        print('Created Mock instance a={}, kw={}'.format(a, kw))

with patch.object(Potato, 'foo', new_callable=MyMock, wrap=Potato.foo):
    print('no instances created')
    spud = Potato()
    print('instance created')
The output is:
Created Mock instance a=(), kw={'name': 'foo', 'wrap': <function Potato.foo at 0x7f5d9bfddea0>}
no instances created
instance created
I would suggest monkey-patching your class in order to add the Mock to the correct location.
from unittest.mock import MagicMock

class PotatoTest(TestCase):
    def test_something(self):
        old_foo = Potato.foo
        try:
            mock = MagicMock(wraps=Potato.foo, return_value=42)
            Potato.foo = lambda *a, **kw: mock(*a, **kw)
            self.spud = Potato()
            forty_two = self.spud.foo(n=40)
            mock.assert_called_once_with(self.spud, n=40)  # now needs the self instance
            self.assertEqual(forty_two, 42)
        finally:
            Potato.foo = old_foo
Note that using assert_called_once_with is affected: since the mock wraps the unbound function, it is now called with the instance as the first argument.
Do you control creation of Potato instances, or at least have access to these instances after creating them? You should, else you'd not be able to check particular arg lists.
If so, you can wrap methods of individual instances using
spud = dig_out_a_potato()
with mock.patch.object(spud, "foo", wraps=spud.foo) as mock_spud:
    # do your thing.
    mock_spud.assert_called...
Your question looks identical to python mock - patching a method without obstructing implementation to me. https://stackoverflow.com/a/72446739/9230828 implements what you want (except that it uses a with statement instead of a decorator). wrap_object.py:
# Copyright (C) 2022, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
import contextlib
import typing
import unittest.mock

@contextlib.contextmanager
def wrap_object(
    target: object, attribute: str
) -> typing.Generator[unittest.mock.MagicMock, None, None]:
    """Wrap the named member on an object with a mock object.

    wrap_object() can be used as a context manager. Inside the
    body of the with statement, the attribute of the target is
    wrapped with a :class:`unittest.mock.MagicMock` object. When
    the with statement exits the patch is undone.

    The instance argument 'self' of the wrapped attribute is
    intentionally not logged in the MagicMock call. Therefore
    wrap_object() can be used to check all calls to the object,
    but not differentiate between different instances.
    """
    mock = unittest.mock.MagicMock()
    real_attribute = getattr(target, attribute)

    def mocked_attribute(self, *args, **kwargs):
        mock.__call__(*args, **kwargs)
        return real_attribute(self, *args, **kwargs)

    with unittest.mock.patch.object(target, attribute, mocked_attribute):
        yield mock
Then you can write following unit test:
from unittest import TestCase
from wrap_object import wrap_object

class Potato:
    def foo(self, n):
        return self.bar(n)
    def bar(self, n):
        return n + 2

class PotatoTest(TestCase):
    def test_something(self):
        with wrap_object(Potato, 'foo') as mock:
            self.spud = Potato()
            forty_two = self.spud.foo(n=40)
            mock.assert_called_once_with(n=40)
            self.assertEqual(forty_two, 42)